CN110797033A - Artificial intelligence-based voice recognition method and related equipment thereof - Google Patents

Artificial intelligence-based voice recognition method and related equipment thereof

Info

Publication number
CN110797033A
CN110797033A (application CN201910884202.XA)
Authority
CN
China
Prior art keywords
signal
bird
energy value
artificial intelligence
voice recognition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910884202.XA
Other languages
Chinese (zh)
Inventor
彭俊清
尚迪雅
王健宗
瞿晓阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN201910884202.XA
Publication of CN110797033A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00: Speaker identification or verification
    • G10L17/02: Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • G10L17/04: Training, enrolment or model building
    • G10L17/16: Hidden Markov models [HMM]
    • G10L17/26: Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
    • G10L21/00: Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02: Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208: Noise filtering
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
    • G10L25/03: Speech or voice analysis techniques characterised by the type of extracted parameters
    • G10L25/24: Speech or voice analysis techniques in which the extracted parameters are the cepstrum
    • G10L25/27: Speech or voice analysis techniques characterised by the analysis technique
    • G10L25/45: Speech or voice analysis techniques characterised by the type of analysis window

Abstract

The invention relates to the technical field of artificial intelligence, and provides an artificial intelligence-based voice recognition method and related equipment thereof. The method comprises the following steps: acquiring a bird sound signal to be detected from a preset acquisition library; denoising the bird sound signal to obtain a denoised target signal; performing characteristic parameter extraction on the target signal to obtain characteristic parameters; and inputting the characteristic parameters into a pre-trained target hidden Markov model for recognition to obtain a bird recognition result. Automatic identification of bird sound signals is thus realized without manual intervention, which improves both the accuracy and the efficiency of bird sound signal identification.

Description

Artificial intelligence-based voice recognition method and related equipment thereof
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a voice recognition method based on artificial intelligence and related equipment thereof.
Background
Traditional bird identification methods include video monitoring and manual identification, but video monitoring is costly and covers only a limited target range, while manual identification is hard to carry out and inefficient. Bird sounds in the ecological environment are widespread and rich in information: they reflect not only the life activities of all the birds in that environment but also habits that change with the seasons, and they allow the birds' living environment to be analyzed. Sound is therefore one of the important biological characteristics of birds and an important basis for identifying them, so bird species can be identified by sound and their survival conditions monitored. However, bird sounds are usually obtained through manual intervention and recording, and the recording time is quite long, so the identification of bird species is often inefficient.
Disclosure of Invention
The embodiment of the invention provides an artificial intelligence-based voice recognition method and related equipment thereof, which address the low recognition accuracy and low recognition efficiency caused by the manual intervention required to recognize bird sound signals.
A voice recognition method based on artificial intelligence comprises the following steps:
acquiring a bird sound signal to be detected from a preset acquisition library;
denoising the bird sound signal to obtain a denoised target signal;
extracting characteristic parameters from the target signal to obtain characteristic parameters;
and inputting the characteristic parameters into a pre-trained target hidden Markov model for recognition to obtain a bird recognition result, wherein the target hidden Markov model is used for recognizing the bird recognition result corresponding to the characteristic parameters.
An artificial intelligence based voice recognition apparatus comprising:
the first acquisition module is used for acquiring a bird sound signal to be detected from a preset acquisition library;
the denoising module is used for denoising the bird sound signal to obtain a denoised target signal;
the characteristic parameter extraction module is used for extracting characteristic parameters from the target signal to obtain characteristic parameters;
and the identification module is used for inputting the characteristic parameters into a pre-trained target hidden Markov model for identification to obtain a bird identification result, wherein the target hidden Markov model is used for identifying the bird identification result corresponding to the characteristic parameters.
A computer device comprising a memory, a processor and a computer program stored in said memory and executable on said processor, said processor implementing the steps of the artificial intelligence based voice recognition method described above when executing said computer program.
A computer-readable storage medium, in which a computer program is stored which, when executed by a processor, carries out the steps of the artificial intelligence based voice recognition method described above.
According to the artificial intelligence-based voice recognition method and related equipment, a target signal is obtained by denoising the bird sound signal to be detected, characteristic parameters are obtained by performing characteristic parameter extraction on the target signal, and finally the characteristic parameters are fed into a pre-trained target hidden Markov model for recognition to obtain the corresponding bird recognition result. Bird sound signals are thus recognized automatically, manual intervention is avoided, labor costs are effectively reduced, the accuracy and efficiency of bird sound signal recognition are improved, and the productivity of users engaged in bird research is improved in turn.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments of the present invention will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without inventive labor.
FIG. 1 is a flow chart of a method for artificial intelligence based voice recognition according to an embodiment of the present invention;
FIG. 2 is a flowchart of step S2 in the artificial intelligence based voice recognition method provided by the embodiment of the invention;
FIG. 3 is a flowchart of step S21 in the artificial intelligence based voice recognition method provided by the embodiment of the invention;
FIG. 4 is a flowchart of step S211 in the artificial intelligence based voice recognition method according to an embodiment of the present invention;
FIG. 5 is a flowchart illustrating step S3 in the artificial intelligence based voice recognition method according to an embodiment of the present invention;
FIG. 6 is a flowchart illustrating step S32 in the artificial intelligence based voice recognition method according to an embodiment of the present invention;
fig. 7 is a flowchart of training a hidden markov model to obtain a target hidden markov model in the artificial intelligence based voice recognition method according to the embodiment of the present invention;
FIG. 8 is a schematic diagram of an artificial intelligence based voice recognition apparatus according to an embodiment of the present invention;
fig. 9 is a block diagram of the basic structure of a computer device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The voice recognition method based on artificial intelligence is applied to the server side, and the server side can be specifically realized by an independent server or a server cluster formed by a plurality of servers. In one embodiment, as shown in fig. 1, there is provided an artificial intelligence based voice recognition method, comprising the steps of:
s1: and acquiring a bird sound signal to be detected from a preset acquisition library.
In the embodiment of the invention, the preset acquisition library is monitored, and when a bird sound signal is detected in it, the signal is extracted. The preset acquisition library is a database dedicated to storing bird sound signals.
Once a bird sound signal has been extracted from the preset acquisition library, it is deleted from the library.
S2: denoising the bird sound signal to obtain a denoised target signal.
Specifically, the bird sound signal is fed into a preset denoising port to be denoised, yielding the denoised target signal. The preset denoising port is a processing port dedicated to denoising bird sound signals.
S3: and extracting the characteristic parameters of the target signal to obtain the characteristic parameters.
In the embodiment of the invention, the target signal is directly led into the preset characteristic parameter extraction port to be subjected to characteristic parameter extraction processing, so that the characteristic parameter corresponding to the target signal subjected to the characteristic parameter extraction processing is obtained.
The preset feature parameter extraction port is a processing port specially used for extracting feature parameters of a target signal.
S4: and inputting the characteristic parameters into a pre-trained target hidden Markov model for recognition to obtain a bird recognition result, wherein the target hidden Markov model is used for recognizing the bird recognition result corresponding to the characteristic parameters.
In the embodiment of the present invention, a Hidden Markov Model (HMM) is developed from the Markov chain and is a mathematical model commonly used in sound signal recognition. Because real problems are more complex than what a Markov chain model describes, observed events are not associated with states one-to-one but are linked through a set of observation probability distributions; such a model is called an HMM. It is a doubly stochastic process: one process is a Markov chain, the basic process that describes transitions between states; the other describes the statistical correspondence between states and observed variables. From the observer's point of view only the observations are visible, unlike the one-to-one correspondence of observations and states in a Markov chain model, so the states cannot be seen directly; their existence and characteristics are perceived only through a stochastic process. Hence the name Hidden Markov Model.
The HMM is thus a doubly stochastic process, one part describing state transitions and the other describing the statistical relationship between states and observations. From the observer's perspective only the observed values are visible, while the changes of state remain hidden. This is consistent with the characteristics of human speech and can also describe bird sound signals well.
The target HMM is obtained by training an HMM according to the actual requirements of the user; the trained model is the target HMM, which is used to identify the bird recognition result corresponding to the characteristic parameters.
Specifically, the characteristic parameters are input into the pre-trained target hidden Markov model for recognition; the target hidden Markov model directly determines the bird species corresponding to the input characteristic parameters and outputs that species as the bird recognition result.
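As an illustration only, this recognition step could look like the following minimal Python sketch built on the hmmlearn library; the library choice, the per-species model dictionary and all names are assumptions of this sketch, not the patent's implementation:

```python
import numpy as np
from hmmlearn import hmm  # assumed library; the patent names no implementation

def recognize_bird(features, species_models):
    """Score a characteristic-parameter sequence against one pre-trained
    HMM per species and return the species with the highest likelihood."""
    # features: (n_frames, n_features) array of characteristic parameters
    # species_models: dict mapping species name -> trained hmm.GaussianHMM
    best_species, best_score = None, -np.inf
    for species, model in species_models.items():
        score = model.score(features)  # log P(features | species model)
        if score > best_score:
            best_species, best_score = species, score
    return best_species  # output as the bird recognition result
```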
In this embodiment, a target signal is obtained by denoising the bird sound signal to be detected, characteristic parameters are obtained by performing characteristic parameter extraction on the target signal, and the characteristic parameters are finally fed into a pre-trained target hidden Markov model for recognition to obtain the corresponding bird recognition result. Bird sound signals are thus recognized automatically, manual intervention is avoided, labor costs are effectively reduced, and the accuracy and efficiency of bird sound signal recognition are improved, which in turn improves the productivity of users engaged in bird research.
In one embodiment, as shown in fig. 2, the step S2 of denoising the bird sound signal to obtain the denoised target signal includes the following steps:
S21: performing feature extraction on the bird sound signal according to a preset feature extraction rule to obtain timbre features.
Specifically, feature extraction is performed on the bird sound signal according to the preset feature extraction rule, yielding the extracted timbre features. The preset feature extraction rule is a rule set by the user for feature extraction from bird sound signals.
S22: denoising the timbre features by wavelet transform to obtain the processed target signal.
In the embodiment of the invention, the timbre features consist of a clean sound signal plus a noise signal, and under the wavelet transform the two behave oppositely across scales: the modulus of the wavelet coefficients of the clean sound signal increases with the wavelet scale, while the modulus of the wavelet coefficients of the noise signal decreases with the wavelet scale. After several levels of wavelet transform, the wavelet coefficients corresponding to noise are essentially removed or have very small amplitude, and the remaining coefficients are dominated by the sound signal. That is, as the number of wavelet decomposition levels increases, the signal-to-noise ratio of each level also increases.
The timbre features are expressed as the superposition of a clean sound signal and random noise, as shown in formula (1):
f(n) = s(n) + σ × e(n)    formula (1)
where f(n) is the timbre feature, s(n) is the clean sound signal, and σ × e(n) is the noise signal, in which σ is the noise intensity and e(n) is normally distributed with mean zero and variance 1, i.e. e(n) ~ N(0, 1); the noise is uncorrelated with the clean sound.
Specifically, the noise signal in the timbre features can be removed quickly by the wavelet transform, and the resulting target signal helps ensure the subsequent accurate extraction of characteristic parameters. The purpose of the wavelet transform is to estimate the clean sound signal s(n) from the noisy signal f(n): thresholds are set on the wavelet coefficients level by level, and f(n) is reconstructed using only the significant wavelet coefficients that exceed the thresholds; the reconstructed signal is the target signal.
It should be noted that if the wavelet coefficients used for the inverse discrete wavelet transform have had the noise filtered out as far as possible, the reconstructed f(n) is the denoised sound signal s(n).
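For illustration, the layered thresholding described above can be sketched with the PyWavelets package as follows; the wavelet family ('db4'), the decomposition depth and the universal-threshold rule are assumptions of this sketch rather than values fixed by the patent:

```python
import numpy as np
import pywt

def wavelet_denoise(timbre, wavelet="db4", level=5):
    """Soft-threshold the detail coefficients level by level and
    reconstruct the target signal from the surviving coefficients."""
    coeffs = pywt.wavedec(timbre, wavelet, level=level)
    # Noise scale estimated from the finest detail level (universal threshold)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    thr = sigma * np.sqrt(2.0 * np.log(len(timbre)))
    coeffs[1:] = [pywt.threshold(c, thr, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)  # reconstructed f(n), close to s(n)
```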
In this embodiment, feature extraction is performed on the bird sound signal according to a preset feature extraction rule to obtain timbre features, which are then denoised by the wavelet transform to obtain the target signal. The target signal can thus be acquired accurately, which guarantees the accuracy of the subsequent characteristic parameter extraction and improves the accuracy and efficiency of bird sound signal identification.
In one embodiment, as shown in fig. 3, the step S21 of performing feature extraction on the bird sound signal according to a preset feature extraction rule to obtain the timbre features includes the following steps:
S211: optimizing the bird sound signal to obtain an optimized initial signal.
In the embodiment of the invention, the bird sound signal is fed into a preset optimization library for optimization, yielding the optimized initial signal. The preset optimization library is a database dedicated to optimizing bird sound signals.
S212: performing framing and windowing on the initial signal to obtain a characteristic signal.
In the embodiment of the present invention, the optimized initial signal is divided into several short-time segments, each of which is called an analysis frame. Frames of fixed length are obtained by framing the initial signal, i.e. dividing its total length by a preset frame length; if the last frame falls short of the preset frame length, it is padded with zeros. The preset frame length may be 200, or may be set according to the actual requirements of the user, which is not limited here.
It should be noted that because the timbre features can be regarded as relatively stable over a short period, i.e. short-time stationary, the initial signal is framed so that each framed signal is short-time stationary, which makes short-time correlation analysis possible.
After framing, however, the framed signal suffers from spectral leakage; a tailing spectrum, for example, indicates serious leakage. To reduce leakage, a window function is applied to each framed signal. Windowing essentially multiplies the framed signal by a window function; the characteristic signal obtained by framing and windowing better satisfies the periodicity requirement of the Fourier transform, and the influence of framing on the edges of each frame is reduced.
Specifically, the initial signal is fed into a preset processing port for framing and windowing, yielding the framed and windowed characteristic signal. The preset processing port is a port for performing framing and windowing on the initial signal.
Further, framing in the preset processing port may specifically be performed by calling the enframe function of the voicebox toolbox, and windowing may specifically be performed using formula (2).
The windowing function is as follows:
Qn = Σk T[s(k)] × ω(n−k)    formula (2)
where Qn is the characteristic signal, T[s(k)] is the framed signal, ω(n−k) is the window function, and n and k are constants.
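A plain numpy sketch of this framing-and-windowing step follows (an analogue of voicebox's enframe; the frame length of 200 and the zero-padding of the last frame follow the description above, while the non-overlapping frame layout is an assumption):

```python
import numpy as np

def frame_and_window(initial_signal, frame_len=200):
    """Split the initial signal into fixed-length frames, zero-padding
    the last frame, then multiply each frame by a Hamming window."""
    n_frames = int(np.ceil(len(initial_signal) / frame_len))
    padded = np.zeros(n_frames * frame_len)
    padded[:len(initial_signal)] = initial_signal
    frames = padded.reshape(n_frames, frame_len)
    return frames * np.hamming(frame_len)  # windowed characteristic signal
```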
S213: transforming the characteristic signal with the short-time Fourier transform to obtain the timbre features.
Specifically, a Hamming window h = hamming(N) is used to intercept an analysis frame of length N from the characteristic signal, and the short-time Fourier transform is used to obtain the spectral characteristics of each frame of the characteristic signal from the analysis frame; the feature vector, i.e. the timbre features, can then be extracted from these spectral characteristics.
The extracted feature vector includes skewness and kurtosis, spectral centroid, spectral flux, spectral roll-off, spectral spread, spectral flatness, zero-crossing rate, Mel-frequency cepstral coefficients (MFCC) with their first- and second-order difference components, and so on.
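As a sketch, such descriptors can be computed with the librosa package; librosa is an assumed tool here, the sampling rate and coefficient count are placeholders, and only MFCCs with their difference components and two of the spectral descriptors are shown:

```python
import librosa

def timbre_feature_vector(signal, sr=32000, n_mfcc=13):
    """Short-time spectral features: MFCCs with first- and second-order
    differences, zero-crossing rate and spectral centroid."""
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
    delta1 = librosa.feature.delta(mfcc, order=1)
    delta2 = librosa.feature.delta(mfcc, order=2)
    zcr = librosa.feature.zero_crossing_rate(signal)
    centroid = librosa.feature.spectral_centroid(y=signal, sr=sr)
    return mfcc, delta1, delta2, zcr, centroid
```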
In this embodiment, an initial signal is obtained by optimizing the bird sound signal, a characteristic signal is obtained by framing and windowing the initial signal, and finally the timbre features are obtained by transforming the characteristic signal with the short-time Fourier transform. Accurate extraction of the timbre features of the bird sound signal is thus realized, the accuracy of the subsequent processing that uses the timbre features to obtain the target signal is guaranteed, and the accuracy and efficiency of bird sound signal identification are further improved.
In one embodiment, as shown in fig. 4, the step S211 of obtaining the optimized initial signal by optimizing the bird sound signal includes the following steps:
s2111: and carrying out pre-emphasis processing on the bird sound signals to obtain pre-emphasis signals.
In the embodiment of the present invention, pre-emphasis is a signal processing method for compensating high-frequency components of an input signal at a transmitting end. As the signal rate increases, the signal is greatly damaged during transmission, and in order to obtain a better signal waveform at the receiving terminal, the damaged signal needs to be compensated, i.e. pre-emphasis processed.
Specifically, the original bird call signal is pre-emphasized by a first-order high-pass digital filter to obtain the pre-emphasized signal. The formula corresponding to the first-order high-pass digital filter is shown as formula (3):
H(z) = 1 − α × z⁻¹    formula (3)
where H(z) is the pre-emphasized signal, α is the pre-emphasis coefficient with 0.9 < α < 1.0 (α is typically a number close to 1, e.g. α = 0.94), and z is the input bird sound signal.
It should be noted that pre-emphasis compensates the high-frequency components of the bird sound signal; its effect on the signal depends on the amount of pre-emphasis, realized by increasing the amplitude of the first bit after a transition edge of the signal. For example, for a bird sound signal sequence 00111, the amplitude of the first 1 after pre-emphasis is larger than the amplitudes of the second and third 1s. Because transitions carry the high-frequency components of the signal, this boosts the high-frequency content of the bird sound signal, flattening its spectrum, increasing its high-frequency resolution, and aiding spectrum analysis, vocal tract parameter analysis and the like.
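Applied to a sampled signal, formula (3) reduces to a one-line difference equation; a minimal numpy sketch (0.94 is the example coefficient given above):

```python
import numpy as np

def pre_emphasize(x, alpha=0.94):
    """First-order high-pass filtering y[n] = x[n] - alpha * x[n-1],
    i.e. H(z) = 1 - alpha * z^(-1)."""
    return np.append(x[0], x[1:] - alpha * x[:-1])
```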
S2112: normalizing the pre-emphasized signal to obtain the initial signal.
Specifically, the pre-emphasized signal is normalized using formula (4) to obtain the normalized initial signal, so that the feature distribution of the initial signal lies in [−1, 1]:
z = (xi − μ) / δ    formula (4)
where xi is the pre-emphasized signal, μ is the mean, δ is the standard deviation, and z is the initial signal.
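A corresponding sketch of formula (4); estimating μ and δ from the pre-emphasized signal itself is an assumption, since the patent does not state how the statistics are obtained:

```python
import numpy as np

def normalize(pre_emphasized):
    """Zero-mean, unit-variance normalization z = (x - mu) / delta."""
    mu = np.mean(pre_emphasized)
    delta = np.std(pre_emphasized)
    return (pre_emphasized - mu) / delta  # the initial signal
```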
In this embodiment, a pre-emphasis signal is obtained by performing pre-emphasis processing on the bird sound signal, and then an initial signal is obtained by performing normalization processing on the pre-emphasis signal. Therefore, the bird sound signals are quickly and accurately processed into the initial signals, the accuracy of obtaining the characteristic signals by performing framing and windowing processing on the subsequent initial signals is guaranteed, and the accuracy and the efficiency of bird sound signal identification are further improved.
In one embodiment, as shown in fig. 5, the step S3 of performing characteristic parameter extraction on the target signal to obtain the characteristic parameters includes the following steps:
S31: calculating the energy value of each frame of sound segment in the target signal by the short-time energy method.
In the embodiment of the present invention, the energy value of each frame of sound segment in the target signal is calculated by the short-time energy method; the specific short-time energy formula is shown as formula (5):
En = Σm [x(m) × w(n−m)]²    formula (5)
where En is the energy value of the nth frame sound segment, x(m) is the target signal, w(n−m) is the weight, and n is the nth moment, a constant.
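Computed on frames that were already windowed in step S212, formula (5) reduces to a per-frame sum of squares; a minimal numpy sketch:

```python
import numpy as np

def short_time_energy(frames):
    """E_n = sum over m of [x(m) * w(n-m)]^2 for each windowed frame."""
    return np.sum(frames ** 2, axis=1)  # one energy value per frame
```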
S32: calculating the valid sound segments from the energy values.
Specifically, according to a preset calculation rule, the energy values belonging to the valid sound segments are computed from the per-frame energy values of the target signal obtained in step S31. The preset calculation rule may specifically be the mean of all the energy values, or may be set according to the actual requirements of the user, which is not limited here.
The valid sound segments may comprise several segments, i.e. there may be a valid sound segment A, a valid sound segment B, and so on.
S33: importing the valid sound segments into a preset transform library for discrete cosine transform to obtain Mel-frequency cepstral coefficients.
In an embodiment of the invention, the Mel-frequency cepstrum is a linear transform of the logarithmic energy spectrum based on the nonlinear Mel scale of sound frequency, and Mel-frequency cepstral coefficients (MFCC) are the coefficients that constitute the Mel-frequency cepstrum.
Specifically, the valid sound segments obtained in step S32 are imported into a preset transform library for discrete cosine transform, yielding the Mel-frequency cepstral coefficients corresponding to each valid sound segment. The preset transform library is a database dedicated to performing the discrete cosine transform on valid sound segments.
S34: weighting the Mel-frequency cepstral coefficients to obtain the characteristic parameters.
In the embodiment of the present invention, because the MFCC extraction process is nonlinear, it does not preserve additivity: if a signal f(x) is the sum of two different signals g(x) and h(x), i.e. f(x) = g(x) + h(x), then the MFCC parameters PfMFCC, PgMFCC and PhMFCC of the signals f(x), g(x) and h(x) satisfy formula (6):
PfMFCC ≠ PgMFCC + PhMFCC    formula (6)
Therefore, the target signal cannot be weighted first with the MFCC parameters extracted afterwards; instead, the MFCC parameters of each valid sound segment in the target signal are obtained first and then weighted. Only in this way can the significance of the original MFCC parameters be reflected to the greatest extent, yielding the characteristic parameters.
Specifically, the Mel-frequency cepstral coefficients are weighted according to formula (7) to obtain the weighted characteristic parameters; the specific calculation formula is as follows:
WTMFCC = P1MFCC × z1 + P2MFCC × z2 + … + PrMFCC × zr    formula (7)
where WTMFCC is the characteristic parameter, P1MFCC is the Mel-frequency cepstral coefficient corresponding to the first valid sound segment and z1 its weight, P2MFCC is the Mel-frequency cepstral coefficient corresponding to the second valid sound segment and z2 its weight, and PrMFCC is the Mel-frequency cepstral coefficient corresponding to the r-th valid sound segment and zr its weight.
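A sketch of formula (7); the per-segment MFCC vectors and the weights z1 to zr are assumed inputs:

```python
import numpy as np

def weighted_mfcc(segment_mfccs, weights):
    """Weight the MFCC vector of each valid sound segment and sum them.
    The raw signal is never weighted first, because MFCC extraction is
    non-additive (formula (6))."""
    return np.sum([z * p for z, p in zip(weights, segment_mfccs)], axis=0)
```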
In this embodiment, the energy value of each frame in the target signal is calculated by the short-time energy method, the valid sound segments are computed from the energy values, the discrete cosine transform is applied to the valid segments to obtain the Mel-frequency cepstral coefficients, and finally the coefficients are weighted to obtain the characteristic parameters. The characteristic parameters in the target signal are thus extracted accurately, the accuracy of the subsequent use of the characteristic parameters as input to the target hidden Markov model is guaranteed, and the accuracy and efficiency of bird sound signal identification are further improved.
In one embodiment, as shown in fig. 6, the step S32 of calculating the valid sound segments from the energy values includes the following steps:
s321: the maximum energy value and the minimum energy value are obtained from the energy values of the target signal.
Specifically, the per-frame energy values of the target signal obtained in step S31 are arranged in ascending order, and the first and last values after sorting are extracted, i.e. the minimum energy value and the maximum energy value.
For example, if the energy values are 100, 150, 200, 120 and 180, arranging them in ascending order gives 100, 120, 150, 180 and 200; extracting the first and last values yields 100 as the minimum energy value and 200 as the maximum energy value.
S322: calculating the difference between the maximum energy value and the minimum energy value.
Specifically, the minimum energy value obtained in step S321 is subtracted from the maximum energy value to obtain the difference.
Following the example of step S321, subtracting the minimum energy value 100 from the maximum energy value 200 gives a difference of 100.
S323: based on the difference, calculating the start-point energy value of the valid sound segment and the end-point energy value of the valid sound segment according to formula (8):
Eup = Emin + Edif × 10%
Elow = Emax − Edif × 10%    formula (8)
where Eup is the start-point energy value of the valid sound segment, Emin is the minimum energy value, Edif is the difference between the maximum energy value and the minimum energy value, Elow is the end-point energy value of the valid sound segment, and Emax is the maximum energy value.
Specifically, the start-point energy value and the end-point energy value of the valid sound segment are calculated according to formula (8) from the maximum and minimum energy values obtained in step S321 and the difference obtained in step S322.
S324: selecting all frames whose energy values lie between the start-point energy value of the valid sound segment and the end-point energy value of the valid sound segment to form the valid sound segment.
Specifically, based on the per-frame energy values calculated in step S31, all frames whose energy values lie between the start-point energy value and the end-point energy value of the valid sound segment are selected from the target signal, and the selected frames are imported into a preset combination library for synthesis to obtain the synthesized valid sound segment.
The preset combination library is a database dedicated to synthesizing frames.
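The whole of step S32 can be sketched in a few lines of numpy; the synthesis step is reduced here to returning the selected frame indices, everything else follows formula (8):

```python
import numpy as np

def valid_sound_frames(energies):
    """Select the frames whose short-time energy lies between the
    start-point value Eup and the end-point value Elow of formula (8)."""
    e_min, e_max = energies.min(), energies.max()
    e_dif = e_max - e_min
    e_up = e_min + 0.1 * e_dif   # start-point energy value
    e_low = e_max - 0.1 * e_dif  # end-point energy value
    mask = (energies >= e_up) & (energies <= e_low)
    return np.nonzero(mask)[0]   # frames composing the valid sound segment
```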
In this embodiment, the maximum energy value and the minimum energy value in the target signal are obtained; based on their difference, the start-point energy value and the end-point energy value of the valid sound segment are calculated with formula (8); and finally the frames whose energy values lie between those two values are selected to form the valid sound segment. The valid sound segment is thus extracted accurately, which ensures the accuracy of the Mel-frequency cepstral coefficients obtained by the subsequent discrete cosine transform.
In an embodiment, as shown in fig. 7, after step S3 and before step S4, the artificial intelligence based voice recognition method further includes the following steps:
s5: and initializing the hidden Markov model to obtain an initial model.
In the embodiment of the invention, the model parameters of the hidden Markov model are initialized by the server, and an initial parameter is given to the weight and the bias of each network layer in the hidden Markov model, so that the hidden Markov model can extract and calculate the characteristics of the characteristic parameters according to the initial parameter, wherein the weight and the bias are model parameters used for performing refraction transformation calculation on input data in the model, and the result of the model output after calculation can be consistent with the actual condition.
It can be understood that, taking the example of receiving information by a person, after the person receives the information and the information is judged and transmitted by neurons in the brain of the person, the person can obtain a certain result or cognition, that is, a process of acquiring cognition from the information, and the training process of the hidden markov model is to optimize the weight and bias of neuron connection in the network, so that the trained hidden markov model can achieve a recognition effect consistent with a real situation on the recognition result of the data to be recognized.
Optionally, the server may optionally obtain a weight as an initial parameter in an interval of [ -0.30, +0.30], and set the initial parameter in an interval with an average value of 0 and smaller, so as to improve the convergence rate of the model and improve the construction efficiency of the model.
S6: importing preset characteristic parameters into the initial model for iterative operation and outputting a prediction error.
In the embodiment of the present invention, the preset characteristic parameters are characteristic parameters dedicated to training the initial model. They are imported into the initial model and iterated several times with the Baum-Welch algorithm; the output probabilities of all observation sequences are calculated and accumulated into a total output probability, which is taken as the prediction error.
S7: if the prediction error is smaller than a preset threshold, stopping the iterative operation and determining the initial model corresponding to the prediction error as the target hidden Markov model.
Specifically, the prediction error obtained in step S6 is compared with the preset threshold; if it is smaller than the threshold, the iterative operation stops and the initial model corresponding to that prediction error is determined as the target hidden Markov model. The preset threshold may specifically be 0.5, or may be set according to the actual requirements of the user, which is not limited here.
It should be noted that, in the model training process, the maximum number of iterations may also be set, and when the number of iterations reaches the maximum number of iterations, the iteration is stopped, and the current model is determined as the target hidden markov model.
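For illustration, Baum-Welch training with both stopping criteria (error threshold and maximum iteration count) could be sketched with hmmlearn, whose GaussianHMM fits by EM re-estimation; the state count and threshold below are placeholders, not values from the patent:

```python
from hmmlearn import hmm

def train_target_hmm(train_features, n_states=5, max_iter=100, tol=0.5):
    """Iterate until the gain in total log output probability drops
    below the preset threshold (tol) or max_iter is reached."""
    model = hmm.GaussianHMM(n_components=n_states, n_iter=max_iter, tol=tol)
    model.fit(train_features)  # train_features: (n_frames, n_features)
    return model  # the target hidden Markov model
```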
In this embodiment, an initial model is obtained by initializing a hidden Markov model; preset characteristic parameters are imported into the initial model for iterative operation and a prediction error is output; the iteration stops when the prediction error is smaller than the preset threshold, and the initial model corresponding to the current prediction error is determined as the target hidden Markov model. Training and tuning of the initial model are thus realized, improving the accuracy and efficiency of the target hidden Markov model in identifying bird sound signals.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
In an embodiment, an artificial intelligence based voice recognition apparatus is provided, which corresponds one-to-one to the artificial intelligence based voice recognition method of the above embodiment. As shown in fig. 8, the artificial intelligence based voice recognition apparatus includes a first acquisition module 81, a denoising module 82, a characteristic parameter extraction module 83 and a recognition module 84. The functional modules are explained in detail as follows:
the first acquisition module 81 is used for acquiring a bird sound signal to be detected from a preset acquisition library;
the denoising module 82 is used for denoising the bird sound signal to obtain a denoised target signal;
a characteristic parameter extraction module 83, configured to extract a characteristic parameter from the target signal to obtain a characteristic parameter;
and the recognition module 84 is configured to input the characteristic parameters into a pre-trained target hidden markov model for recognition to obtain a bird recognition result, where the target hidden markov model is used to recognize the bird recognition result corresponding to the characteristic parameters.
Further, the denoising module 82 includes:
the feature extraction submodule is used for performing feature extraction on the bird sound signal according to a preset feature extraction rule to obtain timbre features;
and the wavelet transform submodule is used for denoising the timbre features by wavelet transform to obtain the processed target signal.
Further, the feature extraction sub-module includes:
the optimization unit is used for optimizing the bird sound signals to obtain initial signals after optimization;
the framing and windowing unit is used for performing framing and windowing processing on the initial signal to obtain a characteristic signal;
and the short-time Fourier transform unit is used for transforming the characteristic signal with the short-time Fourier transform to obtain the timbre features.
Further, the optimization unit includes:
the pre-emphasis subunit is used for pre-emphasizing the bird sound signals to obtain pre-emphasized signals;
and the normalization subunit is used for performing normalization processing on the pre-emphasis signal to obtain an initial signal.
Further, the feature parameter extraction module 83 includes:
the first calculation submodule is used for calculating the energy value of each frame of sound segment in the target signal by the short-time energy method;
the second calculation submodule is used for calculating the valid sound segments from the energy values;
the discrete cosine transform submodule is used for importing the valid sound segments into a preset transform library for discrete cosine transform to obtain Mel-frequency cepstral coefficients;
and the weighting submodule is used for weighting the Mel-frequency cepstral coefficients to obtain the characteristic parameters.
Further, the second calculation submodule includes:
a second acquisition unit configured to acquire the maximum energy value and the minimum energy value from the energy values of the target signal;
a third calculation unit configured to calculate the difference between the maximum energy value and the minimum energy value;
a fourth calculation unit configured to calculate, based on the difference, the start-point energy value of the valid sound segment and the end-point energy value of the valid sound segment according to formula (8):
Eup = Emin + Edif × 10%
Elow = Emax − Edif × 10%
where Eup is the start-point energy value of the valid sound segment, Emin is the minimum energy value, Edif is the difference between the maximum energy value and the minimum energy value, Elow is the end-point energy value of the valid sound segment, and Emax is the maximum energy value.
And a composition unit configured to select all frames whose energy values lie between the start-point energy value of the valid sound segment and the end-point energy value of the valid sound segment to form the valid sound segment.
Further, the artificial intelligence based voice recognition apparatus further includes:
the initialization module is used for initializing the hidden Markov model to obtain an initial model;
the iterative operation module is used for importing the preset characteristic parameters into the initial model to carry out iterative operation and outputting a prediction error;
and the model determining module is used for stopping iterative operation and determining the initial model corresponding to the prediction error as the target hidden Markov model if the prediction error is smaller than a preset threshold value.
Some embodiments of the present application disclose a computer device. Referring specifically to fig. 9, a basic structure block diagram of a computer device 90 according to an embodiment of the present application is shown.
As illustrated in fig. 9, the computer device 90 includes a memory 91, a processor 92, and a network interface 93 communicatively connected to each other through a system bus. It is noted that only a computer device 90 having components 91-93 is shown in fig. 9, but it should be understood that not all of the illustrated components need be implemented, and more or fewer components may be implemented instead. As will be understood by those skilled in the art, the computer device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.
The computer device can be a desktop computer, a notebook, a palm computer, a cloud server and other computing devices. The computer equipment can carry out man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch panel or voice control equipment and the like.
The memory 91 includes at least one type of readable storage medium including a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, etc. In some embodiments, the storage 91 may be an internal storage unit of the computer device 90, such as a hard disk or a memory of the computer device 90. In other embodiments, the memory 91 may also be an external storage device of the computer device 90, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), or the like, provided on the computer device 90. Of course, the memory 91 may also include both internal and external memory units of the computer device 90. In this embodiment, the memory 91 is generally used for storing an operating system installed on the computer device 90 and various types of application software, such as program codes of the artificial intelligence based voice recognition method. Further, the memory 91 may also be used to temporarily store various types of data that have been output or are to be output.
The processor 92 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data Processing chip in some embodiments. The processor 92 is typically used to control the overall operation of the computer device 90. In this embodiment, the processor 92 is configured to execute the program code stored in the memory 91 or process data, for example, execute the program code of the artificial intelligence based voice recognition method.
The network interface 93 may include a wireless network interface or a wired network interface, and the network interface 93 is generally used to establish a communication connection between the computer device 90 and other electronic devices.
The present application further provides another embodiment, namely a computer-readable storage medium storing a bird sound signal recognition program which can be executed by at least one processor to cause the at least one processor to perform the steps of the artificial intelligence based voice recognition method described above.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a computer device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present application.
Finally, it should be noted that the above-mentioned embodiments illustrate only some of the embodiments of the present application, and are therefore not to be considered limiting of its scope, for the invention may admit to other equally effective embodiments. This application is capable of embodiments in many different forms and is provided for the purpose of enabling a thorough understanding of the disclosure of the application. Although the present application has been described in detail with reference to the foregoing embodiments, it will be apparent to one skilled in the art that the present application may be practiced without modification or with equivalents of some of the features described in the foregoing embodiments. All equivalent structures made by using the contents of the specification and the drawings of the present application are directly or indirectly applied to other related technical fields and are within the protection scope of the present application.

Claims (10)

1. An artificial intelligence based voice recognition method for avian voice recognition, the method comprising:
acquiring a bird sound signal to be detected from a preset acquisition library;
denoising the bird sound signal to obtain a denoised target signal;
extracting characteristic parameters from the target signal to obtain characteristic parameters;
and inputting the characteristic parameters into a pre-trained target hidden Markov model for recognition to obtain a bird recognition result, wherein the target hidden Markov model is used for recognizing the bird recognition result corresponding to the characteristic parameters.
2. The artificial intelligence based voice recognition method of claim 1, wherein the step of denoising the bird sound signal to obtain the denoised target signal comprises:
performing feature extraction on the bird sound signal according to a preset feature extraction rule to obtain timbre features;
and denoising the timbre features by wavelet transform to obtain the denoised target signal.
3. The artificial intelligence based voice recognition method of claim 2, wherein the step of performing feature extraction on the bird sound signal according to a preset feature extraction rule to obtain the timbre features comprises:
obtaining an initial signal after optimization processing by optimizing the bird sound signal;
performing frame windowing on the initial signal to obtain a characteristic signal;
and transforming the characteristic signal with the short-time Fourier transform to obtain the timbre features.
4. The artificial intelligence based voice recognition method of claim 3, wherein the step of obtaining the initial signal after optimization processing by performing optimization processing on the bird voice signal comprises:
pre-emphasis processing is carried out on the bird sound signals to obtain pre-emphasis signals;
and carrying out normalization processing on the pre-emphasis signal to obtain the initial signal.
5. The artificial intelligence based voice recognition method according to any one of claims 1 to 4, wherein the step of obtaining the feature parameters by performing feature parameter extraction processing on the target signal includes:
calculating the energy value of each frame of sound segment in the target signal by using a short-time energy calculation method;
calculating a valid sound segment from the energy value;
importing the valid sound segments into a preset transform library for discrete cosine transform to obtain Mel-frequency cepstral coefficients;
and weighting the mel frequency cepstrum coefficient to obtain the characteristic parameters.
6. The artificial intelligence based voice recognition method of claim 5, wherein the step of calculating the valid sound segments from the energy values comprises:
obtaining a maximum energy value and a minimum energy value from the energy values of the target signal;
calculating a difference between the maximum energy value and the minimum energy value;
based on the difference, calculating a start energy value of the valid sound segment and an end energy value of the valid sound segment according to the following formula:
Eup=Emin+Edif*10%
Elow=Emax-Edif*10%
wherein Eup is the start-point energy value of the valid sound segment, Emin is the minimum energy value, Edif is the difference between the maximum energy value and the minimum energy value, Elow is the end-point energy value of the valid sound segment, and Emax is the maximum energy value;
and selecting all frames with the energy value between the starting energy value of the effective sound segment and the ending energy value of the effective sound segment to form the effective sound segment.
7. The artificial intelligence based voice recognition method according to claim 1, wherein after the step of obtaining the feature parameters by performing the feature parameter extraction process on the target signal, and before the step of inputting the feature parameters into a pre-trained target hidden markov model for recognition to obtain the bird recognition result, the artificial intelligence based voice recognition method further comprises:
initializing a hidden Markov model to obtain an initial model;
importing preset characteristic parameters into the initial model to perform iterative operation, and outputting a prediction error;
and if the prediction error is smaller than a preset threshold value, stopping iterative operation, and determining the initial model corresponding to the prediction error as the target hidden Markov model.
8. An artificial intelligence based voice recognition apparatus, comprising:
the first acquisition module is used for acquiring a bird sound signal to be detected from a preset acquisition library;
the denoising module is used for denoising the bird sound signal to obtain a denoised target signal;
the feature parameter extraction module is used for performing feature parameter extraction on the target signal to obtain feature parameters;
and the recognition module is used for inputting the feature parameters into a pre-trained target hidden Markov model for recognition to obtain a bird recognition result, wherein the target hidden Markov model is used for identifying the bird recognition result corresponding to the feature parameters.
9. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the artificial intelligence based voice recognition method according to any one of claims 1 to 7.
10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the artificial intelligence based voice recognition method according to any one of claims 1 to 7.
CN201910884202.XA 2019-09-19 2019-09-19 Artificial intelligence-based voice recognition method and related equipment thereof Pending CN110797033A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910884202.XA CN110797033A (en) 2019-09-19 2019-09-19 Artificial intelligence-based voice recognition method and related equipment thereof

Publications (1)

Publication Number Publication Date
CN110797033A (en) 2020-02-14

Family

ID=69427312

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910884202.XA Pending CN110797033A (en) 2019-09-19 2019-09-19 Artificial intelligence-based voice recognition method and related equipment thereof

Country Status (1)

Country Link
CN (1) CN110797033A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH1097278A (en) * 1996-09-20 1998-04-14 Nippon Telegr & Teleph Corp <Ntt> Method and device for recognizing voice
CN107393542A * 2017-06-28 2017-11-24 Beijing Forestry University Bird species identification method based on a two-channel neural network
CN109243470A * 2018-08-16 2019-01-18 Nanjing Agricultural University Broiler chicken cough monitoring method based on audio technology
CN110246504A * 2019-05-20 2019-09-17 Ping An Technology (Shenzhen) Co., Ltd. Bird sound recognition method and apparatus, computer device, and storage medium

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111399515A * 2020-03-31 2020-07-10 Lianyungang Water Conservancy Society Wetland environment electronic monitoring system based on natural factor disturbance
CN113345399A * 2021-04-30 2021-09-03 Guilin University of Technology Method for monitoring machine equipment sound in a strong-noise environment
CN113506579A * 2021-05-27 2021-10-15 South China Normal University Insect pest recognition method and robot based on artificial intelligence and sound
CN113506579B * 2021-05-27 2024-01-23 South China Normal University Insect pest recognition method and robot based on artificial intelligence and sound
CN113702513A * 2021-07-16 2021-11-26 Shaanxi Normal University Method for identifying metal materials based on a prediction function model

Similar Documents

Publication Publication Date Title
Luo et al. Conv-TasNet: Surpassing ideal time-frequency magnitude masking for speech separation
CN108962237B (en) Hybrid speech recognition method, device and computer readable storage medium
CN111161752B (en) Echo cancellation method and device
CN106486131B Method and device for speech denoising
CN110797033A (en) Artificial intelligence-based voice recognition method and related equipment thereof
WO2019100606A1 (en) Electronic device, voiceprint-based identity verification method and system, and storage medium
CN108922544B (en) Universal vector training method, voice clustering method, device, equipment and medium
Zhao et al. Late reverberation suppression using recurrent neural networks with long short-term memory
CN110772700B (en) Automatic sleep-aiding music pushing method and device, computer equipment and storage medium
CN111785288B (en) Voice enhancement method, device, equipment and storage medium
CN108922543B (en) Model base establishing method, voice recognition method, device, equipment and medium
CN110634476B (en) Method and system for rapidly building robust acoustic model
WO2014065342A1 (en) Method for transforming input signal
WO2022141868A1 (en) Method and apparatus for extracting speech features, terminal, and storage medium
KR102026226B1 (en) Method for extracting signal unit features using variational inference model based deep learning and system thereof
CN113345460A (en) Audio signal processing method, device, equipment and storage medium
CN113035216B (en) Microphone array voice enhancement method and related equipment
Astudillo et al. Uncertainty propagation
CN106910494B (en) Audio identification method and device
Nathwani et al. An extended experimental investigation of DNN uncertainty propagation for noise robust ASR
WO2020015546A1 (en) Far-field speech recognition method, speech recognition model training method, and server
CN110875037A (en) Voice data processing method and device and electronic equipment
CN113793615B (en) Speaker recognition method, model training method, device, equipment and storage medium
CN115859048A (en) Noise processing method and device for partial discharge signal
Chakrabartty et al. Robust speech feature extraction by growth transformation in reproducing kernel Hilbert space

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination