CN111540368A - Stable bird sound extraction method and device and computer readable storage medium - Google Patents


Info

Publication number
CN111540368A
Authority
CN
China
Prior art keywords
sub
noise
signal
sound
power spectrum
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010379824.XA
Other languages
Chinese (zh)
Other versions
CN111540368B (en)
Inventor
张承云
郑泽鸿
陈庆春
凌嘉乐
肖波
Current Assignee
Guangzhou University
Original Assignee
Guangzhou University
Priority date
Filing date
Publication date
Application filed by Guangzhou University
Priority to CN202010379824.XA
Publication of CN111540368A
Application granted
Publication of CN111540368B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
      • G10: MUSICAL INSTRUMENTS; ACOUSTICS
        • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
          • G10L17/00: Speaker identification or verification techniques
            • G10L17/26: Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
          • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
            • G10L25/27: characterised by the analysis technique
            • G10L25/78: Detection of presence or absence of voice signals
              • G10L25/84: Detection of presence or absence of voice signals for discriminating voice from noise
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
      • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
        • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
          • Y02D30/00: Reducing energy consumption in communication networks
            • Y02D30/70: Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Electrophonic Musical Instruments (AREA)
  • Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)

Abstract

The invention discloses a robust bird-sound extraction method comprising the following steps: preprocess the audio signal to obtain the noisy-signal power spectrum, and obtain a noise power spectrum estimate by a minimum-value search method; using a preset HBank filter bank, convert the noisy-signal power spectrum and the noise power spectrum estimate into the H domain for analysis, and obtain the posterior signal-to-noise ratio; obtain the H-domain prior signal-to-noise ratio estimate from the posterior SNR by the guided-decision method; smooth the prior SNR and take its mean to obtain the prior probability of a voiced frame; judge whether the current frame is a voiced frame against a set threshold, and collect consecutive voiced-frame signals into voiced segments; and obtain the formant frequency and formant width by linear prediction to judge whether a voiced segment contains bird sound. The method accurately extracts voiced segments and automatically rejects noise, performs well at low signal-to-noise ratio, and has low algorithmic complexity and strong real-time performance.

Description

Stable bird sound extraction method and device and computer readable storage medium
Technical Field
The invention relates to the technical field of ecological monitoring and acoustic signal recognition, and in particular to a robust bird-sound extraction method and device and a computer-readable storage medium.
Background
At present, China, as a country with one of the largest numbers of bird species, has always paid close attention to bird protection. Research on bird song can not only distinguish species but also analyze biological behavior, and has broad prospects in field animal monitoring, pest-bird deterrence in agriculture and forestry, and aviation bird-strike prevention. The primary task of bird-song study is to separate potential bird-sound fragments from continuously acquired audio. Early bird-sound extraction was performed manually: after repeated listening and spectrogram analysis, a bird-sound expert extracted and labeled the segments containing bird sound. Although manual extraction can obtain bird-sound fragments accurately, its detection efficiency is very low, which makes it unsuitable for processing massive recording data.
With the maturation of voice-activity-detection technology, methods for automatically extracting bird-song segments have appeared, but many problems remain, including extraction performance, complexity, universality, and real-time capability.
In the prior art, bird-sound extraction methods have the following defects:
extraction based on energy detection has a low computational load but cannot make correct judgments at low signal-to-noise ratio; extraction based on prior probability has a moderate computational load and good detection performance, but still misjudges abrupt noise; extraction based on spectrogram analysis can obtain complete bird-sound fragments, but the spectrogram must be built from many consecutive frames, the computation is heavy, and the approach suits only offline processing; extraction based on Gaussian mixture models detects stably at low signal-to-noise ratio with moderate complexity, but the model parameters need continual adjustment and abrupt noise can still be misjudged; extraction based on deep learning performs excellently when samples are plentiful, but the algorithm is complex, a large amount of sample data must be trained beforehand, and the classification result is affected by over-fitting and under-fitting.
Disclosure of Invention
The object of the invention is to provide a robust bird-sound extraction method and device and a computer-readable storage medium that accurately extract voiced segments, automatically reject some human voices and other animal sounds, pick up bird-sound signals accurately at low signal-to-noise ratio, have low algorithmic complexity and strong real-time performance, and can be applied in a bird-sound acquisition system.
To achieve the above object, the invention provides a robust bird-sound extraction method, suitable for execution in a computer device, comprising at least the following steps:
preprocessing the collected audio signal within the target range to obtain the noisy-signal power spectrum, smoothing the noisy-signal power spectrum, and obtaining a noise power spectrum estimate by a minimum-value search method;
inputting the noisy-signal power spectrum and the noise power spectrum estimate into a preset HBank filter bank to obtain the H-domain noisy-signal power spectrum and the H-domain noise power spectrum estimate, and obtaining the posterior signal-to-noise ratio from the two, wherein the associated domain of the HBank filter bank is the H domain;
obtaining the H-domain prior signal-to-noise ratio estimate from the posterior signal-to-noise ratio by the guided-decision method; smoothing the prior SNR to obtain a smoothed prior SNR estimate; and obtaining the prior probability of a voiced frame from the mean of the smoothed prior SNR;
when the probability value is greater than a set threshold, judging the current frame to be a voiced frame, and collecting consecutive voiced-frame signals to obtain voiced segments;
during a voiced segment, combining the current frame and the previous 5 frames into a sub-slice and computing the linear-prediction coefficients of the sub-slice; Fourier-transforming the linear-prediction coefficients to obtain the power-spectrum magnitude response of the sub-slice's linear-prediction model; normalizing the magnitude response, searching for the frequency bin of the spectral peak, and obtaining the formant frequency and formant width of the sub-slice;
classifying each sub-slice as bird sound or noise according to its formant frequency and formant width, and counting the bird-sound and noise sub-slices in the voiced segment; judging whether the segment contains bird sound by comparing the number of bird-sound sub-slices with the number of noise sub-slices, and if so, storing the segment containing bird sound.
Further, classifying each sub-slice as bird sound or noise according to the formant frequency and formant width, and judging whether bird sound is present by comparing the numbers of bird-sound and noise sub-slices in the segment, storing the segment if so, comprises:
when the formant frequency is greater than 1.5 kHz, the sub-slice is judged to be a bird-sound sub-slice;
when the formant frequency is less than 400 Hz, the sub-slice is judged to be a noise sub-slice;
when the formant frequency is between 400 Hz and 1.5 kHz, the sub-slice is analyzed by its formant width: if the formant width is less than 500 Hz, the sub-slice is judged to be a bird-sound sub-slice, otherwise a noise sub-slice;
the numbers of bird-sound and noise sub-slices in the voiced segment are counted; if the bird-sound sub-slices outnumber the noise sub-slices, the segment contains bird sound and is stored.
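The classification rule above maps directly onto a small decision function. The sketch below (Python, with illustrative function names; the thresholds are the ones stated in the text) classifies one sub-slice and then applies the majority test to a voiced segment:

```python
def classify_subslice(formant_freq_hz, formant_width_hz):
    """Classify one sub-slice as 'bird' or 'noise' using the
    formant-frequency / formant-width thresholds given in the text."""
    if formant_freq_hz > 1500.0:
        return "bird"
    if formant_freq_hz < 400.0:
        return "noise"
    # 400 Hz .. 1.5 kHz: fall back to the formant width
    return "bird" if formant_width_hz < 500.0 else "noise"

def segment_has_bird(subslices):
    """A voiced segment is kept when bird sub-slices outnumber noise ones.
    `subslices` is a list of (formant_freq_hz, formant_width_hz) pairs."""
    labels = [classify_subslice(f, w) for f, w in subslices]
    return labels.count("bird") > labels.count("noise")
```

Note that a tie (equal counts) does not keep the segment, since the text requires the bird-sound sub-slices to be strictly greater in number.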
Further, the HBank filter bank is parameterized as follows: a filter is built centered at frequency F_C, with M_L filters to its left and M_H filters to its right, for a total of (M_L + 1 + M_H) filters covering the linear frequency range F_L to F_H.
Further, the function expression of the HBank filter bank is:
[The filter-bank function expression appears as an image in the original; see equation (12) in the detailed description.]
the central frequency expression of the HBank filter is as follows:
[The center-frequency expression appears as an image in the original; see equation (8) in the detailed description.]
further, the expression of the smoothed a priori snr estimate is:
ζH(λ,b)=αζ×ζH(λ-1,b)+(1-αζ)×ξH(λ,b)。
further, the formula for calculating the prior probability of the voiced frame is as follows:
[The voiced-frame prior-probability formula appears as an image in the original; see equation (22) in the detailed description.]
further, the audio signal is preprocessed, specifically: and carrying out sub-band separation, frame shift, windowing and Fourier transform on the audio signal to obtain a power spectrum of the signal with noise.
An embodiment of the present invention further provides a robust apparatus for extracting a birdsound, including:
the preprocessing module is used for preprocessing the collected audio signals in the target range to obtain a power spectrum of the signal with noise, smoothing the power spectrum of the signal with noise and obtaining a noise power spectrum estimation through a minimum search method;
the HBank filter bank module is used for inputting the power spectrum of the signal with noise and the noise power spectrum estimation to a preset HBank filter bank respectively to obtain the power spectrum of the signal with noise in the H domain and the noise power spectrum estimation in the H domain; wherein, the correlation domain of the HBank filter bank is the H domain;
the voiced frame processing module is used for obtaining the prior signal-to-noise ratio estimation of an H domain according to the posterior signal-to-noise ratio and a guide decision method, and smoothing the prior signal-to-noise ratio to obtain a smooth prior signal-to-noise ratio estimation; obtaining the prior probability of the sound frame according to the average value of the smooth prior signal-to-noise ratio; when the probability value is larger than a set threshold value, the current frame is judged as a voiced frame, and continuous voiced frame signals are collected to obtain voiced segments;
a birdsound fragment screening module to: during the period of the voiced segment, combining the current frame and the previous 5 frames into a sub-slice, and calculating the linear prediction coefficient of the sub-slice; fourier transform is carried out on the linear prediction coefficient to obtain the power spectrum amplitude response of the linear prediction model of the sub-slice; normalizing the power spectrum amplitude response, searching a frequency point corresponding to a spectrum peak, and acquiring the formant frequency and the formant width of the sub-slice; according to the formant frequency and the formant width, classifying the bird sound or noise of each sub-piece, and counting the number of the bird sound sub-pieces and the noise sub-pieces in the voiced segment; and judging whether the sound segment has the bird sound or not by comparing the number of the bird sound sub-pieces with the number of the noise sub-pieces, and if so, storing the sound segment with the bird sound.
Further, the bird-sound segment screening module is specifically configured to: during a voiced segment, classify each sub-slice as bird sound or noise according to the formant frequency and formant width, judge whether bird sound is present by comparing the numbers of bird-sound and noise sub-slices in the segment, and store the segment if so, comprising:
when the formant frequency is greater than 1.5 kHz, the sub-slice is judged to be a bird-sound sub-slice;
when the formant frequency is less than 400 Hz, the sub-slice is judged to be a noise sub-slice;
when the formant frequency is between 400 Hz and 1.5 kHz, the sub-slice is analyzed by its formant width: if the formant width is less than 500 Hz, the sub-slice is judged to be a bird-sound sub-slice, otherwise a noise sub-slice;
the numbers of bird-sound and noise sub-slices in the voiced segment are counted; if the bird-sound sub-slices outnumber the noise sub-slices, the segment contains bird sound and is stored.
An embodiment of the present invention also provides a computer-readable storage medium comprising a stored computer program, wherein the computer program, when executed, controls a device in which the computer-readable storage medium is located to perform a robust bird-sound extraction method according to any one of claims 1 to 7.
Compared with the prior art, the robust bird-sound extraction method, device, and computer-readable storage medium of the embodiments of the invention perform the steps summarized above: preprocessing and noise-power estimation, H-domain analysis with the HBank filter bank, guided-decision prior-SNR estimation, voiced-frame detection, and formant-based screening of sub-slices. The invention accurately extracts voiced segments and automatically rejects some human voices and other animal sounds, performs well at low signal-to-noise ratio, has low algorithmic complexity and strong real-time performance, and can be applied in a bird-sound acquisition system.
Drawings
Fig. 1 is a schematic flow chart of a robust birdsong extraction method according to a first embodiment of the present invention;
fig. 2 is a detailed flowchart of a robust birdsong extraction method according to a first embodiment of the present invention;
fig. 3 is a schematic frequency domain distribution diagram of an HBank filter in the robust birdsong extraction method according to the first embodiment of the present invention;
fig. 4 is a schematic structural diagram of a robust apparatus for extracting birdsound according to a second embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The first embodiment of the present invention:
please refer to fig. 1-3.
As shown in fig. 1, a robust birdsong extraction method according to a preferred embodiment of the present invention is suitable for being executed in a computer device, and includes at least the following steps:
s101, preprocessing an acquired audio signal in a target range to obtain a power spectrum of a signal with noise, smoothing the power spectrum of the signal with noise, and obtaining noise power spectrum estimation by a minimum search method;
s102, inputting the power spectrum of the signal with noise and the noise power spectrum estimation to a preset HBank filter bank respectively to obtain a power spectrum of the signal with noise in an H domain and a noise power spectrum estimation in the H domain, and obtaining a posterior signal-to-noise ratio according to the power spectrum of the signal with noise in the H domain and the noise power spectrum estimation in the H domain; wherein, the correlation domain of the HBank filter bank is the H domain;
s103, obtaining prior signal-to-noise ratio estimation of an H domain according to the posterior signal-to-noise ratio and a guide decision method; smoothing the prior signal-to-noise ratio to obtain a smoothed prior signal-to-noise ratio estimate;
s104, obtaining the prior probability of the sound frame according to the average value of the smooth prior signal-to-noise ratio; when the probability value is larger than a set threshold value, the current frame is judged as a voiced frame, and continuous voiced frame signals are collected to obtain voiced segments;
s105, during the period of the voiced segment, combining the current frame and the previous 5 frames into a sub-slice, and calculating the linear prediction coefficient of the sub-slice; fourier transform is carried out on the linear prediction coefficient to obtain the power spectrum amplitude response of the linear prediction model of the sub-slice; normalizing the power spectrum amplitude response, searching a frequency point corresponding to a spectrum peak, and acquiring the formant frequency and the formant width of the sub-slice;
s106, classifying the bird sound or noise of each sub-piece according to the formant frequency and the formant width, and counting the number of the bird sound sub-pieces and the noise sub-pieces in the voiced segment; and judging whether the sound segment has the bird sound or not by comparing the number of the bird sound sub-pieces with the number of the noise sub-pieces, and if so, storing the sound segment with the bird sound.
For step S101, the collected audio signal within the target range is preprocessed to obtain the noisy-signal power spectrum, the power spectrum is smoothed, and a noise power spectrum estimate is obtained by the minimum-value search method. Specifically:
Sound signals within a certain range are collected through a microphone at a sampling rate of 32 kHz with 16-bit quantization. Each frame is 10 ms long (320 samples) and is denoted y_in(λ), where λ is the frame number.
Since most of the energy of bird song is concentrated below 8 kHz, the high-frequency sub-band carries little information for distinguishing bird sound from noise. The signal y_in(λ) is therefore split by a quadrature-mirror filter into a low-frequency sub-band y_l(λ) and a high-frequency sub-band y_h(λ). Subsequent steps analyze only the low-frequency sub-band y_l(λ), with sampling rate F_S = 16000 and N_l = 160 samples per frame.
The current low-band frame y_l(λ, n) is stacked with the previous frame to obtain y(λ, n), see equation (1), where N_F is the total length after frame stacking, usually an integer power of 2, here N_F = 256; n is the sample index, n = 0, 1, …, N_F − 1.
y(λ, n) = y_l(λ − 1, n + 2N_l − N_F), for 0 ≤ n < N_F − N_l;  y(λ, n) = y_l(λ, n − (N_F − N_l)), for N_F − N_l ≤ n < N_F  (1)
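Under the assumption that the analysis buffer keeps the last N_F − N_l samples of the previous low-band frame followed by the current frame (the exact arrangement of equation (1) is rendered as an image in the original), the frame stacking can be sketched as:

```python
N_L = 160   # new samples per frame (10 ms at 16 kHz)
N_F = 256   # analysis length after stacking (an integer power of 2)

def stack_frame(prev_frame, cur_frame):
    """Form the analysis buffer y(lambda, n): the last N_F - N_L samples
    of the previous low-band frame followed by the current frame.
    The overlap arrangement is an assumption, not reproduced from (1)."""
    assert len(prev_frame) == N_L and len(cur_frame) == N_L
    return list(prev_frame[-(N_F - N_L):]) + list(cur_frame)
```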
The signal y(λ, n) is windowed with the mixed flat-top/Hanning window function w(n) of equation (2), an N_F-point Fourier transform is performed, and the noisy-signal magnitude spectrum Y(λ, k) is obtained by taking the modulus of the transform, see equation (3), where k is the frequency-bin index, k = 0, 1, …, N_F − 1.
[Equation (2), the window function w(n), appears as an image in the original and is not reproduced here.]
Y(λ, k) = | Σ_{n=0}^{N_F−1} w(n) · y(λ, n) · e^{−j2πnk/N_F} |  (3)
Due to the symmetry of the Fourier transform, only the first N frequency bins of the spectrum are analyzed, where N = N_F/2 + 1; the default value of N is 129.
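The windowing and magnitude-spectrum step can be sketched as follows. The mixed flat-top/Hanning window of equation (2) is not reproduced in the original, so a plain Hann window stands in for it here, and a naive DFT is used for clarity rather than speed:

```python
import cmath
import math

def magnitude_spectrum(y):
    """Window y with a plain Hann window (a stand-in for the mixed
    flat-top/Hanning window of equation (2), which is not reproduced)
    and return |DFT| at the first len(y)/2 + 1 bins (equation (3))."""
    n_f = len(y)
    w = [0.5 - 0.5 * math.cos(2 * math.pi * n / (n_f - 1)) for n in range(n_f)]
    yw = [yi * wi for yi, wi in zip(y, w)]
    n_half = n_f // 2 + 1          # symmetry: keep only the first N bins
    spec = []
    for k in range(n_half):
        acc = sum(yw[n] * cmath.exp(-2j * math.pi * n * k / n_f)
                  for n in range(n_f))
        spec.append(abs(acc))
    return spec
```

In a real implementation an FFT would replace the O(N^2) loop; the output convention (N = N_F/2 + 1 bins) matches the text.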
The noisy-signal power spectrum Y²(λ, k) is smoothed within the frame to obtain S′(λ, k), see equation (4), where W is a normalized Hanning window function with window length (2N_W + 1), 1 ≤ N_W ≤ 5; the default N_W is 1, i.e. W = [0.25, 0.5, 0.25].
S′(λ, k) = Σ_{i=−N_W}^{N_W} W(i) · Y²(λ, k − i)  (4)
Then S′(λ, k) is smoothed between frames to obtain S(λ, k), see equation (5), where α_S is the inter-frame smoothing factor, 0 < α_S < 1; the default α_S is 0.8.
S(λ,k)=αS×S(λ-1,k)+(1-αS)×S′(λ,k)k=0,1,…,N-1 (5)
Using the minimum-value search method proposed by Rainer Martin, the minimum S_min(λ, k) is updated after every R successive estimates of the smoothed power spectral density S(λ, k), see equation (6), where min{·} is the minimum operator.
S_min(λ, k) = min{ S(λ′, k) | λ′ = λ − R + 1, …, λ }  (6)
The noise power spectrum estimate D²(λ, k) is obtained by the smoothed update of equation (7), in which [α_m · S_min(λ, k)] is the noise decision threshold, 2 < α_m < 8, with default α_m = 5; α_D is a weighting factor, α_D = max{0.03, 1/(λ + 1)}. As the number of frames increases, the change in the noise power spectrum D²(λ, k) gradually stabilizes.
D²(λ, k) = (1 − α_D) · D²(λ − 1, k) + α_D · Y²(λ, k), if Y²(λ, k) < α_m · S_min(λ, k);  D²(λ, k) = D²(λ − 1, k), otherwise  (7)
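A per-bin sketch of the smoothing and minimum-statistics tracking described above. The exact update rule of equation (7) is rendered as an image in the original, so the noise update below is an assumed conventional form, and R = 8 is an illustrative window length:

```python
import collections

ALPHA_S = 0.8   # inter-frame smoothing factor (default from the text)
ALPHA_M = 5.0   # noise decision threshold factor (default from the text)
R = 8           # minimum-search window length (illustrative value)

class MinStatNoiseTracker:
    """Noise-floor tracking for one frequency bin in the spirit of Rainer
    Martin's minimum-statistics method: smooth the noisy power between
    frames, keep the minimum over the last R smoothed estimates, and
    update the noise PSD only when the current power is close to that
    minimum (a sketch; not the patent's exact equation (7))."""
    def __init__(self):
        self.s_prev = None
        self.history = collections.deque(maxlen=R)
        self.noise = None
        self.frame = 0

    def update(self, power):
        # inter-frame smoothing, equation (5)
        s = power if self.s_prev is None else (
            ALPHA_S * self.s_prev + (1 - ALPHA_S) * power)
        self.s_prev = s
        self.history.append(s)
        s_min = min(self.history)           # equation (6) over the window
        alpha_d = max(0.03, 1.0 / (self.frame + 1))
        self.frame += 1
        if self.noise is None:
            self.noise = s
        elif power < ALPHA_M * s_min:        # looks noise-like: track it
            self.noise = (1 - alpha_d) * self.noise + alpha_d * power
        return self.noise
```

A loud transient (e.g. a bird call) exceeds the threshold α_m·S_min and leaves the noise estimate untouched, which is the point of the minimum search.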
For step S102, the HBank filter bank is preset as follows:
Based on the energy-spectrum distribution of actual bird sounds, an HBank filter bank with the frequency-domain distribution shown in FIG. 3 is established; its associated domain is called the H domain for short. A filter is built centered at frequency F_C, with M_L filters to its left and M_H filters to its right, giving (M_L + 1 + M_H) filters that cover the linear frequency range F_L to F_H. These parameters must be set in advance. For sound collection that does not target a specific bird, 200 ≤ F_L < F_C < F_H ≤ 8000, 2 < M_L < 12, 2 < M_H < 12, generally set to F_C = 3500, F_L = 200, F_H = 8000, M_L = 8, M_H = 5; for sound collection targeting a specific bird, the parameters are adjusted to the actual spectral distribution of its song, which gives better pickup in complex noise environments. The relation between filter index b and center frequency f_C(b) is given by equation (8), where s adjusts the dispersion of adjacent filters, 0.7 < s < 1.5 with a default of 1.2: when s > 1 the filters crowd toward the center frequency, and when s < 1 they spread toward the two sides.
[Equation (8), the center-frequency spacing rule, appears as an image in the original and is not reproduced here.]
In the HBank filter bank, the upper limit frequency f_H(b) of filter b equals the center frequency f_C(b + 1) of filter b + 1, see equation (9); the lower limit frequency f_L(b) equals the center frequency f_C(b − 1) of filter b − 1, see equation (10).
f_H(b) = f_C(b + 1)  (9)
f_L(b) = f_C(b − 1)  (10)
The lower limit frequency f_L(b), center frequency f_C(b), and upper limit frequency f_H(b) of filter b are mapped to the corresponding frequency bins k_L(b), k_C(b), and k_H(b), see equation (11), where ⌈·⌉ denotes rounding up.
k_X(b) = ⌈ f_X(b) · N_F / F_S ⌉, X ∈ {L, C, H}  (11)
Since each HBank filter is triangular, with H(b, k_C(b)) = 1 and H(b, k_L(b)) = H(b, k_H(b)) = 0, the expression H(b, k) for the filter bank follows, see equation (12).
H(b, k) = (k − k_L(b)) / (k_C(b) − k_L(b)), for k_L(b) ≤ k ≤ k_C(b);  H(b, k) = (k_H(b) − k) / (k_H(b) − k_C(b)), for k_C(b) < k ≤ k_H(b);  H(b, k) = 0, otherwise  (12)
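A sketch of the filter-bank construction: bins are mapped by rounding up as in equation (11), and each filter is triangular with its edges at the neighbouring filters' centre frequencies, as equations (9) and (10) require. Since the centre-frequency spacing rule of equation (8) is rendered as an image in the original, the caller supplies the centre frequencies here:

```python
import math

F_S = 16000    # low-band sampling rate
N_F = 256      # FFT length

def bin_of(freq_hz):
    """Map a frequency to its DFT bin, rounding up (equation (11))."""
    return math.ceil(freq_hz * N_F / F_S)

def triangular_filter(f_lo, f_c, f_hi, n_bins):
    """One triangular HBank filter H(b, k): 1 at the centre bin, 0 at the
    neighbouring filters' centre bins."""
    k_l, k_c, k_h = bin_of(f_lo), bin_of(f_c), bin_of(f_hi)
    h = [0.0] * n_bins
    for k in range(k_l, k_h + 1):
        if k <= k_c:
            h[k] = (k - k_l) / (k_c - k_l)
        else:
            h[k] = (k_h - k) / (k_h - k_c)
    return h

def build_hbank(centres, n_bins=N_F // 2 + 1):
    """Build the bank: each filter's lower/upper edges are the centres of
    its neighbours, so `centres` must be sorted and include one extra
    frequency at each end. The spacing of the centres themselves
    (equation (8)) is left to the caller."""
    return [triangular_filter(centres[b - 1], centres[b], centres[b + 1], n_bins)
            for b in range(1, len(centres) - 1)]
```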
For step S102, the noisy-signal power spectrum and the noise power spectrum estimate are each input to the preset HBank filter bank to obtain the H-domain noisy-signal power spectrum and the H-domain noise power spectrum estimate, where the associated domain of the HBank filter bank is the H domain. Specifically, the noisy-signal power spectrum Y²(λ, k) and the noise power spectrum estimate D²(λ, k) are weighted by the filter function H(b, k) to obtain Y²_H(λ, b) and D²_H(λ, b), see equations (13) and (14).
Y²_H(λ, b) = Σ_{k=0}^{N−1} H(b, k) · Y²(λ, k)  (13)
D²_H(λ, b) = Σ_{k=0}^{N−1} H(b, k) · D²(λ, k)  (14)
In the H domain, the posterior signal-to-noise ratio γ_H(λ, b) is obtained from Y²_H(λ, b) and D²_H(λ, b), see equation (15).
γ_H(λ, b) = Y²_H(λ, b) / D²_H(λ, b)  (15)
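Projecting the power spectra into the H domain and forming the posterior SNR then reduces to a weighted sum per filter followed by a ratio per band, as sketched below (illustrative function names):

```python
def to_h_domain(power_spec, hbank):
    """Project a linear-frequency power spectrum onto the H domain:
    a weighted sum under each triangular filter (equations (13)/(14))."""
    return [sum(h_k * p_k for h_k, p_k in zip(h, power_spec)) for h in hbank]

def posterior_snr(noisy_pow_h, noise_pow_h, eps=1e-12):
    """Posterior SNR per H-domain band (equation (15)); eps guards
    against division by zero in silent bands."""
    return [y / max(d, eps) for y, d in zip(noisy_pow_h, noise_pow_h)]
```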
For step S103, the H-domain prior signal-to-noise ratio estimate is obtained from the posterior signal-to-noise ratio by the guided-decision method, and the prior SNR is smoothed to obtain a smoothed prior SNR estimate. Specifically:
In the H domain, the prior SNR estimate ξ_H(λ, b) is obtained by the guided-decision (decision-directed) method, see equation (16).
ξ_H(λ, b) = (1 − α_H(λ, b)) · X̂²_H(λ − 1, b) / D²_H(λ, b) + α_H(λ, b) · max{ γ_H(λ, b) − 1, 0 }  (16)
where X̂_H(λ − 1, b) is the previous frame's clean power-spectrum estimate in the H domain, whose magnitude-spectrum estimate is obtained from equation (18), and α_H(λ, b) is a weight-adjustment factor obtained from equation (17).
[Equation (17) appears as an image in the original and is not reproduced here.]
α_h is a constant, 0 < α_h < 1, with a default of 0.1. When the instantaneous signal-to-noise ratio is larger, the weight α_H(λ, b) of the current SNR estimate is increased, which makes the estimate of ξ_H(λ, b) more accurate.
Obtaining a u power estimator X of a pure amplitude spectrum according to a minimum mean square error estimation criterionH(lambda, b) is shown in formula (18).
Figure BDA0002481329450000124
Wherein u is a power exponent of the amplitude spectrum estimation, u is more than or equal to 0.1 and less than or equal to 2, and a default value of u is 0.5; (. cndot.) is a gamma function, and the calculation formula is shown in formula (19); phi (·) is a confluent hypergeometric function, and the calculation formula is shown in formula (20). The two functions have higher operation complexity, and the operation can be simplified by using an approximate function in practical engineering application.
Figure BDA0002481329450000125
Figure BDA0002481329450000126
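Formulas (19) and (20) themselves appear only as images in the patent, but both special functions are standard: Γ(·) is available as `math.gamma` in the Python standard library, and Kummer's confluent hypergeometric function Φ(a, b; z) can be evaluated from its power series. The stdlib-only sketch below (the fixed truncation of the series is an assumption) illustrates the kind of approximation the text alludes to:

```python
import math

def confluent_hypergeometric(a, b, z, terms=80):
    """Kummer's function Phi(a, b; z) = sum_n [(a)_n / (b)_n] * z^n / n!,
    evaluated by truncating the power series after `terms` terms.
    (a)_n denotes the rising factorial a(a+1)...(a+n-1)."""
    total, term = 1.0, 1.0
    for n in range(terms):
        # Each term is the previous one times (a+n)/(b+n) * z/(n+1).
        term *= (a + n) / (b + n) * z / (n + 1)
        total += term
    return total

# Sanity identities: Phi(a, b; 0) = 1 and Phi(1, 1; z) = exp(z).
```

For moderate |z| the truncated series converges quickly; for the small arguments that arise in the gain computation this is usually adequate, which is one way to realize the "approximate function" simplification mentioned above.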
The a priori signal-to-noise ratio ξH(λ, b) is smoothed across frames to obtain the smoothed a priori signal-to-noise ratio ζH(λ, b), see formula (21), where αζ is the inter-frame smoothing factor, 0 < αζ < 1, with a default value of 0.7.
ζH(λ,b)=αζ×ζH(λ-1,b)+(1-αζ)×ξH(λ,b) (21)
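Formula (21) is a first-order recursive filter across frames. A minimal Python sketch for one sub-band b (the function name and zero initial state are assumptions):

```python
def smooth_prior_snr(xi_frames, alpha_zeta=0.7):
    """Inter-frame smoothing of the a priori SNR per formula (21):
    zeta(l) = alpha_zeta * zeta(l-1) + (1 - alpha_zeta) * xi(l),
    applied to the sequence of per-frame values of one sub-band."""
    zeta, out = 0.0, []
    for xi in xi_frames:
        zeta = alpha_zeta * zeta + (1.0 - alpha_zeta) * xi
        out.append(zeta)
    return out
```

With the default αζ = 0.7, the smoothed estimate tracks slow trends in the SNR while suppressing frame-to-frame fluctuation.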
For step S104, the prior probability of a voiced frame is obtained from the mean of the smoothed a priori signal-to-noise ratio; when the probability value is greater than a set threshold, the current frame is judged to be a voiced frame, and consecutive voiced frame signals are collected into a voiced segment. Specifically:
The mean of the smoothed a priori signal-to-noise ratio ζH(λ, b) is substituted into formula (22) to obtain the prior probability pH(λ) of a voiced frame.
Figure BDA0002481329450000131
When the probability value pH(λ) is greater than the set threshold PH, the frame is judged to be a voiced frame; otherwise it is judged to be a noise frame. The threshold PH needs to be tuned experimentally for the best effect, with 0.2 ≤ PH ≤ 0.8 and a default value of 0.5. The input signals yin(λ) of r consecutive voiced frames are collected into a voiced segment signal, denoted V = {yin(λ-r+1), yin(λ-r+2), …, yin(λ)}.
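The voiced-frame decision and segment collection of step S104 can be sketched as follows, treating the per-frame probability pH(λ) as given and grouping runs of frames above the threshold PH (default 0.5) into segments:

```python
def collect_voiced_segments(probs, frames, p_h=0.5):
    """Group consecutive frames whose voiced-frame probability exceeds p_h
    into voiced segments; each segment is the list of its frame signals."""
    segments, current = [], []
    for p, frame in zip(probs, frames):
        if p > p_h:
            current.append(frame)     # still inside a voiced run
        elif current:
            segments.append(current)  # run ended: emit one voiced segment
            current = []
    if current:
        segments.append(current)      # segment still open at end of input
    return segments
```

Each returned segment corresponds to one set V of r consecutive voiced frames as defined above.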
For step S105, during the voiced segment, the current frame and the previous 5 frames are combined into a sub-slice, and the linear prediction coefficients of the sub-slice are calculated; a Fourier transform of the linear prediction coefficients gives the power spectrum magnitude response of the sub-slice's linear prediction model; the magnitude response is normalized and the frequency bin corresponding to the spectral peak is searched to obtain the formant frequency and formant width of the sub-slice. Specifically:
The decision in step S104 only separates voiced segments from noise segments and does not analyze whether a voiced segment actually contains bird sound, so further screening based on the formant information of the segment is required. The formant information is obtained by linear prediction, but for a longer segment (over 1 second) computing the linear prediction coefficients is expensive and unsuited to real-time processing. A bird sound can form a formant within a small number of samples (6 frames of data) that is similar to the formants of the whole bird sound fragment. The slicing method therefore reduces the computation time of the prediction coefficients, and the formant frequency and formant width of each sub-slice are computed while bird sound detection is performed frame by frame.
In the decision process of step S104, if the current frame is judged to be a voiced frame, the low-frequency sub-band signal yl(λ) of the current frame is combined with the preceding 5 frames to obtain the sub-slice v(λ) = {yl(λ-5), yl(λ-4), …, yl(λ)}; the sub-slice length is 6 × Nl = 960, and a complete voiced segment V corresponds to r sub-slices. For convenience of description, the frame number λ is omitted in the subsequent steps.
The sub-slice v is pre-emphasized, see formula (23), to boost the high-frequency components of the sound signal, where αv is the pre-emphasis coefficient, 0.9 < αv < 1, with a default value of 0.99, and n′ is the sample index of the voiced segment sequence.
v′(n′)=v(n′)-αv·v(n′-1) (23)
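Formula (23) is a standard first-order pre-emphasis filter. A direct sketch (the convention v(-1) = 0 for the first sample is an assumption):

```python
def pre_emphasis(v, alpha_v=0.99):
    """Pre-emphasis per formula (23): v'(n) = v(n) - alpha_v * v(n-1),
    boosting the high-frequency components of the sub-slice.
    The first sample is passed through unchanged (v(-1) taken as 0)."""
    return [v[0]] + [v[n] - alpha_v * v[n - 1] for n in range(1, len(v))]
```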
The pre-emphasized sub-slice v′ is modeled by a linear prediction model of order p, see formula (24), where a1, a2, …, ap are the linear prediction coefficients and e(n′) is the linear prediction error.
Figure BDA0002481329450000141
According to formula (25), the power spectrum magnitude response T(k′) of the linear prediction model, i.e. the spectral envelope of the vocal tract signal, is obtained by an NT-point fast Fourier transform, where NT is the number of Fourier transform points, typically an integer power of 2, with a default value of 256.
Figure BDA0002481329450000142
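Formulas (24)-(25) amount to fitting an order-p all-pole model and reading its spectral envelope from the transform of the coefficient vector. A stdlib-only sketch using the autocorrelation method with the Levinson-Durbin recursion; the function names are assumptions, and for clarity the envelope is computed by a direct DFT rather than a fast transform:

```python
import cmath
import math

def levinson_durbin(r, p):
    """Solve the Yule-Walker equations for the order-p prediction
    coefficients a = [1, a1, ..., ap] from autocorrelation lags r[0..p]."""
    a = [1.0] + [0.0] * p
    err = r[0]
    for m in range(1, p + 1):
        acc = r[m] + sum(a[j] * r[m - j] for j in range(1, m))
        k = -acc / err                    # reflection coefficient
        a_prev = a[:]
        for j in range(1, m):
            a[j] = a_prev[j] + k * a_prev[m - j]
        a[m] = k
        err *= (1.0 - k * k)              # prediction error update
    return a

def lpc_envelope(frame, p=12, n_fft=256):
    """Power spectrum magnitude response of the LPC model (sketch of
    formulas (24)-(25)): the envelope is proportional to 1 / |A(e^jw)|,
    sampled at the n_fft // 2 + 1 non-negative frequency bins."""
    r = [sum(frame[n] * frame[n + k] for n in range(len(frame) - k))
         for k in range(p + 1)]
    a = levinson_durbin(r, p)
    return [1.0 / abs(sum(a[i] * cmath.exp(-2j * math.pi * kbin * i / n_fft)
                          for i in range(p + 1)))
            for kbin in range(n_fft // 2 + 1)]
```

For an AR(1)-like autocorrelation r = [1, 0.9, 0.81] the recursion recovers a1 = -0.9 exactly, and for a sinusoidal frame the envelope peaks at the bin of the sinusoid's frequency.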
The power spectrum magnitude response T(k′) of the linear prediction model is normalized to obtain T′(k′), see formula (26), where max{·} is the maximum operator.
Figure BDA0002481329450000143
The frequency bin kT corresponding to the maximum of the normalized power spectrum magnitude response T′(k′) is searched, see formula (27), and converted by formula (28) to obtain the formant frequency fT.
Figure BDA0002481329450000144
Figure BDA0002481329450000145
In the normalized power spectrum magnitude response T′(k′), it is known that T′(kT) = 1. Centered at the frequency bin kT, a bin kTL is searched to the left such that T′(kTL-1) ≤ 0.2, and a bin kTH is searched to the right such that T′(kTH+1) ≤ 0.2. The bins kTL and kTH are converted by formula (29) into the formant lower-limit frequency fTL and upper-limit frequency fTH; their difference is the formant width ΔfT, see formula (30).
Figure BDA0002481329450000146
ΔfT = fTH - fTL  (30)
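Formulas (26)-(30) reduce to a peak search plus a left/right walk down to the 0.2 level of the normalized envelope. A sketch, assuming the usual bin-to-Hz conversion f = k · fs / NT for formulas (28)-(29):

```python
def formant_from_envelope(T, n_fft, fs):
    """Formant frequency and width from an LPC spectral envelope T
    (sketch of formulas (26)-(30))."""
    m = max(T)
    Tn = [t / m for t in T]                         # normalization, formula (26)
    kT = max(range(len(Tn)), key=lambda i: Tn[i])   # spectral peak bin, formula (27)
    fT = kT * fs / n_fft                            # bin -> Hz, formula (28)
    kTL = kT
    while kTL >= 1 and Tn[kTL - 1] > 0.2:           # walk left until T'(kTL-1) <= 0.2
        kTL -= 1
    kTH = kT
    while kTH + 1 < len(Tn) and Tn[kTH + 1] > 0.2:  # walk right until T'(kTH+1) <= 0.2
        kTH += 1
    fTL, fTH = kTL * fs / n_fft, kTH * fs / n_fft   # formula (29)
    return fT, fTH - fTL                            # formant width, formula (30)
```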
For step S106, each sub-slice is classified as bird sound or noise according to its formant frequency and formant width, and the numbers of bird sound sub-slices and noise sub-slices within the voiced segment are counted; whether the voiced segment contains bird sound is judged by comparing the number of bird sound sub-slices with the number of noise sub-slices, and if so, the segment containing bird sound is stored. Specifically:
Classification counters Cb and Cn are set to record the number of bird sound sub-slices and the number of noise sub-slices respectively; both counters are cleared at the initial frame of the voiced segment and count until the segment ends. When the formant frequency fT of the sub-slice v is greater than 1.5 kHz, the sub-slice is judged to be a bird sound sub-slice and the bird sound counter Cb is incremented by one; when fT is less than 400 Hz, it is judged to be a noise sub-slice and the noise counter Cn is incremented by one. If fT lies between 400 Hz and 1.5 kHz, the formant width ΔfT is also used for the decision: when ΔfT is less than 500 Hz, the sub-slice is judged to be a bird sound sub-slice and Cb is incremented by one; otherwise it is judged to be a noise sub-slice and Cn is incremented by one.
At the end of the current voiced segment, r sub-slice classification results have been obtained in total, so that Cb + Cn = r. If the number of bird sound sub-slices Cb is greater than the number of noise sub-slices Cn, the voiced segment V is regarded as a segment containing bird sound and is stored; otherwise it contains no bird sound information and is discarded.
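The decision rules of step S106 and the majority vote over the segment can be sketched directly; the thresholds are those stated in the text (1.5 kHz, 400 Hz, 500 Hz):

```python
def classify_subslice(f_t, df_t):
    """Classify one sub-slice from its formant frequency f_t (Hz) and
    formant width df_t (Hz) per the rules of step S106."""
    if f_t > 1500.0:
        return "bird"
    if f_t < 400.0:
        return "noise"
    # 400 Hz <= f_t <= 1.5 kHz: decide by the formant width.
    return "bird" if df_t < 500.0 else "noise"

def segment_has_bird(subslices):
    """Majority vote over the (f_t, df_t) pairs of one voiced segment:
    keep the segment when Cb > Cn."""
    c_b = sum(1 for f_t, df_t in subslices
              if classify_subslice(f_t, df_t) == "bird")
    return c_b > len(subslices) - c_b
```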
The embodiment of the invention provides a robust bird sound extraction method, which comprises: preprocessing the collected audio signal within the target range to obtain the noisy-signal power spectrum, smoothing the noisy-signal power spectrum, and obtaining a noise power spectrum estimate by a minimum search method; inputting the noisy-signal power spectrum and the noise power spectrum estimate respectively into a preset HBank filter bank to obtain the H-domain noisy-signal power spectrum and the H-domain noise power spectrum estimate, and obtaining the posterior signal-to-noise ratio from them, wherein the correlation domain of the HBank filter bank is the H domain; obtaining the a priori signal-to-noise ratio estimate of the H domain from the posterior signal-to-noise ratio by the decision-directed method, and smoothing it to obtain a smoothed a priori signal-to-noise ratio estimate; obtaining the prior probability of a voiced frame from the mean of the smoothed a priori signal-to-noise ratio; when the probability value is greater than a set threshold, judging the current frame to be a voiced frame and collecting consecutive voiced frame signals into voiced segments; during a voiced segment, combining the current frame and the previous 5 frames into a sub-slice and calculating its linear prediction coefficients; Fourier-transforming the linear prediction coefficients to obtain the power spectrum magnitude response of the sub-slice's linear prediction model; normalizing the magnitude response and searching the frequency bin of the spectral peak to obtain the formant frequency and formant width of the sub-slice; classifying each sub-slice as bird sound or noise according to the formant frequency and formant width, and counting the numbers of bird sound and noise sub-slices in the voiced segment; and judging whether the voiced segment contains bird sound by comparing the two counts, storing the segment containing bird sound if so. The invention can accurately extract voiced segments and automatically reject some human voices and other animal sounds, performs well under low signal-to-noise ratio, has low algorithm complexity and strong real-time capability, and can be applied to a bird sound acquisition system.
Second embodiment of the invention:
please refer to fig. 4.
As shown in fig. 4, the present embodiment further provides a robust bird sound extraction apparatus, comprising:
the preprocessing module 201 is configured to preprocess the acquired audio signal within the target range to obtain a power spectrum of the signal with noise, smooth the power spectrum of the signal with noise, and obtain a noise power spectrum estimation by a minimum search method;
the HBank filter bank module 202 is configured to input the power spectrum of the noisy signal and the power spectrum estimation of the noise to a preset HBank filter bank, so as to obtain a power spectrum of the noisy signal in the H domain and a power spectrum estimation of the noise in the H domain; wherein, the correlation domain of the HBank filter bank is the H domain;
the voiced frame processing module 203 is configured to obtain the a priori signal-to-noise ratio estimate of the H domain from the posterior signal-to-noise ratio by the decision-directed method, and to smooth the a priori signal-to-noise ratio to obtain a smoothed a priori signal-to-noise ratio estimate; to obtain the prior probability of a voiced frame from the mean of the smoothed a priori signal-to-noise ratio; and, when the probability value is greater than a set threshold, to judge the current frame as a voiced frame and collect consecutive voiced frame signals into voiced segments;
a bird sound fragment screening module 204 to: during the period of the voiced segment, combining the current frame and the previous 5 frames into a sub-slice, and calculating the linear prediction coefficient of the sub-slice; fourier transform is carried out on the linear prediction coefficient to obtain the power spectrum amplitude response of the linear prediction model of the sub-slice; normalizing the power spectrum amplitude response, searching a frequency point corresponding to a spectrum peak, and acquiring the formant frequency and the formant width of the sub-slice; according to the formant frequency and the formant width, classifying the bird sound or noise of each sub-piece, and counting the number of the bird sound sub-pieces and the noise sub-pieces in the voiced segment; and judging whether the sound segment has the bird sound or not by comparing the number of the bird sound sub-pieces with the number of the noise sub-pieces, and if so, storing the sound segment with the bird sound.
The bird sound fragment screening module 204 is specifically configured to: classify each sub-slice as bird sound or noise according to the formant frequency and formant width, judge whether bird sound exists by comparing the number of bird sound sub-slices in the voiced segment with the number of noise sub-slices, and if so, store the segment, as follows:
when the resonance peak frequency is more than 1.5kHz, the sub-piece is judged as a bird sound sub-piece;
when the formant frequency is less than 400Hz, the sub-sheet is judged as a noise sub-sheet;
when the formant frequency is between 400 Hz and 1.5 kHz, the sub-slice needs to be analyzed according to the formant width; if the formant width is less than 500 Hz, the sub-slice is judged to be a bird sound sub-slice, otherwise it is judged to be a noise sub-slice;
and counting the number of the bird sound sub-pieces and the number of the noise sub-pieces in the sound section, if the number of the bird sound sub-pieces is greater than the number of the noise sub-pieces, the sound section contains bird sounds, and the sound section is stored.
The embodiment of the invention provides a robust bird sound extraction apparatus, which: preprocesses the collected audio signal within the target range to obtain the noisy-signal power spectrum, smooths the noisy-signal power spectrum, and obtains a noise power spectrum estimate by a minimum search method; inputs the noisy-signal power spectrum and the noise power spectrum estimate respectively into a preset HBank filter bank to obtain the H-domain noisy-signal power spectrum and the H-domain noise power spectrum estimate, and obtains the posterior signal-to-noise ratio from them, the correlation domain of the HBank filter bank being the H domain; obtains the a priori signal-to-noise ratio estimate of the H domain from the posterior signal-to-noise ratio by the decision-directed method, and smooths it to obtain a smoothed a priori signal-to-noise ratio estimate; obtains the prior probability of a voiced frame from the mean of the smoothed a priori signal-to-noise ratio; when the probability value is greater than a set threshold, judges the current frame to be a voiced frame and collects consecutive voiced frame signals into voiced segments; during a voiced segment, combines the current frame and the previous 5 frames into a sub-slice and calculates its linear prediction coefficients; Fourier-transforms the linear prediction coefficients to obtain the power spectrum magnitude response of the sub-slice's linear prediction model; normalizes the magnitude response and searches the frequency bin of the spectral peak to obtain the formant frequency and formant width of the sub-slice; classifies each sub-slice as bird sound or noise according to the formant frequency and formant width, and counts the numbers of bird sound and noise sub-slices in the voiced segment; and judges whether the voiced segment contains bird sound by comparing the two counts, storing the segment containing bird sound if so. The invention can accurately extract voiced segments and automatically reject some human voices and other animal sounds, performs well under low signal-to-noise ratio, has low algorithm complexity and strong real-time capability, and can be applied to a bird sound acquisition system.
An embodiment of the present invention further provides a computer-readable storage medium comprising a stored computer program, wherein the computer program, when executed, controls an apparatus in which the computer-readable storage medium is located to perform a robust birdsound extraction method as described above.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.

Claims (10)

1. A robust extraction method of birdsong, characterized by comprising:
preprocessing the collected audio signal in the target range to obtain a power spectrum of the signal with noise, smoothing the power spectrum of the signal with noise, and obtaining a noise power spectrum estimation by a minimum value search method;
inputting the power spectrum of the signal with noise and the noise power spectrum estimation to a preset HBank filter bank respectively to obtain the power spectrum of the signal with noise in the H domain and the noise power spectrum estimation in the H domain, and obtaining a posterior signal-to-noise ratio according to the power spectrum of the signal with noise in the H domain and the noise power spectrum estimation in the H domain; wherein, the correlation domain of the HBank filter bank is the H domain;
obtaining the a priori signal-to-noise ratio estimate of the H domain from the posterior signal-to-noise ratio by the decision-directed method; smoothing the a priori signal-to-noise ratio to obtain a smoothed a priori signal-to-noise ratio estimate; obtaining the prior probability of a voiced frame from the mean of the smoothed a priori signal-to-noise ratio; when the probability value is greater than a set threshold, judging the current frame to be a voiced frame, and collecting consecutive voiced frame signals to obtain voiced segments;
during the period of the voiced segment, combining the current frame and the previous 5 frames into a sub-slice, and calculating the linear prediction coefficient of the sub-slice; fourier transform is carried out on the linear prediction coefficient to obtain the power spectrum amplitude response of the linear prediction model of the sub-slice; normalizing the power spectrum amplitude response, searching a frequency point corresponding to a spectrum peak, and then obtaining the formant frequency and the formant width of the sub-sheet;
according to the formant frequency and the formant width, classifying the bird sound or noise of each sub-piece, and counting the number of the bird sound sub-pieces and the noise sub-pieces in the voiced segment; and judging whether the sound segment has the bird sound or not by comparing the number of the bird sound sub-pieces with the number of the noise sub-pieces, and if so, storing the sound segment with the bird sound.
2. The robust extraction method of bird sounds according to claim 1, wherein the classifying of bird sounds or noises for each sub-slice according to the formant frequency and formant width, and the comparing of the number of bird sound sub-slices and the number of noise sub-slices in the voiced segment to determine whether bird sounds exist, if yes, the storing of the segment with bird sounds comprises:
when the resonance peak frequency is more than 1.5kHz, the sub-piece is judged as a bird sound sub-piece;
when the formant frequency is less than 400Hz, the sub-sheet is judged as a noise sub-sheet;
when the formant frequency is between 400 Hz and 1.5 kHz, the sub-slice needs to be analyzed according to the formant width; if the formant width is less than 500 Hz, the sub-slice is judged to be a bird sound sub-slice, otherwise it is judged to be a noise sub-slice;
and counting the number of the bird sound sub-pieces and the number of the noise sub-pieces in the sound section, if the number of the bird sound sub-pieces is greater than the number of the noise sub-pieces, the sound section contains bird sounds, and the sound section is stored.
3. The robust bird sound extraction method as claimed in claim 1, wherein the parameters of the HBank filter bank are set as follows: one filter is built centered at frequency FC, with ML filters built on its left side and MH filters on its right side; the (ML+1+MH) filters cover the linear frequency range FL~FH.
4. The robust bird sound extraction method as claimed in claim 1, wherein the functional expression of the HBank filter bank is:
Figure FDA0002481329440000021
the central frequency expression of the HBank filter is as follows:
Figure FDA0002481329440000031
5. The robust bird sound extraction method as claimed in claim 1, wherein the functional expression of the smoothed a priori signal-to-noise ratio is:
ζH(λ,b)=αζ×ζH(λ-1,b)+(1-αζ)×ξH(λ,b)。
6. a robust extraction method of birdsound according to claim 1, wherein the formula for calculating the prior probability of the voiced frames is:
Figure FDA0002481329440000032
7. a robust extraction method of birdsound according to claim 1, characterized in that the audio signal is preprocessed, in particular:
and carrying out sub-band separation, frame shift, windowing and Fourier transform on the audio signal to obtain a power spectrum of the signal with noise.
8. A robust birdsound extraction apparatus, comprising:
the preprocessing module is used for preprocessing the collected audio signals in the target range to obtain a power spectrum of the signal with noise, smoothing the power spectrum of the signal with noise and obtaining a noise power spectrum estimation through a minimum search method;
the HBank filter bank module is used for inputting the power spectrum of the signal with noise and the noise power spectrum estimation to a preset HBank filter bank respectively to obtain the power spectrum of the signal with noise in the H domain and the noise power spectrum estimation in the H domain; wherein, the correlation domain of the HBank filter bank is the H domain;
the voiced frame processing module is used for obtaining the a priori signal-to-noise ratio estimate of the H domain from the posterior signal-to-noise ratio by the decision-directed method, and smoothing the a priori signal-to-noise ratio to obtain a smoothed a priori signal-to-noise ratio estimate; obtaining the prior probability of a voiced frame from the mean of the smoothed a priori signal-to-noise ratio; and, when the probability value is greater than a set threshold, judging the current frame to be a voiced frame and collecting consecutive voiced frame signals into voiced segments;
a birdsound fragment screening module to: during the period of the voiced segment, combining the current frame and the previous 5 frames into a sub-slice, and calculating the linear prediction coefficient of the sub-slice; fourier transform is carried out on the linear prediction coefficient to obtain the power spectrum amplitude response of the linear prediction model of the sub-slice; normalizing the power spectrum amplitude response, searching a frequency point corresponding to a spectrum peak, and acquiring the formant frequency and the formant width of the sub-slice; according to the formant frequency and the formant width, classifying the bird sound or noise of each sub-piece, and counting the number of the bird sound sub-pieces and the noise sub-pieces in the voiced segment; and judging whether the sound segment has the bird sound or not by comparing the number of the bird sound sub-pieces with the number of the noise sub-pieces, and if so, storing the sound segment with the bird sound.
9. The robust extraction of birdsound apparatus according to claim 8, wherein the birdsound fragment filtering module is configured to: classifying the bird sound or noise of each sub-piece according to the formant frequency and the formant width, judging whether the bird sound exists or not by comparing the number of the bird sound sub-pieces in the sound section with the number of the noise sub-pieces, and if so, storing the bird sound sub-pieces, wherein the method comprises the following steps:
when the resonance peak frequency is more than 1.5kHz, the sub-piece is judged as a bird sound sub-piece;
when the formant frequency is less than 400Hz, the sub-sheet is judged as a noise sub-sheet;
when the formant frequency is between 400 Hz and 1.5 kHz, the sub-slice needs to be analyzed according to the formant width; if the formant width is less than 500 Hz, the sub-slice is judged to be a bird sound sub-slice, otherwise it is judged to be a noise sub-slice;
and counting the number of the bird sound sub-pieces and the number of the noise sub-pieces in the sound section, if the number of the bird sound sub-pieces is greater than the number of the noise sub-pieces, the sound section contains bird sounds, and the sound section is stored.
10. A computer-readable storage medium, comprising a stored computer program, wherein the computer program, when executed, controls an apparatus in which the computer-readable storage medium is located to perform the robust bird sound extraction method as recited in any one of claims 1 to 7.
CN202010379824.XA 2020-05-07 2020-05-07 Stable bird sound extraction method and device and computer readable storage medium Active CN111540368B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010379824.XA CN111540368B (en) 2020-05-07 2020-05-07 Stable bird sound extraction method and device and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010379824.XA CN111540368B (en) 2020-05-07 2020-05-07 Stable bird sound extraction method and device and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN111540368A true CN111540368A (en) 2020-08-14
CN111540368B CN111540368B (en) 2023-03-14

Family

ID=71977496

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010379824.XA Active CN111540368B (en) 2020-05-07 2020-05-07 Stable bird sound extraction method and device and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN111540368B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112908344A (en) * 2021-01-22 2021-06-04 广州大学 Intelligent recognition method, device, equipment and medium for bird song
CN113314127A (en) * 2021-04-23 2021-08-27 广州大学 Space orientation-based bird song recognition method, system, computer device and medium
CN113486964A (en) * 2021-07-13 2021-10-08 盛景智能科技(嘉兴)有限公司 Voice activity detection method and device, electronic equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060150920A1 (en) * 2005-01-11 2006-07-13 Patton Charles M Method and apparatus for the automatic identification of birds by their vocalizations
CN102708860A (en) * 2012-06-27 2012-10-03 昆明信诺莱伯科技有限公司 Method for establishing judgment standard for identifying bird type based on sound signal
WO2016176887A1 (en) * 2015-05-06 2016-11-10 福州大学 Animal sound identification method based on double spectrogram features
CN108694953A (en) * 2017-04-07 2018-10-23 南京理工大学 A kind of chirping of birds automatic identifying method based on Mel sub-band parameter features
CN110047519A (en) * 2019-04-16 2019-07-23 广州大学 A kind of sound end detecting method, device and equipment
CN110246504A (en) * 2019-05-20 2019-09-17 平安科技(深圳)有限公司 Birds sound identification method, device, computer equipment and storage medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060150920A1 (en) * 2005-01-11 2006-07-13 Patton Charles M Method and apparatus for the automatic identification of birds by their vocalizations
CN102708860A (en) * 2012-06-27 2012-10-03 昆明信诺莱伯科技有限公司 Method for establishing judgment standard for identifying bird type based on sound signal
WO2016176887A1 (en) * 2015-05-06 2016-11-10 福州大学 Animal sound identification method based on double spectrogram features
CN108694953A (en) * 2017-04-07 2018-10-23 南京理工大学 A kind of chirping of birds automatic identifying method based on Mel sub-band parameter features
CN110047519A (en) * 2019-04-16 2019-07-23 广州大学 A kind of sound end detecting method, device and equipment
CN110246504A (en) * 2019-05-20 2019-09-17 平安科技(深圳)有限公司 Birds sound identification method, device, computer equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
刘钊等: "随机森林和大规模声学特征的噪声环境鸟声识别仿真", 《系统仿真技术》 *
徐淑正等: "基于MFCC和时频图等多种特征的综合鸟声识别分类器设计", 《实验室研究与探索》 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112908344A (en) * 2021-01-22 2021-06-04 广州大学 Intelligent recognition method, device, equipment and medium for bird song
CN112908344B (en) * 2021-01-22 2023-08-08 广州大学 Intelligent bird song recognition method, device, equipment and medium
CN113314127A (en) * 2021-04-23 2021-08-27 广州大学 Space orientation-based bird song recognition method, system, computer device and medium
CN113314127B (en) * 2021-04-23 2023-10-10 广州大学 Bird song identification method, system, computer equipment and medium based on space orientation
CN113486964A (en) * 2021-07-13 2021-10-08 盛景智能科技(嘉兴)有限公司 Voice activity detection method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN111540368B (en) 2023-03-14

Similar Documents

Publication Publication Date Title
CN106935248B (en) Voice similarity detection method and device
CN111540368B (en) Stable bird sound extraction method and device and computer readable storage medium
US7756700B2 (en) Perceptual harmonic cepstral coefficients as the front-end for speech recognition
KR101266894B1 (en) Apparatus and method for processing an audio signal for speech emhancement using a feature extraxtion
US9364669B2 (en) Automated method of classifying and suppressing noise in hearing devices
CN102930870B (en) Bird voice recognition method using anti-noise power normalization cepstrum coefficients (APNCC)
WO2014153800A1 (en) Voice recognition system
CN103280220A (en) Real-time recognition method for baby cry
CN103646649A (en) High-efficiency voice detecting method
CN108305639B (en) Speech emotion recognition method, computer-readable storage medium and terminal
CN108682432B (en) Speech emotion recognition device
CN112397074A (en) Voiceprint recognition method based on MFCC (Mel frequency cepstrum coefficient) and vector element learning
Couvreur et al. Automatic noise recognition in urban environments based on artificial neural networks and hidden markov models
Katsir et al. Evaluation of a speech bandwidth extension algorithm based on vocal tract shape estimation
Kaminski et al. Automatic speaker recognition using a unique personal feature vector and Gaussian Mixture Models
CN111755025B (en) State detection method, device and equipment based on audio features
KR101671305B1 (en) Apparatus for extracting feature parameter of input signal and apparatus for recognizing speaker using the same
Pham et al. Using artificial neural network for robust voice activity detection under adverse conditions
CN107993666B (en) Speech recognition method, speech recognition device, computer equipment and readable storage medium
CN113744725B (en) Training method of voice endpoint detection model and voice noise reduction method
CN115331678A (en) Generalized regression neural network acoustic signal identification method using Mel frequency cepstrum coefficient
JP4537821B2 (en) Audio signal analysis method, audio signal recognition method using the method, audio signal section detection method, apparatus, program and recording medium thereof
CN111968673A (en) Audio event detection method and system
CN110689875A (en) Language identification method and device and readable storage medium
Fahmeeda et al. Voice Based Gender Recognition Using Deep Learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant