CN106782565A - Voiceprint feature recognition method and system - Google Patents

Voiceprint feature recognition method and system

Info

Publication number
CN106782565A
Authority
CN
China
Prior art keywords
voiceprint
frequency
voiceprint feature
signal
speech
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201611075677.7A
Other languages
Chinese (zh)
Inventor
徐晓东
张程
张毅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Heavy Chi Robot Research Institute Co Ltd
Original Assignee
Chongqing Heavy Chi Robot Research Institute Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing Heavy Chi Robot Research Institute Co Ltd
Priority to CN201611075677.7A
Publication of CN106782565A
Legal status: Pending


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/02 Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/26 Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Collating Specific Patterns (AREA)

Abstract

An embodiment of the present invention provides a voiceprint feature recognition method and system. The method proceeds as follows: after pre-processing, the noisy mixed signal is subjected to speech separation based on auditory properties; the frequency cepstral coefficients and perceptual linear prediction coefficients of the separated signal are extracted; using a noise-background discrimination measure, the frequency cepstral coefficients and perceptual linear prediction coefficients are analyzed under different noise environments to complete feature fusion; finally, within a pre-built voiceprint feature template library, pattern matching is performed on the fused features using a Gaussian mixture model-universal background model, accomplishing voiceprint feature recognition. By combining properties of the human auditory system with traditional voiceprint recognition methods, this voiceprint feature recognition method solves, from a bionics perspective, the problem of the reduced voiceprint recognition rate under noise, effectively improving the accuracy of voiceprint feature recognition and the robustness of the system in noisy environments.

Description

Voiceprint feature recognition method and system
Technical field
The present invention relates to the technical field of speech recognition, and in particular to a voiceprint feature recognition method and system.
Background technology
Research on voiceprint recognition was taken up by information researchers as early as the 1930s. Early studies focused on aural discrimination experiments and on verifying the feasibility of identification by listening. With breakthroughs in computer hardware and algorithms, research on voiceprint recognition was no longer limited to discrimination by the human ear alone. Bell Laboratories in the United States long occupied a leading position in the field of speech recognition; its member L. G. Kersta accomplished identification by analyzing voice spectrograms, and he was the first to propose the concept of "voiceprint recognition". With researchers' continued exploration and innovation in the field, the automatic analysis and recognition of human speech signals by machine became possible. However, existing voiceprint feature recognition methods generally suffer from low recognition accuracy in noisy environments; their robustness is poor and their practical performance is unsatisfactory.
The content of the invention
An object of the present invention is to provide a voiceprint feature recognition method and system to address the above problems.
A preferred embodiment of the present invention provides a voiceprint feature recognition method, the method comprising:
pre-processing an input original speech signal, the pre-processing including pre-emphasis, framing and windowing, and endpoint detection;
performing speech separation based on auditory properties on the noisy mixed signal obtained after pre-processing;
extracting frequency cepstral coefficients and perceptual linear prediction coefficients of the separated signal;
using a noise-background discrimination measure, analyzing the frequency cepstral coefficients and perceptual linear prediction coefficients under different noise environments to complete feature fusion; and
within a pre-built voiceprint feature template library, performing pattern matching on the fused features using a Gaussian mixture model-universal background model, accomplishing voiceprint feature recognition.
Another embodiment of the present invention provides a voiceprint feature recognition system, the system comprising:
a pre-processing module for pre-processing an input original speech signal, the pre-processing including pre-emphasis, framing and windowing, and endpoint detection;
a speech separation module for performing speech separation based on auditory properties on the noisy mixed signal obtained after pre-processing;
a feature extraction module for extracting frequency cepstral coefficients and perceptual linear prediction coefficients of the separated signal;
a feature fusion module for using a noise-background discrimination measure to analyze the frequency cepstral coefficients and perceptual linear prediction coefficients under different noise environments to complete feature fusion; and
a feature recognition module for performing, within a pre-built voiceprint feature template library, pattern matching on the fused features using a Gaussian mixture model-universal background model, accomplishing voiceprint feature recognition.
The voiceprint feature recognition method and system provided by the embodiments of the present invention combine the properties of the human auditory system with traditional voiceprint recognition methods, solving from a bionics perspective the problem of the reduced voiceprint recognition rate under noise, and effectively improving the accuracy of voiceprint recognition and the robustness of the system in noisy environments.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present invention more clearly, the drawings required by the embodiments are briefly described below. It should be understood that the following drawings illustrate only certain embodiments of the present invention and should therefore not be regarded as limiting its scope; for those of ordinary skill in the art, other related drawings can be obtained from these drawings without creative effort.
Fig. 1 is a block diagram of a speech recognition device provided by an embodiment of the present invention;
Fig. 2 is a flowchart of a voiceprint feature recognition method provided by an embodiment of the present invention;
Fig. 3 is a geometric diagram of the interaural time difference provided by an embodiment of the present invention;
Fig. 4 is a functional block diagram of a voiceprint feature recognition system provided by an embodiment of the present invention.
Reference numerals: 100 - speech recognition device; 110 - voiceprint feature recognition system; 120 - memory; 130 - processor; 1102 - pre-processing module; 1104 - speech separation module; 1106 - feature extraction module; 1108 - feature fusion module; 1110 - feature recognition module.
Specific embodiment
To make the purpose, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the accompanying drawings. Evidently, the described embodiments are only a part of the embodiments of the present invention rather than all of them. The components of the embodiments of the present invention, as generally described and illustrated in the drawings herein, can be arranged and designed in a variety of configurations. Therefore, the following detailed description of the embodiments provided in the drawings is not intended to limit the scope of the claimed invention but merely represents selected embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
As shown in Fig. 1, which is a block diagram of a speech recognition device 100 provided by an embodiment of the present invention, the speech recognition device 100 includes a voiceprint feature recognition system 110, a memory 120, and a processor 130. The memory 120 and the processor 130 are electrically connected, directly or indirectly, to enable data transmission or interaction. The voiceprint feature recognition system 110 includes at least one software function module that can be stored in the memory 120 in the form of software or firmware, or solidified in the operating system of the speech recognition device 100. The processor 130 accesses the memory 120 under the control of a storage controller to execute the executable modules stored in the memory 120, such as the software function modules and computer programs included in the voiceprint feature recognition system 110.
As shown in Fig. 2 in being a kind of speech recognition apparatus 100 being applied to shown in Fig. 1 provided in an embodiment of the present invention The schematic flow sheet of vocal print feature recognition methods.It should be noted that, the method that the present invention is provided is not with Fig. 2 and as described below Particular order is limitation.Each step shown in Fig. 2 will be described in detail below.
Step S101: the input original speech signal is pre-processed; the pre-processing includes pre-emphasis, framing and windowing, and endpoint detection.
In this embodiment, the original speech signal input to the speech recognition device 100 first passes through a first-order FIR high-pass digital filter to realize pre-emphasis; its transfer function is:
H(z) = 1 − μz⁻¹
where the coefficient μ takes a value between 0 and 1, which can be determined according to prior experience; 0.94 is commonly used.
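For illustration (not part of the patent text), a minimal NumPy sketch of this first-order FIR pre-emphasis H(z) = 1 − μz⁻¹ with the commonly used μ = 0.94:

```python
import numpy as np

def pre_emphasize(x, mu=0.94):
    """First-order FIR pre-emphasis: y[n] = x[n] - mu * x[n-1]."""
    y = np.empty_like(x, dtype=float)
    y[0] = x[0]
    y[1:] = x[1:] - mu * x[:-1]
    return y

# Example: boost the high-frequency content of a synthetic tone
x = np.sin(2 * np.pi * 200 * np.arange(16000) / 16000.0)
y = pre_emphasize(x)
```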
Next, the pre-emphasized speech signal is divided into frames, and each frame is multiplied by a moving window w(n − m) whose values weight the individual samples of the frame. After framing and windowing, the resulting speech signal can be expressed as:
Q(n) = Σ_m T[x(m)] · w(n − m)
where T[·] denotes a functional transformation, x(m) denotes the speech sample sequence, and Q(n) denotes the time series obtained from each processed segment.
Finally, the endpoints of the speech signal are detected. In this embodiment, endpoint detection is realized mainly through the short-time energy and the short-time zero-crossing rate.
Specifically, the short-time energy is expressed as:
E_t = Σ_{n=0}^{N−1} [S_t(n)]²
where N denotes the analysis window width and S_t(n) denotes the n-th signal sample of the t-th frame of the speech signal.
The short-time zero-crossing rate is expressed as:
Z_t = (1/2) Σ_{n=1}^{N−1} |sgn[S_t(n)] − sgn[S_t(n−1)]|
where sgn[·] denotes the sign function.
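A minimal sketch of the frame-wise short-time energy and zero-crossing rate that the endpoint detector thresholds; the frame length, frame shift, and threshold values below are illustrative assumptions, not values given in the patent:

```python
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    """Split x into overlapping frames and apply a Hamming window."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx] * np.hamming(frame_len)

def short_time_energy(frames):
    """E_t = sum of squared samples per frame."""
    return np.sum(frames ** 2, axis=1)

def zero_crossing_rate(frames):
    """Z_t = half the count of sign changes per frame."""
    signs = np.sign(frames)
    return 0.5 * np.sum(np.abs(np.diff(signs, axis=1)), axis=1)

def detect_endpoints(x, energy_thresh=1e-2, zcr_thresh=50.0):
    """Mark frames as speech on high energy, or moderate energy with high ZCR."""
    frames = frame_signal(x)
    e, z = short_time_energy(frames), zero_crossing_rate(frames)
    return (e > energy_thresh) | ((e > 0.1 * energy_thresh) & (z > zcr_thresh))

# Example: per-frame speech/non-speech decisions for one second of audio
speech_frames = detect_endpoints(np.random.randn(16000))
```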
Step S103: speech separation based on auditory properties is performed on the noisy mixed signal obtained after pre-processing.
In this embodiment, the bionic separation of the speech signal based on auditory properties proceeds as follows: the noisy mixed signal is decomposed into time-frequency units by a peripheral auditory model, the time-frequency units are clustered according to speech separation cues, and the separated speech is finally output through a speech reconstruction model. The speech reconstruction model completes the clustering of the time-frequency units and the synthesis of the speech stream, and mainly comprises two parts: binary mask clustering and a reconstruction model.
The masking model for the i-th frequency channel and the j-th time frame may be defined piecewise: the mask takes the value 1 when f_i ≤ f_c and the cue τ(i, j) reaches its threshold T_τ(i, j), or when f_i > f_c and the cue L(i, j) reaches its threshold T_l(i, j); otherwise it takes the value 0. Here f_c = 1500 Hz denotes the critical frequency between the high band and the mid/low band, f_i denotes the frequency of the i-th frequency channel, τ(i, j) and L(i, j) denote the two separation cues of the i-th frequency channel and the j-th time frame, and T_τ(i, j) and T_l(i, j) denote the thresholds of the two cues respectively.
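As a toy illustration of this binary mask logic, assuming the cue and threshold arrays are already computed (the decision rule below, one cue per band, is one plausible reading of the definitions above):

```python
import numpy as np

F_C = 1500.0  # critical frequency between the mid/low and high bands (Hz)

def binary_mask(freqs, itd_cue, level_cue, t_tau, t_level):
    """Binary T-F mask: the time cue decides below F_C, the level cue above.

    freqs:      (I,) center frequency of each frequency channel
    itd_cue:    (I, J) tau(i, j) values, one per channel and time frame
    level_cue:  (I, J) L(i, j) values
    t_tau, t_level: threshold arrays of matching shape
    """
    low = freqs[:, None] <= F_C
    keep_low = low & (itd_cue >= t_tau)
    keep_high = ~low & (level_cue >= t_level)
    return (keep_low | keep_high).astype(float)
```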
To improve the fidelity of the reconstructed speech, prosody adjustment must first be applied to the signal to be synthesized. Prosody adjustment includes adjustment of the amplitude, length, pitch, and similar attributes of the speech. The amplitude of the speech signal can be adjusted by weighting, where τ in the weighting formula denotes the signal frame length and n denotes the frame shift.
The reconstruction takes the overlap-add form:
ŝ(n) = Σ_j g_j · h_j(n − t_j) · s_j(n)
where ŝ(n) denotes the reconstructed signal, t_j the synchronization mark of the reconstruction, h_j(n) the window function in the peripheral auditory model, and s_j(n) the short-time speech signal; the amplitude adjustment is realized through the weight g_j obtained from the weighting formula above.
In addition, in this embodiment, the speech separation cue can be the interaural time difference (Interaural Time Difference, ITD) or the interaural level difference (Interaural Level Difference, ILD). Starting from the way the human ear localizes sound and simulating the process by which it discriminates sources, using the cues ITD and ILD, which reflect the spatial azimuth information of a sound, for speech separation effectively improves separation efficiency. The principles of ITD and ILD are briefly described below.
During human auditory speech separation, the ITD is mainly used for processing mid- and low-frequency speech signals. For simplicity, this section illustrates how the ITD arises using a single sound source. Suppose a sound source is closer to the left ear; the speech signal reaching the left ear can then be represented by α·sin(2πft), while the signal reaching the more distant right ear is (α − Δα)·sin(2πf(t + Δt)), where f denotes the frequency, Δt denotes the time-difference information, that is, the time difference with which the sound reaches the two ears (the ITD), and Δα denotes the intensity-difference information, that is, the sound-pressure difference with which the sound reaches the two ears (the ILD). Using these two kinds of information, signals can be separated according to the differences in their source positions.
As shown in Fig. 3, which is the geometric diagram of the interaural time difference: S is the sound source position, A and B are the left and right ears, D is the distance between them, α is the angle between the sound source and the center of the head, and d is the difference in the distances the sound travels to reach the two ears, expressed as d = D·sin α.
To compute the ITD, the input speech signals are first windowed; the window function is generally viewed as the unit impulse response of a filter. A Hamming window is selected in this embodiment to keep the short-time analysis of the speech signal smooth. The Hamming window is expressed as:
w(n) = 0.54 − 0.46·cos(2πn/(N − 1)), 0 ≤ n ≤ N − 1
where N denotes the window length. The windowed signals are transformed to the frequency domain by the Fourier transform, as in the following two expressions:
X_l(ω) = Σ_n x_l(n)·w(n)·e^(−jωn),  X_r(ω) = Σ_n x_r(n)·w(n)·e^(−jωn)
The cross-correlation of the speech signals reaching the left and right ears can be expressed as:
R_lr(τ) = ∫ x_l(t)·x_r(t + τ) dt
Normally, each transfer function h_l(t) and h_r(t) can be approximated by an amplitude decay factor and a time delay, so the cross-correlation can be expressed as:
R_lr(τ) = α·R_ss(τ − D)
where α denotes the decay factor and D denotes the ITD value. According to the above analysis, the ITD contributes to the separation of low-frequency speech signals, and the autocorrelation function R_ss reaches its maximum at τ = 0; therefore the ITD value D can be expressed as:
D = argmax_τ R_lr(τ)
The cross-power spectrum is defined through the Fourier transform of the cross-correlation of the two signals, and is computed as:
G_lr(ω) = X_l(ω)·X_r*(ω)
where X_r*(ω) denotes the complex conjugate of X_r(ω). Taking the inverse Fourier transform recovers the cross-correlation of the received signals:
R_lr(τ) = (1/2π) ∫ G_lr(ω)·e^(jωτ) dω
It can be seen from the above that the ITD value D depends only on the phase of the cross-power spectrum, so the cross-correlation can be normalized (a phase transform):
Ĝ_lr(ω) = G_lr(ω)/|G_lr(ω)|
Thus the ITD value D can be accurately computed as:
D = argmax_τ (1/2π) ∫ [G_lr(ω)/|G_lr(ω)|]·e^(jωτ) dω
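The derivation above amounts to the classical generalized cross-correlation with phase transform (GCC-PHAT); the following compact NumPy sketch is one standard realization rather than the patent's exact procedure:

```python
import numpy as np

def gcc_phat_delay(xl, xr, fs, max_delay=0.001):
    """ITD via GCC-PHAT: D = argmax of the phase-transformed cross-correlation.

    A positive result means xr lags xl; max_delay bounds the search (seconds).
    """
    n = len(xl) + len(xr)
    Xl = np.fft.rfft(xl, n=n)
    Xr = np.fft.rfft(xr, n=n)
    g = Xr * np.conj(Xl)                  # cross-power spectrum
    g /= np.maximum(np.abs(g), 1e-12)     # phase transform: keep phase only
    cc = np.fft.irfft(g, n=n)
    max_shift = int(fs * max_delay)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(cc) - max_shift) / fs

# Example: an 8-sample (0.5 ms) interaural lag recovered from noise
fs = 16000
s = np.random.randn(fs)
print(gcc_phat_delay(s, np.roll(s, 8), fs))   # ~ 8 / 16000 = 0.0005 s
```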
The ILD denotes the sound-pressure difference with which a source signal reaches the two ears. When the distances over which the sound travels to the left and right ears differ, a sound-pressure difference arises, and this information provides another cue for speech separation: the ILD. Research shows that the ILD plays the larger role in the high-frequency region. Once the speech signal frequency exceeds 1500 Hz, the masking effect of peripheral auditory structures such as the pinna produces a pronounced sound-shadow effect that hinders the transmission of the signal to the inner ear. The main reason for this is that high-frequency speech signals have shorter wavelengths and can hardly diffract around the pinna, whereas low-frequency sound can bypass it; therefore, in order to separate high-frequency speech signals, the interaural level difference must be extracted.
Computing the ILD requires spectral cues. Ignoring echoes, the energy spectra of the signals received at the left and right ears can be expressed by the following two formulas:
P_l(ω) = S(ω)·|H_l(ω)|²
P_r(ω) = S(ω)·|H_r(ω)|²
where S(ω) denotes the power spectrum of the sound source, and H_l(ω) and H_r(ω) denote the transfer functions of the left and right ears respectively. The intensities at the left and right ears can therefore be expressed as:
I_l(ω) = 10·log10 P_l(ω) = 10·log10 S(ω) + 20·log10|H_l(ω)|
and
I_r(ω) = 10·log10 P_r(ω) = 10·log10 S(ω) + 20·log10|H_r(ω)|
Normally, the interaural level difference can be used to extract the separation information of high-frequency speech signals; moreover, when the ILD information is extracted in the logarithmic domain, the relation between the sound source and the channel changes from multiplication into simple addition, which helps the subsequent ILD computation isolate the channel information. After the intensities are computed, the speech signal passes through the cochlear filter. Extracting ILD information only in the high-frequency part not only reduces the size of the feature space but also simulates the frequency-selective resonance of the cochlea in the human auditory central system.
Because the ILD is effective only for speech signals above 1500 Hz, there is a cutoff frequency f_cut for ILD extraction, determined by C, the propagation speed of sound in air, and d_α, the size of the physical aperture; ILD cues can be computed only for subbands that reach the cutoff frequency f_cut.
Therefore, for each subband i that reaches the cutoff frequency, the ILD is defined over the subband's frequency range Ω_i with the cochlear filter weights W_i(ω) as the weighted left/right power ratio:
ILD_i = 10·log10[ ∫_{Ω_i} W_i(ω)·P_l(ω) dω / ∫_{Ω_i} W_i(ω)·P_r(ω) dω ]
where Ω_i denotes the frequency range of subband i and W_i(ω) denotes the weight of the cochlear filter.
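A minimal sketch of such a per-subband ILD, assuming the cochlear filter weights W_i(ω) are supplied from outside (for example, precomputed Gammatone magnitude responses); the weighted power ratio below is one plausible reading of the definition:

```python
import numpy as np

def subband_ild(xl, xr, weights):
    """ILD per subband: 10*log10 of the weighted left/right power ratio.

    weights: (num_subbands, num_bins) cochlear filter magnitudes W_i(omega),
             with num_bins = len(x) // 2 + 1 to match np.fft.rfft.
    """
    pl = np.abs(np.fft.rfft(xl)) ** 2   # left-ear power spectrum
    pr = np.abs(np.fft.rfft(xr)) ** 2   # right-ear power spectrum
    num = weights @ pl
    den = weights @ pr
    return 10.0 * np.log10(np.maximum(num, 1e-12) / np.maximum(den, 1e-12))
```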
Step S105: the frequency cepstral coefficients and perceptual linear prediction coefficients of the signal processed by speech separation are extracted.
It is well known that the characteristic parameters most commonly used in voiceprint recognition research are cepstral coefficients. Cepstral coefficients reflect the sound-production principle of the human vocal tract, and the filter bank used in their extraction reflects the characteristics of human hearing. In this embodiment, the mel-frequency cepstral coefficient (MFCC) is improved upon, and the frequency cepstral coefficients are extracted based on a Gammatone filter bank.
In speech signal processing, the Gammatone filter bank functions similarly to the human auditory periphery: it can well simulate the characteristics of the basilar membrane and perform a frequency decomposition of the speech signal. The Meddis model can well simulate the characteristics of the inner hair cells and accurately describe the firing rate of the auditory nerve. Together, the two constitute a complete auditory periphery model.
When a speech signal enters the human ear, it first passes through the frequency division of the basilar membrane, which is simulated by the Gammatone filter bank. The time-domain expression of the filter bank is:
g_i(t) = t^(n−1)·e^(−2π·b_i·t)·cos(2π·f_i·t + φ_i),  t ≥ 0,  1 ≤ i ≤ N
where N denotes the number of filters, i the filter index, n the filter order (n = 4 is taken), φ_i the initial phase of the filter, f_i the center frequency of each filter, and b_i the decay factor.
In the Gammatone filter bank, the bandwidth of each filter is related to the critical bands of human hearing; the auditory critical band, measured as an equivalent rectangular bandwidth, is:
ERB(f) = 24.7 × (4.37f/1000 + 1)
For a center frequency f_i, the corresponding decay factor b_i is:
b_i = 1.019·ERB(f_i)
Applying the Laplace transform to the time-domain expression above, converting the result to the z-transform, and finally taking the inverse transform yields the discrete impulse response of the Gammatone filter bank.
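A minimal NumPy sketch of a fourth-order Gammatone filterbank built directly from the time-domain impulse response and the decay factor b_i = 1.019·ERB(f_i) given above; direct convolution is used for clarity, and the normalization and channel spacing are illustrative choices:

```python
import numpy as np

def erb(f):
    """Equivalent rectangular bandwidth (Hz) at center frequency f."""
    return 24.7 * (4.37 * f / 1000.0 + 1.0)

def gammatone_ir(fc, fs, n_order=4, duration=0.05):
    """Impulse response t^(n-1) * exp(-2*pi*b*t) * cos(2*pi*fc*t)."""
    t = np.arange(int(duration * fs)) / fs
    b = 1.019 * erb(fc)
    g = t ** (n_order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    return g / np.max(np.abs(g))  # crude peak normalization

def gammatone_filterbank(x, fs, center_freqs):
    """Filter x through one Gammatone channel per center frequency."""
    return np.stack([np.convolve(x, gammatone_ir(fc, fs), mode='same')
                     for fc in center_freqs])

# Example: a 16-channel decomposition between 100 Hz and 4 kHz
fs = 16000
freqs = np.geomspace(100, 4000, 16)
channels = gammatone_filterbank(np.random.randn(fs), fs, freqs)
```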
Step S107: using the noise-background discrimination measure, the frequency cepstral coefficients and perceptual linear prediction coefficients are analyzed under different noise environments to complete feature fusion.
D_R is the ratio of the between-class variance to the within-class variance of a feature; it reflects the degree of discrimination between the features in the voiceprint feature template library and can effectively characterize whether a voiceprint feature adapts to a noise environment. By obtaining the D_R values of a voiceprint feature under environments with different signal-to-noise ratios, the robustness of the feature in noisy environments can be analyzed further. In the expression for D_R, μ denotes the mean feature value of all speakers in the voiceprint feature template library, μ_i denotes the mean feature value of the i-th speaker, M denotes the number of speakers in the template library, and N denotes the number of speech frames per speaker.
After extraction, speech features are generally stored in matrix form and can be represented by multidimensional feature vectors. Studying the between-class discrimination of each dimension of the feature vector reveals the robustness of every one-dimensional feature parameter in noisy environments, and on this basis the data fusion of different voiceprint features can be achieved. Suppose feature A and feature B are represented by an X-dimensional and a Y-dimensional feature vector respectively:
A = {α_1, α_2, ..., α_X}′
B = {β_1, β_2, ..., β_Y}′
Applying the between-class discrimination analysis to the two voiceprint features yields the D_R matrices of feature A and feature B, containing one D_R value per feature dimension.
To study the per-dimension performance of the two voiceprint features in a noise environment, features A and B are extracted for the speakers in the voiceprint feature template library under various signal-to-noise-ratio environments, and the number of times P that the maximum value D_Rmax falls on each dimension of the feature matrix is counted, giving P_x for feature A and P_y for feature B.
To ensure that each vector of the fused feature matrix carries an appropriate weight, a threshold P_th is set according to the statistics and selected from the concrete results; after P_x and P_y are normalized, the following is taken:
ε = max{P_x, P_y, P_th}
The fused feature parameter C is thereby obtained, expressed as:
C = {γ_1, γ_2, ..., γ_Z}′
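A sketch of one plausible reading of this D_R-guided fusion (the exact counting and regularization formulas are not reproduced above, so the selection rule below is an assumption): each dimension's D_R is the between-speaker variance over the within-speaker variance, and the better-discriminating dimensions of A and B are concatenated:

```python
import numpy as np

def discrimination_ratio(features):
    """Per-dimension D_R = between-speaker variance / within-speaker variance.

    features: (M, N, D) array - M speakers, N frames each, D dimensions.
    """
    speaker_means = features.mean(axis=1)            # (M, D) per-speaker means
    grand_mean = speaker_means.mean(axis=0)          # (D,) overall mean
    between = ((speaker_means - grand_mean) ** 2).mean(axis=0)
    within = ((features - speaker_means[:, None, :]) ** 2).mean(axis=(0, 1))
    return between / np.maximum(within, 1e-12)

def fuse(feat_a, feat_b, p_th=0.5):
    """Keep the dimensions of A and B with the highest D_R values."""
    da, db = discrimination_ratio(feat_a), discrimination_ratio(feat_b)
    keep_a = da >= np.quantile(da, p_th)             # illustrative selection rule
    keep_b = db >= np.quantile(db, p_th)
    return np.concatenate([feat_a[..., keep_a], feat_b[..., keep_b]], axis=-1)
```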
Step S109: within the pre-built voiceprint feature template library, pattern matching is performed on the fused features using the Gaussian mixture model-universal background model, accomplishing voiceprint feature recognition.
In this embodiment, the pattern-matching model is the Gaussian mixture model-universal background model (GMM-UBM). The essence of the Gaussian mixture model (GMM) is a multidimensional probability density function. For a d-dimensional GMM with mixture order M, the model can be expressed as a weighted sum of Gaussian functions:
p(x) = Σ_{i=1}^{M} w_i·p_i(x)
where p_i denotes the i-th d-dimensional Gaussian component of the GMM, with mean vector μ_i and covariance matrix Σ_i, x denotes the d-dimensional observation vector, and w_i denotes the mixture weight, satisfying Σ_{i=1}^{M} w_i = 1.
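A minimal GMM-UBM scoring sketch using scikit-learn's GaussianMixture for both the universal background model and a speaker model; the patent does not prescribe an implementation, and the usual MAP adaptation of speaker models from the UBM is omitted here for brevity:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_ubm(background_feats, n_components=64):
    """Fit the universal background model on pooled background speech."""
    ubm = GaussianMixture(n_components=n_components, covariance_type='diag')
    return ubm.fit(background_feats)

def score(test_feats, speaker_gmm, ubm):
    """Log-likelihood ratio of speaker model vs. UBM (higher = better match)."""
    return speaker_gmm.score(test_feats) - ubm.score(test_feats)

# Example with random stand-in features (24-dimensional fused vectors)
rng = np.random.default_rng(0)
ubm = train_ubm(rng.standard_normal((2000, 24)), n_components=8)
spk = GaussianMixture(n_components=8, covariance_type='diag').fit(
    rng.standard_normal((500, 24)) + 0.5)
llr = score(rng.standard_normal((100, 24)) + 0.5, spk, ubm)
```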
As shown in Fig. 4, which is a functional block diagram of a voiceprint feature recognition system 110 provided by an embodiment of the present invention, the voiceprint feature recognition system 110 includes a pre-processing module 1102, a speech separation module 1104, a feature extraction module 1106, a feature fusion module 1108, and a feature recognition module 1110.
The pre-processing module 1102 is configured to pre-process the input original speech signal; the pre-processing includes pre-emphasis, framing and windowing, and endpoint detection.
The speech separation module 1104 is configured to perform speech separation based on auditory properties on the noisy mixed signal obtained after pre-processing.
The feature extraction module 1106 is configured to extract the frequency cepstral coefficients and perceptual linear prediction coefficients of the separated signal.
The feature fusion module 1108 is configured to use the noise-background discrimination measure to analyze the frequency cepstral coefficients and perceptual linear prediction coefficients under different noise environments to complete feature fusion.
The feature recognition module 1110 is configured to perform, within the pre-built voiceprint feature template library, pattern matching on the fused features using the Gaussian mixture model-universal background model, accomplishing voiceprint feature recognition.
For the concrete operation of each functional module described in this embodiment, reference may be made to the detailed description of the corresponding steps shown in Fig. 2; it is not repeated here.
In summary, the voiceprint feature recognition method and system provided by the embodiments of the present invention solve, from a bionics perspective, the problem of the reduced voiceprint recognition rate under noise, effectively improving the accuracy of voiceprint recognition and the robustness of the system in noisy environments.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may also be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the flowcharts and block diagrams in the drawings show the possible architectures, functions, and operations of apparatuses, methods, and computer program products according to multiple embodiments of the present invention. In this regard, each block in a flowchart or block diagram may represent a module, program segment, or portion of code that contains one or more executable instructions for realizing the specified logical function. It should also be noted that in some alternative implementations the functions marked in the blocks may occur in an order different from that marked in the drawings; for example, two consecutive blocks may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should likewise be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks therein, can be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
If the functions are realized in the form of software function modules and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present invention.
The above are only specific embodiments of the present invention, but the scope of protection of the present invention is not limited thereto. Any change or replacement that can readily occur to those familiar with the art within the technical scope disclosed by the present invention shall be covered by the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be subject to the scope of protection of the claims.

Claims (10)

1. A voiceprint feature recognition method, characterized in that the method comprises:
pre-processing an input original speech signal, the pre-processing including pre-emphasis, framing and windowing, and endpoint detection;
performing speech separation based on auditory properties on the noisy mixed signal obtained after pre-processing;
extracting frequency cepstral coefficients and perceptual linear prediction coefficients of the separated signal;
using a noise-background discrimination measure, analyzing the frequency cepstral coefficients and perceptual linear prediction coefficients under different noise environments to complete feature fusion; and
within a pre-built voiceprint feature template library, performing pattern matching on the fused features using a Gaussian mixture model-universal background model, accomplishing voiceprint feature recognition.
2. The voiceprint feature recognition method according to claim 1, characterized in that the step of performing speech separation based on auditory properties on the noisy mixed signal obtained after pre-processing comprises:
decomposing the noisy mixed signal to obtain a plurality of time-frequency units;
clustering the plurality of time-frequency units obtained by the decomposition according to speech separation cues; and
performing speech reconstruction on the clustered signal to be synthesized, and outputting the separated speech.
3. The voiceprint feature recognition method according to claim 2, characterized in that the speech separation cues include the interaural time difference and the interaural level difference.
4. The voiceprint feature recognition method according to claim 2 or 3, characterized in that the step of clustering the plurality of time-frequency units obtained by the decomposition according to speech separation cues comprises:
performing binary mask clustering on the plurality of time-frequency units according to the masking model, wherein f_i denotes the frequency of the i-th frequency channel, f_c denotes the critical frequency between the high band and the mid/low band, τ(i, j) denotes one separation cue of the i-th frequency channel and the j-th time frame, L(i, j) denotes the other separation cue of the i-th frequency channel and the j-th time frame, and T_τ(i, j) and T_l(i, j) respectively denote the thresholds of the two separation cues.
5. The voiceprint feature recognition method according to claim 2, characterized in that the step of performing speech reconstruction on the clustered signal to be synthesized comprises:
performing prosody adjustment on the signal to be synthesized, the prosody including amplitude, length, and pitch; and
performing speech reconstruction on the prosody-adjusted signal according to the reconstruction formula, wherein t_j denotes the synchronization mark of the reconstruction, h_j(n) denotes the window function, s_j denotes the short-time speech signal, and g_j denotes the amplitude adjustment weight.
6. The voiceprint feature recognition method according to claim 1, characterized in that the step of extracting the frequency cepstral coefficients and perceptual linear prediction coefficients of the separated signal comprises:
extracting the frequency cepstral coefficients of the separated signal based on a Gammatone filter bank.
7. A voiceprint feature recognition system, characterized in that the system comprises:
a pre-processing module for pre-processing an input original speech signal, the pre-processing including pre-emphasis, framing and windowing, and endpoint detection;
a speech separation module for performing speech separation based on auditory properties on the noisy mixed signal obtained after pre-processing;
a feature extraction module for extracting frequency cepstral coefficients and perceptual linear prediction coefficients of the separated signal;
a feature fusion module for using a noise-background discrimination measure to analyze the frequency cepstral coefficients and perceptual linear prediction coefficients under different noise environments to complete feature fusion; and
a feature recognition module for performing, within a pre-built voiceprint feature template library, pattern matching on the fused features using a Gaussian mixture model-universal background model, accomplishing voiceprint feature recognition.
8. The voiceprint feature recognition system according to claim 7, characterized in that the manner in which the speech separation module performs speech separation based on auditory properties on the noisy mixed signal obtained after pre-processing comprises:
decomposing the noisy mixed signal to obtain a plurality of time-frequency units;
clustering the plurality of time-frequency units obtained by the decomposition according to speech separation cues; and
performing speech reconstruction on the clustered signal to be synthesized, and outputting the separated speech.
9. The voiceprint feature recognition system according to claim 8, characterized in that the manner in which the speech separation module clusters the plurality of time-frequency units obtained by the decomposition according to speech separation cues comprises:
performing binary mask clustering on the plurality of time-frequency units according to the masking model, wherein f_i denotes the frequency of the i-th frequency channel, f_c denotes the critical frequency between the high band and the mid/low band, τ(i, j) denotes one separation cue of the i-th frequency channel and the j-th time frame, L(i, j) denotes the other separation cue of the i-th frequency channel and the j-th time frame, and T_τ(i, j) and T_l(i, j) respectively denote the thresholds of the two separation cues.
10. The voiceprint feature recognition system according to claim 7, characterized in that the manner in which the feature extraction module extracts the frequency cepstral coefficients and perceptual linear prediction coefficients of the separated signal comprises:
extracting the frequency cepstral coefficients of the separated signal based on a Gammatone filter bank.
CN201611075677.7A 2016-11-29 2016-11-29 Voiceprint feature recognition method and system Pending CN106782565A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611075677.7A CN106782565A (en) Voiceprint feature recognition method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611075677.7A CN106782565A (en) Voiceprint feature recognition method and system

Publications (1)

Publication Number Publication Date
CN106782565A true CN106782565A (en) 2017-05-31

Family

ID=58900777

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611075677.7A Pending CN106782565A (en) Voiceprint feature recognition method and system

Country Status (1)

Country Link
CN (1) CN106782565A (en)



Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9131295B2 (en) * 2012-08-07 2015-09-08 Microsoft Technology Licensing, Llc Multi-microphone audio source separation based on combined statistical angle distributions
CN103456312A (en) * 2013-08-29 2013-12-18 太原理工大学 Single channel voice blind separation method based on computational auditory scene analysis
CN105609099A (en) * 2015-12-25 2016-05-25 重庆邮电大学 Speech recognition pretreatment method based on human auditory characteristic

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
NICOLETA ROMAN et al.: "Speech segregation based on sound localization", IEEE *
刘继芳: "Research on mixed speech separation based on computational auditory scene analysis", China Master's Theses Full-text Database, Information Science and Technology *
徐鹤: "Research on voiceprint recognition algorithms in the urban traffic environment", China Master's Theses Full-text Database, Information Science and Technology *
罗元 et al.: "A new robust voiceprint feature extraction and fusion method", Computer Science *
陆虎敏: "Aircraft Cockpit Display and Control Technology", Aviation Industry Press, 31 December 2015 *

Cited By (35)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018223727A1 (en) * 2017-06-09 2018-12-13 平安科技(深圳)有限公司 Voiceprint recognition method, apparatus and device, and medium
WO2019037426A1 (en) * 2017-08-23 2019-02-28 武汉斗鱼网络科技有限公司 Mfcc voice recognition method, storage medium, electronic device, and system
CN107782548A (en) * 2017-10-20 2018-03-09 韦彩霞 One kind is based on to track vehicle parts detecting system
CN107782548B (en) * 2017-10-20 2020-07-07 亚太空列(河南)轨道交通有限公司 Rail vehicle part detection system
CN108124488A (en) * 2017-12-12 2018-06-05 福建联迪商用设备有限公司 A kind of payment authentication method and terminal based on face and vocal print
CN108231082A (en) * 2017-12-29 2018-06-29 广州势必可赢网络科技有限公司 Updating method and device for self-learning voiceprint recognition
CN108182945A (en) * 2018-03-12 2018-06-19 广州势必可赢网络科技有限公司 Voiceprint feature-based multi-person voice separation method and device
CN110299143B (en) * 2018-03-21 2023-04-11 现代摩比斯株式会社 Apparatus for recognizing a speaker and method thereof
CN110299143A (en) * 2018-03-21 2019-10-01 现代摩比斯株式会社 The devices and methods therefor of voice speaker for identification
CN108564956A (en) * 2018-03-26 2018-09-21 京北方信息技术股份有限公司 A kind of method for recognizing sound-groove and device, server, storage medium
CN108564956B (en) * 2018-03-26 2021-04-20 京北方信息技术股份有限公司 Voiceprint recognition method and device, server and storage medium
CN108615532B (en) * 2018-05-03 2021-12-07 张晓雷 Classification method and device applied to sound scene
CN108615532A (en) * 2018-05-03 2018-10-02 张晓雷 A kind of sorting technique and device applied to sound field scape
CN109031202B (en) * 2018-06-03 2022-10-04 桂林电子科技大学 Indoor environment area positioning system and method based on auditory scene analysis
CN109031202A (en) * 2018-06-03 2018-12-18 桂林电子科技大学 indoor environment area positioning system and method based on auditory scene analysis
CN109192216A (en) * 2018-08-08 2019-01-11 联智科技(天津)有限责任公司 A kind of Application on Voiceprint Recognition training dataset emulation acquisition methods and its acquisition device
CN108847253A (en) * 2018-09-05 2018-11-20 平安科技(深圳)有限公司 Vehicle model recognition methods, device, computer equipment and storage medium
US11798531B2 (en) 2018-10-25 2023-10-24 Tencent Technology (Shenzhen) Company Limited Speech recognition method and apparatus, and method and apparatus for training speech recognition model
WO2020083110A1 (en) * 2018-10-25 2020-04-30 腾讯科技(深圳)有限公司 Speech recognition and speech recognition model training method and apparatus
CN109410976B (en) * 2018-11-01 2022-12-16 北京工业大学 Speech enhancement method based on binaural sound source localization and deep learning in binaural hearing aid
CN109410976A (en) * 2018-11-01 2019-03-01 北京工业大学 Sound enhancement method based on binaural sound sources positioning and deep learning in binaural hearing aid
CN110364168A (en) * 2019-07-22 2019-10-22 南京拓灵智能科技有限公司 A kind of method for recognizing sound-groove and system based on environment sensing
CN110364168B (en) * 2019-07-22 2021-09-14 北京拓灵新声科技有限公司 Voiceprint recognition method and system based on environment perception
CN110473566A (en) * 2019-07-25 2019-11-19 深圳壹账通智能科技有限公司 Audio separation method, device, electronic equipment and computer readable storage medium
CN110473553A (en) * 2019-08-29 2019-11-19 南京理工大学 The recognition methods of the elderly and physical disabilities speaker based on auditory system model
WO2021042537A1 (en) * 2019-09-04 2021-03-11 平安科技(深圳)有限公司 Voice recognition authentication method and system
CN110648553A (en) * 2019-09-26 2020-01-03 北京声智科技有限公司 Site reminding method, electronic equipment and computer readable storage medium
CN111083284B (en) * 2019-12-09 2021-06-11 Oppo广东移动通信有限公司 Vehicle arrival prompting method and device, electronic equipment and computer readable storage medium
CN111083284A (en) * 2019-12-09 2020-04-28 Oppo广东移动通信有限公司 Vehicle arrival prompting method and related product
CN111477235A (en) * 2020-04-15 2020-07-31 厦门快商通科技股份有限公司 Voiceprint acquisition method, device and equipment
CN112767949A (en) * 2021-01-18 2021-05-07 东南大学 Voiceprint recognition system based on binary weight convolutional neural network
CN112863546A (en) * 2021-01-21 2021-05-28 安徽理工大学 Belt conveyor health analysis method based on audio characteristic decision
CN113011506A (en) * 2021-03-24 2021-06-22 华南理工大学 Texture image classification method based on depth re-fractal spectrum network
CN113011506B (en) * 2021-03-24 2023-08-25 华南理工大学 Texture image classification method based on deep fractal spectrum network
CN113257266A (en) * 2021-05-21 2021-08-13 特斯联科技集团有限公司 Complex environment access control method and device based on voiceprint multi-feature fusion

Similar Documents

Publication Publication Date Title
CN106782565A (en) Voiceprint feature recognition method and system
CN109830245B (en) Multi-speaker voice separation method and system based on beam forming
CN105845127B (en) Audio recognition method and its system
CN110970053B (en) Multichannel speaker-independent voice separation method based on deep clustering
CN103456312B (en) A kind of single-channel voice blind separating method based on Computational auditory scene analysis
CN109427328B (en) Multichannel voice recognition method based on filter network acoustic model
CN110675891B (en) Voice separation method and module based on multilayer attention mechanism
CN112331218B (en) Single-channel voice separation method and device for multiple speakers
CN106057210B (en) Quick speech blind source separation method based on frequency point selection under binaural distance
Wang et al. On spatial features for supervised speech separation and its application to beamforming and robust ASR
CN110047478B (en) Multi-channel speech recognition acoustic modeling method and device based on spatial feature compensation
CN107346664A (en) A kind of ears speech separating method based on critical band
CN110111769A (en) A kind of cochlear implant control method, device, readable storage medium storing program for executing and cochlear implant
CN108122559A (en) Binaural sound sources localization method based on deep learning in a kind of digital deaf-aid
CN108091345A (en) A kind of ears speech separating method based on support vector machines
CN105225672A (en) Merge the system and method for the directed noise suppression of dual microphone of fundamental frequency information
CN103903632A (en) Voice separating method based on auditory center system under multi-sound-source environment
CN107274887A (en) Speaker's Further Feature Extraction method based on fusion feature MGFCC
CN108520756A (en) A kind of method and device of speaker's speech Separation
CN111145726A (en) Deep learning-based sound scene classification method, system, device and storage medium
Sainath et al. Reducing the Computational Complexity of Multimicrophone Acoustic Models with Integrated Feature Extraction.
CN104778948A (en) Noise-resistant voice recognition method based on warped cepstrum feature
CN105609099A (en) Speech recognition pretreatment method based on human auditory characteristic
CN109448702A (en) Artificial cochlea's auditory scene recognition methods
CN105845143A (en) Speaker confirmation method and speaker confirmation system based on support vector machine

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20170531