CN106782565A - Voiceprint feature recognition method and system - Google Patents
Voiceprint feature recognition method and system
- Publication number
- CN106782565A CN106782565A CN201611075677.7A CN201611075677A CN106782565A CN 106782565 A CN106782565 A CN 106782565A CN 201611075677 A CN201611075677 A CN 201611075677A CN 106782565 A CN106782565 A CN 106782565A
- Authority
- CN
- China
- Prior art keywords
- voiceprint
- frequency
- voiceprint feature
- signal
- speech
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/02—Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/26—Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
Abstract
An embodiment of the present invention provides a voiceprint feature recognition method and system. The method proceeds as follows: after auditory-property-based speech separation is applied to the preprocessed noisy mixed signal, the frequency cepstral coefficients and perceptual linear prediction coefficients of the signal are extracted; using a noise-background discrimination ratio, the frequency cepstral coefficients and perceptual linear prediction coefficients are analyzed under different noise environments to complete feature fusion; finally, against a pre-built voiceprint feature template library, the fused features are pattern-matched with a Gaussian mixture model-universal background model to accomplish voiceprint feature recognition. By combining human auditory properties with traditional voiceprint recognition methods, this recognition method solves, from a bionics perspective, the problem of reduced voiceprint recognition rate under noise, effectively improving both the accuracy of voiceprint feature recognition and the robustness of the system in noisy environments.
Description
Technical field
The present invention relates to the field of voice recognition technology, and in particular to a voiceprint feature recognition method and system.
Background art
As early as the 1930s, researchers in information science had begun to study voiceprint recognition. Early research focused on aural discrimination experiments and on verifying the feasibility of identification by listening. With breakthroughs in computer hardware and algorithms, voiceprint recognition research was no longer limited to discrimination by the human ear alone. Bell Laboratories has long held a leading position in the field of speech recognition; its researcher L. G. Kersta performed identification through analysis of speech spectrograms and was the first to propose the concept of the "voiceprint". With continued exploration and innovation in the field, automatic analysis and recognition of human speech signals by machine became possible. However, existing voiceprint feature recognition methods generally suffer from low recognition accuracy in noisy environments; system robustness is poor and application results are unsatisfactory.
Summary of the invention
The object of the present invention is to provide a voiceprint feature recognition method and system that address the above problems.
A preferred embodiment of the present invention provides a voiceprint feature recognition method, the method comprising:
preprocessing an input raw speech signal, the preprocessing including pre-emphasis, framing with windowing, and endpoint detection;
applying auditory-property-based speech separation to the noisy mixed signal obtained after preprocessing;
extracting the frequency cepstral coefficients and perceptual linear prediction coefficients of the separated signal;
using a noise-background discrimination ratio to analyze the frequency cepstral coefficients and perceptual linear prediction coefficients under different noise environments so as to complete feature fusion; and
in a pre-built voiceprint feature template library, pattern-matching the fused features with a Gaussian mixture model-universal background model (GMM-UBM) to accomplish voiceprint feature recognition.
Another embodiment of the present invention provides a voiceprint feature recognition system, the system comprising:
a preprocessing module for preprocessing the input raw speech signal, the preprocessing including pre-emphasis, framing with windowing, and endpoint detection;
a speech separation module for applying auditory-property-based speech separation to the noisy mixed signal obtained after preprocessing;
a feature extraction module for extracting the frequency cepstral coefficients and perceptual linear prediction coefficients of the separated signal;
a feature fusion module for using a noise-background discrimination ratio to analyze the frequency cepstral coefficients and perceptual linear prediction coefficients under different noise environments so as to complete feature fusion; and
a feature recognition module for pattern-matching the fused features, in a pre-built voiceprint feature template library, with a GMM-UBM model to accomplish voiceprint feature recognition.
The voiceprint feature recognition method and system provided by embodiments of the present invention combine human auditory properties with traditional voiceprint recognition methods, solving from a bionics perspective the problem of reduced recognition rate under noise and effectively improving both the accuracy of voiceprint recognition and the robustness of the system in noisy environments.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings required by the embodiments are briefly described below. It should be understood that the following drawings illustrate only certain embodiments of the present invention and should not be regarded as limiting its scope; those of ordinary skill in the art may derive other related drawings from them without creative effort.
Fig. 1 is a block diagram of a speech recognition device provided by an embodiment of the present invention;
Fig. 2 is a flow chart of a voiceprint feature recognition method provided by an embodiment of the present invention;
Fig. 3 is a diagram of the geometry of the interaural time difference provided by an embodiment of the present invention;
Fig. 4 is a functional block diagram of a voiceprint feature recognition system provided by an embodiment of the present invention.
Reference numerals: 100 - speech recognition device; 110 - voiceprint feature recognition system; 120 - memory; 130 - processor; 1102 - preprocessing module; 1104 - speech separation module; 1106 - feature extraction module; 1108 - feature fusion module; 1110 - feature recognition module.
Detailed description of the embodiments
To make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments are described below clearly and completely with reference to the accompanying drawings. Evidently, the described embodiments are only a part of the embodiments of the present invention rather than all of them. The components of the embodiments of the present invention, as generally described and illustrated in the drawings herein, may be arranged and designed in a variety of configurations. The following detailed description of the embodiments provided in the drawings is therefore not intended to limit the scope of the claimed invention but merely represents selected embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art without creative effort, based on the embodiments of the present invention, fall within the scope of protection of the present invention.
Fig. 1 is a block diagram of a speech recognition device 100 provided by an embodiment of the present invention. The speech recognition device 100 includes a voiceprint feature recognition system 110, a memory 120, and a processor 130. The memory 120 and the processor 130 are electrically connected, directly or indirectly, for data transmission and interaction. The voiceprint feature recognition system 110 includes at least one software function module that may be stored in the memory 120 in the form of software or firmware, or solidified in the operating system of the speech recognition device 100. Under the control of a storage controller, the processor 130 accesses the memory 120 to execute the executable modules stored in it, such as the software function modules and computer programs included in the voiceprint feature recognition system 110.
Fig. 2 is a flow chart of a voiceprint feature recognition method, provided by an embodiment of the present invention, applied to the speech recognition device 100 shown in Fig. 1. It should be noted that the method provided by the present invention is not limited to the particular order shown in Fig. 2 and described below. The steps shown in Fig. 2 are described in detail in the following.
Step S101: preprocess the input raw speech signal; the preprocessing includes pre-emphasis, framing with windowing, and endpoint detection.
In this embodiment, the raw speech signal input to the speech recognition device 100 is first passed through a first-order FIR high-pass digital filter to realize pre-emphasis. Its transfer function is
H(z) = 1 - μz^(-1)
where the coefficient μ takes a value between 0 and 1 and may be determined from prior experience; 0.94 is commonly used.
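By way of illustration (not part of the original disclosure), the pre-emphasis filter above can be sketched in Python; the function name and the list-based signal representation are assumptions of the example:

```python
def preemphasis(signal, mu=0.94):
    """First-order FIR high-pass pre-emphasis, H(z) = 1 - mu * z^-1."""
    # y[n] = x[n] - mu * x[n-1]; the first sample is passed through unchanged.
    return [signal[0]] + [signal[n] - mu * signal[n - 1]
                          for n in range(1, len(signal))]
```

Applied to a constant signal, the filter leaves only the onset sample, which is the expected high-pass behavior.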
The speech signal obtained after pre-emphasis is then divided into frames, each multiplied by a moving window w(n - m) whose amplitude k is given by the window function and weights each sample of the frame. After framing and windowing, the speech signal can be expressed as
Q(n) = Σ_m T[x(m)] · w(n - m)
where T[·] denotes a functional transformation, x(m) the speech signal sequence, and Q(n) the time sequence obtained for each frame after processing.
Finally, the endpoints of the speech signal are detected. In this embodiment, endpoint detection is realized mainly through the short-time energy and the short-time zero-crossing rate.
Specifically, the short-time energy of frame t is expressed as
E_t = Σ_{n=1}^{N} S_t(n)²
where N is the analysis window width and S_t(n) is the n-th sample of the t-th frame of the speech signal.
The short-time zero-crossing rate is expressed as
Z_t = (1/2) Σ_{n=2}^{N} |sgn[S_t(n)] - sgn[S_t(n-1)]|
where sgn[·] is the sign function.
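As a minimal sketch (again not from the patent text, with the combination rule of the two cues an assumption of the example), the two endpoint-detection quantities can be computed as:

```python
def short_time_energy(frame):
    """E_t: sum of squared samples of one frame."""
    return sum(s * s for s in frame)

def zero_crossing_rate(frame):
    """Z_t = (1/2) * sum |sgn(s[n]) - sgn(s[n-1])| over the frame."""
    sgn = lambda x: 1 if x >= 0 else -1
    return 0.5 * sum(abs(sgn(frame[n]) - sgn(frame[n - 1]))
                     for n in range(1, len(frame)))

def is_speech(frame, energy_thresh, zcr_thresh):
    # A frame is kept as speech when either cue exceeds its threshold;
    # the thresholds are tuning parameters, not values from the patent.
    return short_time_energy(frame) > energy_thresh or \
           zero_crossing_rate(frame) > zcr_thresh
```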
Step S103: apply auditory-property-based speech separation to the noisy mixed signal obtained after preprocessing.
In this embodiment, the bionic separation of the speech signal proceeds as follows: the noisy mixed signal is decomposed into time-frequency units by a peripheral auditory model; the time-frequency units are then clustered according to speech separation cues; and the separated speech is finally output by a speech reconstruction model. The speech reconstruction model completes the clustering of the time-frequency units and the synthesis of the speech stream, and consists mainly of two parts: binary-mask clustering and a recombination model.
The masking model for the i-th frequency channel and the j-th time frame may be defined as
M(i, j) = 1 if f_i ≤ f_c and τ(i, j) ≥ T_τ(i, j); 1 if f_i > f_c and L(i, j) ≥ T_L(i, j); 0 otherwise,
where f_c = 1500 Hz is the critical frequency separating the high band from the middle and low bands, f_i is the frequency of the i-th channel, τ(i, j) and L(i, j) are the two separation cues of the unit at channel i and frame j, and T_τ(i, j) and T_L(i, j) are the thresholds of the two cues, respectively.
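A sketch of this two-cue binary mask in Python; the decision rule is reconstructed from the symbols above and should be read as an assumption of the example, not the patent's exact formula:

```python
def binary_mask(f_i, tau, level, tau_thresh, level_thresh, f_c=1500.0):
    """Binary time-frequency mask: the time-difference cue tau decides for
    channels at or below f_c; the level-difference cue decides above it."""
    if f_i <= f_c:
        return 1 if tau >= tau_thresh else 0
    return 1 if level >= level_thresh else 0
```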
To improve the fidelity of the reconstructed speech, the signal to be synthesized first undergoes prosody adjustment, which covers the amplitude, the length, and the pitch information of the speech. The amplitude adjustment of the speech signal can be realized by weighting each frame with a gain g, where τ is the signal frame length and n the frame shift.
The reconstruction formula is
ŝ(n) = Σ_j g · h_j(t_j - n) · s_j(n)
where ŝ(n) is the recombined signal obtained, t_j the synchronization mark of the recombination, h_j(n) the window function in the peripheral auditory model, and s_j(n) the short-time speech signal; the amplitude adjustment is then realized through the weight g.
In addition, in this embodiment, the speech separation cue may be the interaural time difference (ITD) or the interaural level difference (ILD). Starting from the way the human ear localizes sound, and simulating the process by which it discriminates sources, the cues ITD and ILD, which reflect the spatial azimuth information of sound, are applied to speech separation and effectively improve separation efficiency. The realization principles of ITD and ILD are briefly described below.
In human auditory speech separation, ITD is used mainly for processing middle- and low-frequency speech signals. For simplicity, this section illustrates the origin of ITD with a single source. Suppose a source is closer to the left ear; the speech signal reaching the left ear can then be represented as α sin 2πft, and the signal reaching the more distant right ear as (α - Δα) sin 2πf(t + Δt), where f is the frequency, Δt is the time-difference information, representing the time difference with which the sound reaches the two ears, i.e. the ITD, and Δα is the intensity-difference information, representing the difference in sound pressure at the two ears, i.e. the ILD. Using these two kinds of information, sources at different positions can be separated.
Fig. 3 shows the geometry of the interaural time difference. In Fig. 3, S is the source position, A and B are the left and right ears, D is the distance between them, θ is the angle between the source and the center of the head, and d is the path-length difference with which the sound reaches the two ears, expressed as d = D sin θ.
To compute the ITD, the input speech signal is first windowed; the window function is generally regarded as the unit impulse response of a filter. A Hamming window is chosen in this embodiment to keep the speech signal smooth within each short-time analysis frame. The Hamming window is expressed as
w(n) = 0.54 - 0.46 cos(2πn / (N - 1)), 0 ≤ n ≤ N - 1,
where N is the window length. The windowed signals are transformed to the frequency domain by the Fourier transform:
X_l(ω) = Σ_n x_l(n) w(n) e^(-jωn), X_r(ω) = Σ_n x_r(n) w(n) e^(-jωn).
The cross-correlation of the speech signals reaching the left and right ears can be expressed as
R_lr(τ) = E[x_l(t) x_r(t + τ)].
Normally, each transfer function h_l(t) and h_r(t) can be approximated by an amplitude decay factor and a time delay, so the cross-correlation can be expressed as
R_lr(τ) ≈ α R_ss(τ - D),
where α is the decay factor and D is the value of the ITD. By the above analysis, the ITD contributes to the separation of low-frequency speech signals; since the autocorrelation function R_ss reaches its maximum at τ = 0, the ITD value D can be expressed as
D = argmax_τ R_lr(τ).
The cross-power spectrum is defined from the Fourier transforms of the two signals as
P_lr(ω) = X_l(ω) X_r*(ω),
where X_r*(ω) is the complex conjugate of X_r(ω); taking the inverse Fourier transform of this formula gives the cross-correlation of the received signals:
R_lr(τ) = (1/2π) ∫ P_lr(ω) e^(jωτ) dω.
From the above it can be seen that the D value of the ITD depends only on the phase of the cross-power spectrum, so the cross-correlation can be normalized as
R̂_lr(τ) = (1/2π) ∫ [P_lr(ω) / |P_lr(ω)|] e^(jωτ) dω.
Thus, the D value of the ITD can be computed accurately as
D = argmax_τ R̂_lr(τ).
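The lag search behind the formulas above can be sketched directly in the time domain (a simplified stand-in for the frequency-domain derivation; the sample-based lag representation is an assumption of the example):

```python
def estimate_itd(left, right, max_lag):
    """Estimate the ITD (in samples) as the lag tau maximizing the
    cross-correlation R_lr(tau) = sum_n left[n] * right[n + tau]."""
    best_lag, best_val = 0, float("-inf")
    for tau in range(-max_lag, max_lag + 1):
        val = sum(left[n] * right[n + tau]
                  for n in range(len(left))
                  if 0 <= n + tau < len(right))
        if val > best_val:
            best_lag, best_val = tau, val
    return best_lag
```

For a pulse that arrives at the right ear one sample after the left, the estimate recovers a lag of one sample.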
ILD denotes the sound-pressure difference with which a source signal reaches the two ears. When the distances over which the sound travels to the left and right ears differ, a sound-pressure difference arises, and this information supplies speech separation with another cue, the ILD. Research shows that the ILD plays the greater role in the high-frequency region. Once the frequency of the speech signal exceeds 1500 Hz, the shadowing effect of peripheral auditory structures such as the pinna produces a pronounced sound shadow that hinders transmission of the signal to the inner ear. The main reason is that high-frequency speech has a shorter wavelength and diffracts around the pinna only with difficulty, while low-frequency sound can bend around it; therefore, to separate high-frequency speech signals, the interaural level difference must be extracted.
Computing the ILD requires spectral cues. Ignoring echoes, the energy spectra of the signals received at the left and right ears can be expressed by the following two formulas:
P_l(ω) = S(ω) |H_l(ω)|²
P_r(ω) = S(ω) |H_r(ω)|²
where S(ω) is the power spectrum of the source and H_l(ω) and H_r(ω) are the transfer functions of the left and right ears, respectively. The intensities at the two ears can therefore be expressed as
I_l(ω) = 10 log10 P_l(ω) = 10 log10 S(ω) + 20 log10 |H_l(ω)|
and
I_r(ω) = 10 log10 P_r(ω) = 10 log10 S(ω) + 20 log10 |H_r(ω)|.
Normally, the interaural level difference is used to extract the separation information of high-frequency speech signals, and in the logarithmic domain the relation between source and channel changes from multiplication to simple addition. This additive relation helps the subsequent ILD computation to extract the channel information.
After the intensities are computed, the speech signal passes through a cochlear filter. Extracting ILD information only in the high-frequency part not only reduces the size of the feature space but also simulates the frequency-selective resonance of the cochlea in the human auditory system.
Because ILD acts only on speech signals above 1500 Hz, there is a cutoff frequency for ILD extraction, computed as
f_cut = C / d_α,
where C is the propagation speed of sound in air and d_α is the physical aperture dimension; ILD cues are computed only for subbands that reach the cutoff frequency f_cut.
Hence, for each subband i reaching the cutoff frequency, the weighted subband powers
P_{l,i} = ∫_{Ω_i} W_i(ω) P_l(ω) dω, P_{r,i} = ∫_{Ω_i} W_i(ω) P_r(ω) dω
are formed, where Ω_i is the frequency range of subband i and W_i(ω) is the weight of the cochlear filter.
The ILD of each subband i is therefore defined as
ILD_i = 10 log10 (P_{l,i} / P_{r,i}).
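A discrete sketch of the subband ILD above (the discretized sum in place of the integral, and the small epsilon guard, are assumptions of the example):

```python
import math

def subband_ild_db(p_left, p_right, weights, eps=1e-12):
    """ILD_i = 10 log10 of the ratio of cochlear-weighted subband powers;
    p_left / p_right are per-bin power samples inside the subband."""
    pl = sum(w * p for w, p in zip(weights, p_left)) + eps
    pr = sum(w * p for w, p in zip(weights, p_right)) + eps
    return 10.0 * math.log10(pl / pr)
```

A left-ear power twice the right-ear power in every bin yields about +3 dB, as expected.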
Step S105: extract the frequency cepstral coefficients and perceptual linear prediction coefficients of the signal processed by speech separation.
It is well known that the characteristic parameters most commonly adopted in voiceprint recognition research are cepstral coefficients. Cepstral coefficients reflect the vocal-tract mechanism of human sound production, and the filter bank in the extraction process reflects human hearing characteristics. In this embodiment, the mel-frequency cepstral coefficients (MFCC) are improved upon: the frequency cepstral coefficients are extracted with a Gammatone filter bank.
In speech signal processing, a Gammatone filter bank functions much like the human auditory periphery: it can simulate the characteristics of the basilar membrane well and performs frequency division of the speech signal. The Meddis model, in turn, can complete the simulation of the inner hair cells and accurately describe the firing rate of the auditory nerve; together the two constitute a complete peripheral auditory model.
When a speech signal enters the human ear, it first passes through the frequency division of the basilar membrane, simulated here by the Gammatone filter bank, whose time-domain expression is
g_i(t) = t^(n-1) e^(-2π b_i t) cos(2π f_i t + φ_i) U(t), 1 ≤ i ≤ N,
where N is the number of filters, i the filter index, n the filter order (here n = 4), φ_i the initial phase of the filter, f_i the center frequency of each filter, b_i the decay factor, and U(t) the unit step function.
The bandwidth of each filter in the Gammatone filter bank is related to the critical bands of human hearing; the auditory critical band, measured as an equivalent rectangular bandwidth, is
ERB(f) = 24.7 × (4.37 f / 1000 + 1),
and for center frequency f_i the corresponding decay factor is
b_i = 1.019 ERB(f_i).
Applying the Laplace transform to g_i(t), converting to the z-transform, and finally taking the inverse transform yields the discrete impulse response of the Gammatone filter bank.
Step S107: using the noise-background discrimination ratio, analyze the frequency cepstral coefficients and perceptual linear prediction coefficients under different noise environments to complete feature fusion.
D_R is the ratio of the between-class variance to the within-class variance of a feature; it reflects the degree to which the features in the voiceprint feature template library discriminate between speakers, and this discrimination ratio effectively characterizes whether a voiceprint feature is suited to a noise environment. Obtaining the D_R values of a voiceprint feature under environments of different signal-to-noise ratio allows further analysis of the feature's robustness under noise. The expression for D_R is
D_R = [ (1/M) Σ_{i=1}^{M} (μ_i - μ)² ] / [ (1/(MN)) Σ_{i=1}^{M} Σ_{n=1}^{N} (x_i(n) - μ_i)² ],
where μ is the mean feature value over all speakers in the voiceprint feature template library, μ_i is the mean feature value of the i-th speaker, x_i(n) the feature value of the i-th speaker in frame n, M the number of speakers in the library, and N the number of speech signal frames per speaker.
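For one feature dimension, the ratio above can be computed as follows (a minimal sketch; the per-speaker list-of-lists input layout is an assumption of the example):

```python
def discrimination_ratio(features_per_speaker):
    """D_R = between-class variance / within-class variance for one
    feature dimension; input is one list of values per speaker."""
    means = [sum(v) / len(v) for v in features_per_speaker]
    mu = sum(means) / len(means)
    between = sum((m - mu) ** 2 for m in means) / len(means)
    within = sum(sum((x - m) ** 2 for x in v) / len(v)
                 for v, m in zip(features_per_speaker, means)) / len(means)
    return between / within
```

Two well-separated speakers with small internal spread give a large D_R, indicating a noise-robust dimension.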
After extraction, a speech feature is generally stored as a matrix and can be represented by multidimensional feature vectors. Studying the between-class discrimination ratio of each dimension of the feature vector reveals the robustness of every one-dimensional feature parameter under noise, and on this basis data fusion of different voiceprint features can be realized. Suppose features A and B are represented by X-dimensional and Y-dimensional feature vectors, respectively:
A = {α_1, α_2, ..., α_X}'
B = {β_1, β_2, ..., β_Y}'
A between-class discrimination analysis is applied to the two voiceprint features, giving the D_R matrices of feature A and feature B.
To study the per-dimension performance of the two voiceprint features under noise, features A and B are extracted for every speaker in the voiceprint feature template library under various signal-to-noise-ratio environments, and the number of times P that the maximum value D_Rmax occurs in each dimension of the feature matrix is counted.
To ensure that each vector of the fusion feature matrix carries an appropriate weight, a threshold P_th is set according to the statistics, its value chosen from the concrete results; after normalizing P_x and P_y, the dimensions satisfying
ε = max{P_x, P_y, P_th}
are retained, yielding the fused feature parameter C, expressed as
C = {γ_1, γ_2, ..., γ_Z}'.
Step S109: in the pre-built voiceprint feature template library, pattern-match the fused features with the Gaussian mixture model-universal background model to accomplish voiceprint feature recognition.
In this embodiment, the pattern-matching model is the Gaussian mixture model-universal background model (GMM-UBM). The essence of a Gaussian mixture model (GMM) is a multidimensional probability density function; for a d-dimensional GMM with mixture order M, the density can be expressed as a weighted sum of Gaussian functions:
p(x) = Σ_{i=1}^{M} w_i p_i(x),
where x is the d-dimensional observation vector, p_i is the i-th d-dimensional Gaussian component of the GMM with mean vector μ_i and covariance matrix Σ_i, and w_i is the mixture weight, satisfying Σ_{i=1}^{M} w_i = 1.
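The density above can be evaluated as follows for the common diagonal-covariance case (a sketch, not the patent's implementation; the diagonal restriction is an assumption of the example):

```python
import math

def gmm_log_density(x, weights, means, variances):
    """log p(x) for a diagonal-covariance GMM:
    p(x) = sum_i w_i * N(x; mu_i, diag(var_i))."""
    d = len(x)
    total = 0.0
    for w, mu, var in zip(weights, means, variances):
        log_det = sum(math.log(v) for v in var)          # log |Sigma_i|
        maha = sum((xj - mj) ** 2 / v                    # Mahalanobis term
                   for xj, mj, v in zip(x, mu, var))
        total += w * math.exp(-0.5 * (d * math.log(2 * math.pi)
                                      + log_det + maha))
    return math.log(total)
```

In standard GMM-UBM practice, a test utterance is scored by the difference between its average log density under the claimed speaker's GMM and under the universal background model.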
Fig. 4 is a functional block diagram of a voiceprint feature recognition system 110 provided by an embodiment of the present invention. The voiceprint feature recognition system 110 includes a preprocessing module 1102, a speech separation module 1104, a feature extraction module 1106, a feature fusion module 1108, and a feature recognition module 1110.
The preprocessing module 1102 preprocesses the input raw speech signal; the preprocessing includes pre-emphasis, framing with windowing, and endpoint detection.
The speech separation module 1104 applies auditory-property-based speech separation to the noisy mixed signal obtained after preprocessing.
The feature extraction module 1106 extracts the frequency cepstral coefficients and perceptual linear prediction coefficients of the separated signal.
The feature fusion module 1108 uses the noise-background discrimination ratio to analyze the frequency cepstral coefficients and perceptual linear prediction coefficients under different noise environments and complete feature fusion.
The feature recognition module 1110 pattern-matches the fused features, in the pre-built voiceprint feature template library, with the GMM-UBM model to accomplish voiceprint feature recognition.
For the concrete operation of each functional module described in this embodiment, refer to the detailed description of the corresponding steps shown in Fig. 2; it is not repeated here.
In sum, vocal print feature recognition methods provided in an embodiment of the present invention and system, solve to make an uproar from bionics angle
The problem of Application on Voiceprint Recognition rate reduction, effectively improves the accuracy rate and the robustness of system of Application on Voiceprint Recognition under noise circumstance under sound.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may also be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the flowcharts and block diagrams in the accompanying drawings show the possible architectures, functions, and operations of the apparatuses, methods, and computer program products of multiple embodiments of the present invention. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that in some alternative implementations, the functions marked in the blocks may occur in an order different from that marked in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved. It should further be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks therein, may be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
If the functions are implemented in the form of software functional modules and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present invention.
The above are only specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any change or replacement that can readily occur to those skilled in the art within the technical scope disclosed by the present invention shall be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be determined by the appended claims.
Claims (10)
1. A voiceprint feature recognition method, characterised in that the method comprises:
preprocessing an input original speech signal, the preprocessing including pre-emphasis, framing with windowing, and endpoint detection;
performing speech separation processing based on auditory properties on the noisy mixed signal obtained after the preprocessing;
extracting frequency cepstral coefficients and perceptual linear prediction coefficients of the signal after the speech separation processing;
analyzing, using noise background discrimination, the frequency cepstral coefficients and the perceptual linear prediction coefficients under different noise environments to complete feature fusion;
in a pre-built voiceprint feature template library, performing pattern matching on the fused features using a Gaussian mixture model-universal background model, thereby realizing voiceprint feature recognition.
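The final matching step of claim 1 can be sketched with a toy GMM-UBM score. A real system would MAP-adapt the universal background model to each enrolled speaker; here, as a simplifying assumption, the speaker model is trained directly on that speaker's features, and acceptance is decided by the average log-likelihood ratio against the UBM. All data are synthetic.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
ubm_feats = rng.normal(0.0, 1.0, size=(500, 13))   # pooled background features
spk_feats = rng.normal(1.5, 1.0, size=(200, 13))   # enrolled speaker's features
test_spk = rng.normal(1.5, 1.0, size=(50, 13))     # utterance from the same speaker
test_imp = rng.normal(-1.5, 1.0, size=(50, 13))    # utterance from an impostor

# Universal background model and (simplified) speaker model
ubm = GaussianMixture(n_components=4, random_state=0).fit(ubm_feats)
spk = GaussianMixture(n_components=4, random_state=0).fit(spk_feats)

def llr(feats):
    """Average per-frame log-likelihood ratio: speaker model vs. the UBM."""
    return spk.score(feats) - ubm.score(feats)

accept = llr(test_spk) > llr(test_imp)
```

In practice a decision threshold calibrated on development data would replace the direct comparison of the two scores.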
2. The voiceprint feature recognition method according to claim 1, characterised in that the step of performing speech separation processing based on auditory properties on the noisy mixed signal obtained after the preprocessing comprises:
decomposing the noisy mixed signal to obtain multiple time-frequency units;
clustering the multiple time-frequency units obtained by the decomposition according to speech separation cues;
performing speech reconstruction on the clustered signal to be synthesized, and outputting the separated speech.
3. The voiceprint feature recognition method according to claim 2, characterised in that the speech separation cues include the interaural time difference and the interaural intensity difference.
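The two binaural cues of claim 3 can be estimated as follows: the interaural time difference (ITD) as the lag maximizing the cross-correlation between the two ear signals, and the interaural intensity difference (IID) as an energy ratio in decibels. The delayed-and-attenuated synthetic signal pair below is an assumption for illustration.

```python
import numpy as np

fs = 16000
t = np.arange(1024) / fs
left = np.sin(2 * np.pi * 300 * t)
true_lag = 8                                # samples by which the right ear lags
right = 0.5 * np.roll(left, true_lag)       # quieter, delayed copy of the left ear

def itd_samples(l, r, max_lag=32):
    """Lag (in samples) maximizing the circular cross-correlation of the ears."""
    lags = np.arange(-max_lag, max_lag + 1)
    corr = [np.dot(l, np.roll(r, -k)) for k in lags]
    return lags[int(np.argmax(corr))]

def iid_db(l, r):
    """Interaural intensity difference: left-to-right energy ratio in dB."""
    return 10 * np.log10(np.sum(l ** 2) / np.sum(r ** 2))

est_itd = itd_samples(left, right)          # recovers the 8-sample delay
est_iid = iid_db(left, right)               # 10*log10(4) ~ 6.02 dB
```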
4. The voiceprint feature recognition method according to claim 2 or 3, characterised in that the step of clustering the multiple time-frequency units obtained by the decomposition according to the speech separation cues comprises:
performing binary mask clustering on the multiple time-frequency units according to the masking model, wherein fi denotes the frequency of the i-th frequency channel, fc denotes the critical frequency between the high-frequency and mid-low-frequency regions, τ(i, j) denotes one separation cue of the i-th frequency channel and the j-th time frame, L(i, j) denotes the other separation cue of the i-th frequency channel and the j-th time frame, and Tτ(i, j) and Tl(i, j) denote the thresholds of the two separation cues, respectively.
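A hedged sketch of the frequency-dependent binary mask of claim 4: channels below the critical frequency fc are clustered by the time cue τ(i, j) against its threshold, while channels at or above fc use the level cue L(i, j) against its threshold. The cue values, thresholds, and channel layout below are made-up illustrative numbers, not values from the patent.

```python
import numpy as np

rng = np.random.default_rng(2)
n_channels, n_frames = 32, 20
freqs = np.linspace(100, 8000, n_channels)       # channel center frequencies (Hz)
f_c = 1500.0                                     # assumed critical frequency

tau = rng.uniform(0, 1, (n_channels, n_frames))  # time-based cue tau(i, j)
lvl = rng.uniform(0, 1, (n_channels, n_frames))  # level-based cue L(i, j)
t_tau, t_lvl = 0.5, 0.5                          # cue thresholds

def binary_mask(freqs, tau, lvl, f_c, t_tau, t_lvl):
    """Keep a time-frequency unit (mask = 1) when its cue passes the
    threshold selected by the channel's frequency region."""
    low = (freqs < f_c)[:, None]                 # broadcast over time frames
    return np.where(low, tau < t_tau, lvl < t_lvl).astype(int)

mask = binary_mask(freqs, tau, lvl, f_c, t_tau, t_lvl)
```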
5. The voiceprint feature recognition method according to claim 2, characterised in that the step of performing speech reconstruction on the clustered signal to be synthesized comprises:
performing prosody adjustment on the signal to be synthesized, the prosody including amplitude, duration, and pitch;
performing speech reconstruction on the prosody-adjusted signal according to the reconstruction formula, wherein tj denotes the synchronization mark of the reconstruction, hj(n) denotes the window function, a further term denotes the short-time speech signal, and gj denotes the amplitude adjustment weight.
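The reconstruction step of claim 5 is essentially windowed overlap-add: each short-time segment is weighted by a gain gj, windowed by hj, and summed at its synthesis mark tj. The sketch below adds a window-sum normalization (an assumption, since the patent's formula image is not reproduced) and, as a sanity check, recovers the interior of a reference signal exactly.

```python
import numpy as np

frame_len, hop = 400, 200
marks = hop * np.arange(10)                    # synthesis marks t_j
window = np.hanning(frame_len)                 # window function h_j(n)
gains = np.ones(len(marks))                    # amplitude adjustment weights g_j

rng = np.random.default_rng(3)
x = rng.standard_normal(marks[-1] + frame_len)            # reference signal
segments = np.stack([x[t:t + frame_len] for t in marks])  # short-time signals

def overlap_add(segments, window, marks, gains):
    """Sum g_j * h_j * s_j at each mark t_j, normalized by the window sum."""
    out = np.zeros(marks[-1] + len(window))
    norm = np.zeros_like(out)
    for seg, t, g in zip(segments, marks, gains):
        out[t:t + len(window)] += g * window * seg
        norm[t:t + len(window)] += window
    return out / np.maximum(norm, 1e-8)        # avoid division by zero at edges

y = overlap_add(segments, window, marks, gains)
```

With the 50 % hop used here, every interior sample is covered by at least one nonzero window value, so the normalized sum reproduces the reference signal away from the edges.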
6. The voiceprint feature recognition method according to claim 1, characterised in that the step of extracting the frequency cepstral coefficients and perceptual linear prediction coefficients of the signal after the speech separation processing comprises:
extracting the frequency cepstral coefficients of the signal after the speech separation processing based on a Gammatone filter bank.
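Claim 6's Gammatone-based cepstral coefficients (often called GFCCs in the literature) can be sketched by filtering with a bank of fourth-order gammatone impulse responses, taking log filter energies, and applying a DCT. The ERB bandwidth formula is the standard Glasberg-Moore one; the filter count, frequency range, and number of coefficients are assumptions for illustration.

```python
import numpy as np

fs, n_filters, n_ceps = 16000, 16, 8

def gammatone_ir(fc, fs, duration=0.025, order=4):
    """Gammatone impulse response: t^(n-1) * exp(-2*pi*b*t) * cos(2*pi*fc*t)."""
    t = np.arange(int(duration * fs)) / fs
    b = 1.019 * (24.7 + 0.108 * fc)        # ERB bandwidth (Glasberg-Moore)
    return t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)

def gfcc(signal, fs):
    centers = np.linspace(100, 6000, n_filters)
    # Energy of each filter's output over the whole (short) signal
    energies = np.array([np.sum(np.convolve(signal, gammatone_ir(f, fs)) ** 2)
                         for f in centers])
    log_e = np.log(energies + 1e-12)
    # DCT-II of the log filter energies yields the cepstral coefficients
    k = np.arange(n_ceps)[:, None]
    n = np.arange(n_filters)[None, :]
    dct = np.cos(np.pi * k * (2 * n + 1) / (2 * n_filters))
    return dct @ log_e

tone = np.sin(2 * np.pi * 500 * np.arange(800) / fs)
coeffs = gfcc(tone, fs)
```

A full implementation would compute these per frame and ERB-space the center frequencies; both are simplified here.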
7. A voiceprint feature recognition system, characterised in that the system comprises:
a preprocessing module, configured to preprocess an input original speech signal, the preprocessing including pre-emphasis, framing with windowing, and endpoint detection;
a speech separation module, configured to perform speech separation processing based on auditory properties on the noisy mixed signal obtained after the preprocessing;
a feature extraction module, configured to extract frequency cepstral coefficients and perceptual linear prediction coefficients of the signal after the speech separation processing;
a feature fusion module, configured to analyze, using noise background discrimination, the frequency cepstral coefficients and the perceptual linear prediction coefficients under different noise environments to complete feature fusion;
a feature recognition module, configured to perform, in a pre-built voiceprint feature template library, pattern matching on the fused features using a Gaussian mixture model-universal background model, thereby realizing voiceprint feature recognition.
8. The voiceprint feature recognition system according to claim 7, characterised in that the manner in which the speech separation module performs speech separation processing based on auditory properties on the noisy mixed signal obtained after the preprocessing comprises:
decomposing the noisy mixed signal to obtain multiple time-frequency units;
clustering the multiple time-frequency units obtained by the decomposition according to speech separation cues;
performing speech reconstruction on the clustered signal to be synthesized, and outputting the separated speech.
9. The voiceprint feature recognition system according to claim 8, characterised in that the manner in which the speech separation module clusters the multiple time-frequency units obtained by the decomposition according to the speech separation cues comprises:
performing binary mask clustering on the multiple time-frequency units according to the masking model, wherein fi denotes the frequency of the i-th frequency channel, fc denotes the critical frequency between the high-frequency and mid-low-frequency regions, τ(i, j) denotes one separation cue of the i-th frequency channel and the j-th time frame, L(i, j) denotes the other separation cue of the i-th frequency channel and the j-th time frame, and Tτ(i, j) and Tl(i, j) denote the thresholds of the two separation cues, respectively.
10. The voiceprint feature recognition system according to claim 7, characterised in that the manner in which the feature extraction module extracts the frequency cepstral coefficients and perceptual linear prediction coefficients of the signal after the speech separation processing comprises:
extracting the frequency cepstral coefficients of the signal after the speech separation processing based on a Gammatone filter bank.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611075677.7A CN106782565A (en) | 2016-11-29 | 2016-11-29 | A kind of vocal print feature recognition methods and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106782565A true CN106782565A (en) | 2017-05-31 |
Family
ID=58900777
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611075677.7A Pending CN106782565A (en) | 2016-11-29 | 2016-11-29 | A kind of vocal print feature recognition methods and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106782565A (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9131295B2 (en) * | 2012-08-07 | 2015-09-08 | Microsoft Technology Licensing, Llc | Multi-microphone audio source separation based on combined statistical angle distributions |
CN103456312A (en) * | 2013-08-29 | 2013-12-18 | 太原理工大学 | Single channel voice blind separation method based on computational auditory scene analysis |
CN105609099A (en) * | 2015-12-25 | 2016-05-25 | 重庆邮电大学 | Speech recognition pretreatment method based on human auditory characteristic |
Non-Patent Citations (5)
Title |
---|
NICOLETA ROMAN et al.: "Speech segregation based on sound localization", IEEE *
刘继芳 (LIU Jifang): "Research on Mixed Speech Separation Based on Computational Auditory Scene Analysis", China Masters' Theses Full-text Database, Information Science and Technology *
徐鹤 (XU He): "Research on Voiceprint Recognition Algorithms in Urban Traffic Environments", China Masters' Theses Full-text Database, Information Science and Technology *
罗元 (LUO Yuan) et al.: "A New Robust Voiceprint Feature Extraction and Fusion Method", Computer Science *
陆虎敏 (LU Humin): "Aircraft Cockpit Display and Control Technology", 31 December 2015, Aviation Industry Press *
Cited By (35)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018223727A1 (en) * | 2017-06-09 | 2018-12-13 | 平安科技(深圳)有限公司 | Voiceprint recognition method, apparatus and device, and medium |
WO2019037426A1 (en) * | 2017-08-23 | 2019-02-28 | 武汉斗鱼网络科技有限公司 | Mfcc voice recognition method, storage medium, electronic device, and system |
CN107782548A (en) * | 2017-10-20 | 2018-03-09 | 韦彩霞 | One kind is based on to track vehicle parts detecting system |
CN107782548B (en) * | 2017-10-20 | 2020-07-07 | 亚太空列(河南)轨道交通有限公司 | Rail vehicle part detection system |
CN108124488A (en) * | 2017-12-12 | 2018-06-05 | 福建联迪商用设备有限公司 | A kind of payment authentication method and terminal based on face and vocal print |
CN108231082A (en) * | 2017-12-29 | 2018-06-29 | 广州势必可赢网络科技有限公司 | Updating method and device for self-learning voiceprint recognition |
CN108182945A (en) * | 2018-03-12 | 2018-06-19 | 广州势必可赢网络科技有限公司 | Voiceprint feature-based multi-person voice separation method and device |
CN110299143B (en) * | 2018-03-21 | 2023-04-11 | 现代摩比斯株式会社 | Apparatus for recognizing a speaker and method thereof |
CN110299143A (en) * | 2018-03-21 | 2019-10-01 | 现代摩比斯株式会社 | The devices and methods therefor of voice speaker for identification |
CN108564956A (en) * | 2018-03-26 | 2018-09-21 | 京北方信息技术股份有限公司 | A kind of method for recognizing sound-groove and device, server, storage medium |
CN108564956B (en) * | 2018-03-26 | 2021-04-20 | 京北方信息技术股份有限公司 | Voiceprint recognition method and device, server and storage medium |
CN108615532B (en) * | 2018-05-03 | 2021-12-07 | 张晓雷 | Classification method and device applied to sound scene |
CN108615532A (en) * | 2018-05-03 | 2018-10-02 | 张晓雷 | A kind of sorting technique and device applied to sound field scape |
CN109031202B (en) * | 2018-06-03 | 2022-10-04 | 桂林电子科技大学 | Indoor environment area positioning system and method based on auditory scene analysis |
CN109031202A (en) * | 2018-06-03 | 2018-12-18 | 桂林电子科技大学 | indoor environment area positioning system and method based on auditory scene analysis |
CN109192216A (en) * | 2018-08-08 | 2019-01-11 | 联智科技(天津)有限责任公司 | A kind of Application on Voiceprint Recognition training dataset emulation acquisition methods and its acquisition device |
CN108847253A (en) * | 2018-09-05 | 2018-11-20 | 平安科技(深圳)有限公司 | Vehicle model recognition methods, device, computer equipment and storage medium |
US11798531B2 (en) | 2018-10-25 | 2023-10-24 | Tencent Technology (Shenzhen) Company Limited | Speech recognition method and apparatus, and method and apparatus for training speech recognition model |
WO2020083110A1 (en) * | 2018-10-25 | 2020-04-30 | 腾讯科技(深圳)有限公司 | Speech recognition and speech recognition model training method and apparatus |
CN109410976B (en) * | 2018-11-01 | 2022-12-16 | 北京工业大学 | Speech enhancement method based on binaural sound source localization and deep learning in binaural hearing aid |
CN109410976A (en) * | 2018-11-01 | 2019-03-01 | 北京工业大学 | Sound enhancement method based on binaural sound sources positioning and deep learning in binaural hearing aid |
CN110364168A (en) * | 2019-07-22 | 2019-10-22 | 南京拓灵智能科技有限公司 | A kind of method for recognizing sound-groove and system based on environment sensing |
CN110364168B (en) * | 2019-07-22 | 2021-09-14 | 北京拓灵新声科技有限公司 | Voiceprint recognition method and system based on environment perception |
CN110473566A (en) * | 2019-07-25 | 2019-11-19 | 深圳壹账通智能科技有限公司 | Audio separation method, device, electronic equipment and computer readable storage medium |
CN110473553A (en) * | 2019-08-29 | 2019-11-19 | 南京理工大学 | The recognition methods of the elderly and physical disabilities speaker based on auditory system model |
WO2021042537A1 (en) * | 2019-09-04 | 2021-03-11 | 平安科技(深圳)有限公司 | Voice recognition authentication method and system |
CN110648553A (en) * | 2019-09-26 | 2020-01-03 | 北京声智科技有限公司 | Site reminding method, electronic equipment and computer readable storage medium |
CN111083284B (en) * | 2019-12-09 | 2021-06-11 | Oppo广东移动通信有限公司 | Vehicle arrival prompting method and device, electronic equipment and computer readable storage medium |
CN111083284A (en) * | 2019-12-09 | 2020-04-28 | Oppo广东移动通信有限公司 | Vehicle arrival prompting method and related product |
CN111477235A (en) * | 2020-04-15 | 2020-07-31 | 厦门快商通科技股份有限公司 | Voiceprint acquisition method, device and equipment |
CN112767949A (en) * | 2021-01-18 | 2021-05-07 | 东南大学 | Voiceprint recognition system based on binary weight convolutional neural network |
CN112863546A (en) * | 2021-01-21 | 2021-05-28 | 安徽理工大学 | Belt conveyor health analysis method based on audio characteristic decision |
CN113011506A (en) * | 2021-03-24 | 2021-06-22 | 华南理工大学 | Texture image classification method based on depth re-fractal spectrum network |
CN113011506B (en) * | 2021-03-24 | 2023-08-25 | 华南理工大学 | Texture image classification method based on deep fractal spectrum network |
CN113257266A (en) * | 2021-05-21 | 2021-08-13 | 特斯联科技集团有限公司 | Complex environment access control method and device based on voiceprint multi-feature fusion |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106782565A (en) | A kind of vocal print feature recognition methods and system | |
CN109830245B (en) | Multi-speaker voice separation method and system based on beam forming | |
CN105845127B (en) | Audio recognition method and its system | |
CN110970053B (en) | Multichannel speaker-independent voice separation method based on deep clustering | |
CN103456312B (en) | A kind of single-channel voice blind separating method based on Computational auditory scene analysis | |
CN109427328B (en) | Multichannel voice recognition method based on filter network acoustic model | |
CN110675891B (en) | Voice separation method and module based on multilayer attention mechanism | |
CN112331218B (en) | Single-channel voice separation method and device for multiple speakers | |
CN106057210B (en) | Quick speech blind source separation method based on frequency point selection under binaural distance | |
Wang et al. | On spatial features for supervised speech separation and its application to beamforming and robust ASR | |
CN110047478B (en) | Multi-channel speech recognition acoustic modeling method and device based on spatial feature compensation | |
CN107346664A (en) | A kind of ears speech separating method based on critical band | |
CN110111769A (en) | A kind of cochlear implant control method, device, readable storage medium storing program for executing and cochlear implant | |
CN108122559A (en) | Binaural sound sources localization method based on deep learning in a kind of digital deaf-aid | |
CN108091345A (en) | A kind of ears speech separating method based on support vector machines | |
CN105225672A (en) | Merge the system and method for the directed noise suppression of dual microphone of fundamental frequency information | |
CN103903632A (en) | Voice separating method based on auditory center system under multi-sound-source environment | |
CN107274887A (en) | Speaker's Further Feature Extraction method based on fusion feature MGFCC | |
CN108520756A (en) | A kind of method and device of speaker's speech Separation | |
CN111145726A (en) | Deep learning-based sound scene classification method, system, device and storage medium | |
Sainath et al. | Reducing the Computational Complexity of Multimicrophone Acoustic Models with Integrated Feature Extraction. | |
CN104778948A (en) | Noise-resistant voice recognition method based on warped cepstrum feature | |
CN105609099A (en) | Speech recognition pretreatment method based on human auditory characteristic | |
CN109448702A (en) | Artificial cochlea's auditory scene recognition methods | |
CN105845143A (en) | Speaker confirmation method and speaker confirmation system based on support vector machine |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20170531 |