CN106448682A - Open-set speaker recognition method and apparatus - Google Patents
- Publication number
- CN106448682A (application number CN201610819015.XA)
- Authority
- CN
- China
- Prior art keywords
- user
- measured
- sample
- speaker
- code book
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/02—Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/06—Decision making techniques; Pattern matching strategies
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L2019/0001—Codebooks
- G10L2019/0007—Codebook element generation
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/24—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Game Theory and Decision Science (AREA)
- Business, Economics & Management (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Electrically Operated Instructional Devices (AREA)
Abstract
The invention belongs to the field of information technology and provides an open-set speaker recognition method and apparatus. The method comprises: obtaining the voice information of each user sample in a sample database, extracting a feature vector set of the user sample from the voice information, and generating a codebook for the user sample from the feature vector set; obtaining the voice information of a user under test and extracting a feature vector set of the user under test; performing closed-set speaker identification according to the feature vector set of the user under test and the codebook of each user sample in the sample database, to obtain the target user in the database that is closest to the user under test; and performing speaker verification on the user under test based on the SVM algorithm, to determine whether the user under test is the target user. The method and apparatus address the heavy computation, long processing time and poor practicality of existing open-set speaker recognition, and achieve a good recognition rate.
Description
Technical field
The invention belongs to the field of information technology, and more particularly relates to a method and device for open-set speaker recognition.
Background art
In an open-set speaker recognition system, new users may appear, so speaker identification (SI) is generally performed first, followed by speaker verification (SV). The prior art mainly uses the vector quantization (VQ) algorithm to build a personal codebook for each user sample in the open-set speaker recognition system, and sets one common rejection threshold for all user samples. However, a personal codebook does not contain the features of the other user samples in the sample database, so the absolute scores obtained during recognition are not comparable and must be normalized. The common rejection threshold, in turn, is hard to determine and is usually taken from experiments, which makes it difficult to adapt to different situations. In summary, the prior art suffers from a large data volume, long processing time and poor practicality when performing open-set speaker recognition.
Summary of the invention
In view of this, embodiments of the present invention provide a method and device for open-set speaker recognition, to solve the problems of large data volume, long processing time and poor practicality that the prior art has when performing open-set speaker recognition.
In a first aspect, a method for open-set speaker recognition is provided, the method comprising:
obtaining the voice information of each user sample in a sample database, extracting the feature vector set of the user sample from the voice information, and generating the codebook of the user sample from the feature vector set;
obtaining the voice information of a user under test, and extracting the feature vector set of the user under test;
performing closed-set speaker identification according to the feature vector set of the user under test and the codebook of each user sample in the sample database, to obtain the target user in the sample database that is closest to the user under test;
performing speaker verification on the user under test based on the SVM algorithm, to determine whether the user under test is the target user.
In a second aspect, a device for open-set speaker recognition is provided, the device comprising:
a first acquisition module, configured to obtain the voice information of each user sample in a sample database, extract the feature vector set of the user sample from the voice information, and generate the codebook of the user sample from the feature vector set;
a second acquisition module, configured to obtain the voice information of a user under test and extract the feature vector set of the user under test;
an identification module, configured to perform closed-set speaker identification according to the feature vector set of the user under test and the codebook of each user sample in the sample database, to obtain the target user in the sample database that is closest to the user under test;
a verification module, configured to perform speaker verification on the user under test based on the SVM algorithm, to determine whether the user under test is the target user.
Compared with the prior art, the embodiments of the present invention obtain the voice information of each user sample in the sample database, extract the feature vector set of the user sample from the voice information, and generate the codebook of the user sample from the feature vector set; obtain the voice information of the user under test and extract its feature vector set; then perform closed-set speaker identification according to the feature vector set of the user under test and the codebook of each user sample in the sample database, to obtain the target user in the sample database that is closest to the user under test; and finally perform speaker verification on the user under test based on the SVM algorithm, to determine whether the user under test is the target user. This provides a new text-independent approach to open-set speaker recognition that avoids a common rejection threshold, effectively solves the prior-art problems of large data volume, long processing time and poor practicality in open-set speaker recognition, and achieves good recognition accuracy.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of the open-set speaker recognition method provided by an embodiment of the present invention;
Fig. 2 is a flowchart of the implementation of step S103 in the open-set speaker recognition method provided by an embodiment of the present invention;
Fig. 3 is a flowchart of the implementation of step S104 in the open-set speaker recognition method provided by another embodiment of the present invention;
Fig. 4 is a flowchart, provided by an embodiment of the present invention, of determining from the output results of the SVM speaker models whether the user under test is the target user;
Fig. 5 is a structural diagram of the open-set speaker recognition device provided by an embodiment of the present invention.
Detailed description
To make the objects, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the present invention and do not limit it.
The embodiments of the present invention obtain the voice information of each user sample in the sample database, extract the feature vector set of the user sample from the voice information, and generate the codebook of the user sample from the feature vector set; obtain the voice information of the user under test and extract its feature vector set; then perform closed-set speaker identification according to the feature vector set of the user under test and the codebook of each user sample in the sample database, to obtain the target user in the sample database that is closest to the user under test; and finally perform speaker verification on the user under test based on the SVM algorithm, to determine whether the user under test is the target user. This provides a new text-independent approach to open-set speaker recognition that avoids a common rejection threshold, effectively solves the prior-art problems of large data volume, long processing time and poor practicality in open-set speaker recognition, and achieves good recognition accuracy. The embodiments of the present invention also provide a corresponding device. Each is described in detail below.
Fig. 1 shows the implementation flow of the open-set speaker recognition method provided by an embodiment of the present invention.
In the embodiments of the present invention, the open-set speaker recognition method is applied to a terminal device, which includes but is not limited to computer equipment such as a server or a computer.
Referring to Fig. 1, the open-set speaker recognition method comprises:
In step S101, the voice information of each user sample in the sample database is obtained, the feature vector set of the user sample is extracted from the voice information, and the codebook of the user sample is generated from the feature vector set.
In the embodiments of the present invention, the sample database contains several user samples. For ease of description, a user sample is denoted U, and the n user samples are denoted U_i (1 ≤ i ≤ n); each user sample has a corresponding voice signal. Optionally, the embodiments of the present invention use Mel-frequency cepstral coefficients (MFCCs) to extract the feature vector set of each user sample from its voice information in the sample database. Once the feature vector sets of the user samples have been obtained, the vector quantization (VQ) algorithm is used to generate the codebook of each user sample, and the codebooks are saved to form a codebook library. Note that extracting feature vector sets based on Mel-frequency cepstral coefficients and generating codebooks with the vector quantization algorithm are prior art and are not described further here.
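The text treats codebook generation as prior art and gives no details, so the following is only a minimal sketch of one common way to build a VQ codebook: plain k-means (Lloyd) clustering. The function names, the codebook size, the iteration count and the toy 2-D frames standing in for real MFCC vectors are all assumptions made for illustration and are not taken from the patent.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train_codebook(features, size, iterations=20, seed=0):
    """Return `size` codewords for a list of feature vectors (k-means sketch)."""
    rng = random.Random(seed)
    # Initialise the codewords with randomly chosen feature vectors.
    codebook = [list(v) for v in rng.sample(features, size)]
    for _ in range(iterations):
        # Assign every feature vector to its nearest codeword.
        cells = [[] for _ in codebook]
        for vec in features:
            nearest = min(range(size), key=lambda k: squared_distance(vec, codebook[k]))
            cells[nearest].append(vec)
        # Move each codeword to the centroid of its cell; keep it if the cell is empty.
        for k, cell in enumerate(cells):
            if cell:
                codebook[k] = [sum(col) / len(cell) for col in zip(*cell)]
    return codebook

# Toy stand-in for the MFCC frames of one user sample (hypothetical data).
frames = [[0.0, 0.1], [0.1, 0.0], [0.0, 0.0], [5.0, 5.1], [5.1, 5.0], [5.0, 5.0]]
codebook = train_codebook(frames, size=2)
```

In a real system the frames would come from an MFCC front end and the codebook would typically hold many more codewords; both choices are left open by the text.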
In step S102, the voice information of the user under test is obtained, and the feature vector set of the user under test is extracted.
For example, the embodiments of the present invention collect a segment of speech from the user under test S (i.e. the speaker) as the test utterance, and extract the feature vector set of the user under test S using Mel-frequency cepstral coefficients.
In step S103, closed-set speaker identification is performed according to the feature vector set of the user under test and the codebook of each user sample in the sample database, to obtain the target user in the sample database that is closest to the user under test.
Here, the embodiments of the present invention simulate a closed-set speaker recognition system: the user under test is assumed to exist in the sample database, and closed-set speaker identification is performed using the Euclidean distance to find the user sample in the sample database most similar to the user under test. For ease of description, this user sample is called the target user and is denoted U_ob. As a preferred example of the present invention, Fig. 2 shows the implementation flow of step S103 in the open-set speaker recognition method provided by an embodiment of the present invention.
Referring to Fig. 2, step S103 comprises:
In step S201, the Euclidean distance is used to calculate the distance information between the feature vector set of the user under test and the codebook of each user sample in the sample database.
For example, each of the n user samples U_i (1 ≤ i ≤ n) in the sample database has one corresponding codebook. The embodiments of the present invention take one user sample U_i and calculate the Euclidean distance between the feature vector set of the user under test S and the codebook of U_i, obtaining the distance information L_i of the user under test S relative to the user sample U_i. Traversing the n user samples U_i (1 ≤ i ≤ n) yields n distance values L_i (1 ≤ i ≤ n). The distance information L_i reflects the degree of similarity between the user under test S and the user sample U_i: the smaller L_i is, the more similar the two are; the larger L_i is, the more they differ.
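The text does not say how the per-vector Euclidean distances are aggregated into a single value L_i. A minimal sketch, assuming the usual VQ distortion measure (the mean distance from each feature vector to its nearest codeword), could look as follows; the function names and the toy data are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def distance_to_codebook(features, codebook):
    """Mean distance from each feature vector to its nearest codeword (assumed aggregation)."""
    return sum(min(euclidean(v, c) for c in codebook) for v in features) / len(features)

# Feature vectors of the user under test S and two toy codebooks (hypothetical data).
test_features = [[0.0, 0.0], [3.0, 4.0]]
codebooks = {"U_1": [[0.0, 0.0], [3.0, 4.0]], "U_2": [[0.0, 0.0]]}

# One distance value L_i per user sample U_i.
distances = {name: distance_to_codebook(test_features, cb) for name, cb in codebooks.items()}
```

Here U_1 matches both test vectors exactly, while U_2 leaves the second vector at distance 5 from its only codeword, so U_1 receives the smaller distance.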
In step S202, the minimum of the distance values is obtained, and the user sample corresponding to the minimum is taken as the target user closest to the user under test.
For example, after the distance information L_i (1 ≤ i ≤ n) between the feature vector set of the user under test S and the codebook of each user sample U_i (1 ≤ i ≤ n) has been obtained, the n distance values are compared and the minimum is selected. The user sample U_i corresponding to the minimum is the user most similar to the user under test S, i.e. the target user U_ob.
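Selecting the target user is an arg-min over the n distance values. A short sketch, with hypothetical user names and distance values:

```python
def closest_user(distances):
    """Return the user sample whose distance L_i is smallest."""
    return min(distances, key=distances.get)

# Hypothetical distance values L_i for three user samples.
distances = {"U_1": 4.2, "U_2": 1.7, "U_3": 3.1}
target = closest_user(distances)  # candidate for the target user U_ob
```

Note that this step always returns some user sample, even when the speaker is not enrolled, which is why the verification step that follows is needed.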
Steps S201 to S202 thus find, in the sample database, the target user closest to the user under test. Because the present invention concerns open-set speaker recognition, the user under test may not be a user present in the sample database, so the user under test and the target user are not necessarily the same person. It is therefore necessary to verify whether the user under test and the target user are the same person.
In step S104, speaker verification is performed on the user under test based on the SVM algorithm, to determine whether the user under test is the target user.
Here, the embodiments of the present invention combine the VQ algorithm with the support vector machine (SVM) algorithm to perform speaker verification. Speaker verification comprises an SVM training process and a decision process. The SVM training process uses a kernel function to generate an SVM speaker model between the target user and each impostor user, i.e. a discriminant function between the target user and that impostor user. The decision process feeds the feature vector set of the user under test into the SVM speaker models and collects statistics on their output results to determine whether the user under test is the target user. As an example, Fig. 3 shows the implementation flow of step S104 in the open-set speaker recognition method provided by an embodiment of the present invention.
Referring to Fig. 3, step S104 comprises:
In step S301, the frames in the codebook of the target user are labeled as a first category, and the frames in the codebooks of the impostor users are labeled as a second category, the impostor users being the user samples in the sample database other than the target user.
In the embodiments of the present invention, because the SVM algorithm is used, the codebooks of all user samples in the sample database must first be labeled. The labels comprise a first category and a second category, which distinguish the target user from the impostor users. The embodiments of the present invention label every frame in the codebook of the target user as the first category, treat the user samples in the sample database other than the target user as impostor users, and label every frame in the codebooks of the impostor users as the second category.
For example, among the n user samples U_i (1 ≤ i ≤ n) in the sample database, U_ob is the target user, and the remaining n-1 user samples are impostor samples. In practice, every frame in the codebook of the target user can be labeled "+1", and every frame in the codebooks of the impostor users can be labeled "-1".
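The labeling can be sketched as building one two-class training set per impostor user, with the target user's frames labeled +1 and the impostor's frames labeled -1. The dictionary layout and the name `label_frames` are assumptions made for illustration only.

```python
def label_frames(codebooks, target):
    """Build one (frames, labels) training set per impostor user."""
    training_sets = {}
    target_frames = codebooks[target]
    for user, frames in codebooks.items():
        if user == target:
            continue  # the target user is never its own impostor
        x = target_frames + frames
        y = [+1] * len(target_frames) + [-1] * len(frames)
        training_sets[user] = (x, y)
    return training_sets

# Three toy codebooks (hypothetical data); U_2 plays the role of U_ob.
codebooks = {
    "U_1": [[2.0, 2.0], [2.2, 1.9]],
    "U_2": [[0.0, 0.0], [0.2, 0.1]],
    "U_3": [[-3.0, 1.0]],
}
training_sets = label_frames(codebooks, target="U_2")
```

With n = 3 user samples this yields n-1 = 2 training sets, one per impostor user.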
In step S302, the codebook of each impostor user is used together with the codebook of the target user for SVM training, obtaining several SVM speaker models.
For example, during SVM training, one of the n-1 impostor users is taken, and SVM training is performed on the codebook of that impostor user and the codebook of the target user U_ob, yielding the SVM speaker model between the target user U_ob and that impostor user. The SVM speaker model is used to decide which category each frame of its input belongs to: the first category (target user) or the second category (impostor user). Traversing all n-1 impostor users yields the SVM speaker model between the target user U_ob and each impostor user, i.e. n-1 SVM speaker models. Each of the n-1 SVM speaker models is trained from a different impostor user together with the target user, so the models all differ from one another.
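The text specifies a kernel-based SVM but names neither the kernel nor the training procedure. As a rough, self-contained stand-in, the sketch below trains a linear soft-margin SVM by stochastic subgradient descent on the hinge loss; this is a deliberately simplified substitute, not the patent's method, and a practical system would more likely rely on an established SVM library with a non-linear kernel. All names, hyperparameters and data here are hypothetical.

```python
def train_linear_svm(x, y, epochs=200, lr=0.1, reg=0.01):
    """Fit weights w and bias b by subgradient descent on the regularised hinge loss."""
    w, b = [0.0] * len(x[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(x, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            # L2 regularisation shrinks the weights a little on every step.
            w = [wj * (1 - lr * reg) for wj in w]
            if margin < 1:
                # The frame violates the margin: move the boundary towards classifying it.
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b

def classify(model, frame):
    """Return +1 (target user) or -1 (impostor user) for one frame."""
    w, b = model
    return 1 if sum(wj * xj for wj, xj in zip(w, frame)) + b >= 0 else -1

# One labeled training set per impostor user (hypothetical data, target frames first).
target_frames = [[0.0, 0.0], [0.2, 0.1]]
impostors = {"U_1": [[2.0, 2.0], [2.2, 1.9]], "U_3": [[-3.0, 1.0], [-2.8, 1.2]]}

# n-1 speaker models, each trained on the target user versus one impostor user.
models = {}
for name, frames in impostors.items():
    x = target_frames + frames
    y = [+1] * len(target_frames) + [-1] * len(frames)
    models[name] = train_linear_svm(x, y)
```

Each entry of `models` separates the target user from exactly one impostor user, mirroring the one-model-per-impostor structure described above.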
In step S303, the codebook of the user under test is generated from the feature vector set of the user under test, and the codebook of the user under test is input into each SVM speaker model.
In step S304, whether the user under test is the target user is determined from the output results of the SVM speaker models.
Here, the VQ algorithm is preferably used to generate the corresponding codebook from the feature vector set of the user under test. One SVM speaker model is taken from the n-1 SVM speaker models, the codebook of the user under test is used as the input of that model, and the output result of the model is obtained. After the decision of the SVM speaker model, its output result contains the category of each frame in the codebook of the user under test, i.e. whether each frame in the codebook belongs to the target user or to the impostor user.
The n-1 SVM speaker models obtained in step S302 are traversed, and the codebook of the user under test is input into each of them in turn, yielding n-1 output results. From these n-1 output results, the embodiments of the present invention count the category of each frame in the codebook of the user under test and determine from the statistics whether the user under test is the target user. As an example, Fig. 4 shows the implementation flow, provided by an embodiment of the present invention, of determining from the output results of the SVM speaker models whether the user under test is the target user.
Referring to Fig. 4, determining from the output results of the SVM speaker models whether the user under test is the target user comprises:
In step S401, the output result of one SVM speaker model is taken, the category of each frame in the codebook of the user under test is obtained from the output result, and the ratio of the number of first-category frames in the output result to the total number of frames is calculated. The output results of the several SVM speaker models are traversed to obtain several ratios.
For the output result of one SVM speaker model, the embodiments of the present invention count the number of frames in the output result that belong to the first category, and calculate the ratio of that number to the total number of frames in the output result. This ratio reflects the probability that the user under test is the target user, relative to the impostor user that was trained together with the target user to form that SVM speaker model. Traversing the output results of the n-1 SVM speaker models and calculating the ratio of first-category frames to total frames for each yields n-1 ratios.
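The per-model ratio is simply the fraction of frames labeled as the first category. A minimal sketch, assuming each output result is a list of +1/-1 frame labels (the data below is hypothetical):

```python
def first_category_ratio(frame_labels):
    """Fraction of frames that a speaker model assigned to the target user (+1)."""
    return sum(1 for label in frame_labels if label == 1) / len(frame_labels)

# Output results of n-1 = 3 SVM speaker models for a 4-frame codebook (hypothetical).
outputs = [[1, 1, -1, 1], [1, -1, -1, -1], [1, 1, 1, 1]]
ratios = [first_category_ratio(result) for result in outputs]
```

For this toy input the three ratios are 3/4, 1/4 and 4/4.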
In step S403, the mean of the several ratios is calculated.
In step S404, the mean is compared with a preset threshold. When the mean is greater than the preset threshold, the user under test is judged to be the target user; otherwise, the user under test is judged to be a new user.
Here, the embodiments of the present invention combine the n-1 ratios obtained by taking their mean, and use the mean as the probability that the user under test is the target user, which improves the accuracy of that probability estimate. The mean is then compared with the preset threshold. The preset threshold is set in advance in the embodiments of the present invention and serves as the criterion for deciding whether the user under test is the target user. When the mean is greater than the preset threshold, the user under test is judged to be the target user; otherwise, the user under test is judged not to be the same person as the target user, i.e. the user under test is a new user. This completes the open-set speaker recognition. The open-set speaker recognition method provided by the embodiments of the present invention thus avoids a common rejection threshold, effectively solves the prior-art problems of large data volume, long processing time and poor practicality in open-set speaker recognition, and improves the accuracy of deciding whether the user under test is the target user.
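The final decision averages the n-1 ratios and compares the mean with the preset threshold. The text gives no value for the threshold, so the 0.5 used below is purely a placeholder, and the function name and return values are likewise hypothetical.

```python
def verify(ratios, threshold):
    """Return the decision and the mean ratio used to reach it."""
    mean_ratio = sum(ratios) / len(ratios)
    decision = "target user" if mean_ratio > threshold else "new user"
    return decision, mean_ratio

# Ratios from n-1 = 3 speaker models; the threshold of 0.5 is an assumed placeholder.
decision, mean_ratio = verify([0.75, 0.25, 1.0], threshold=0.5)
```

Averaging over all impostor-specific models means that a single model that happens to confuse the speaker with its impostor does not decide the outcome on its own.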
Further, when the user under test is not the same person as the target user, i.e. the user under test is a new user, the open-set speaker recognition method may also comprise:
when the user under test is a new user, saving the voice information of the user under test.
Here, by saving the voice information of the user under test, the embodiments of the present invention add a new user sample and thereby further enrich the sample database of the open-set speaker recognition system.
The experimental results of the open-set speaker recognition method provided by the embodiments of the present invention are given below. Six test subjects each spoke 12 arbitrary sentences, 72 sentences in total. With the open-set speaker recognition method, 57 of the 72 sentences were recognized correctly, an accuracy of 79.2%. The open-set speaker recognition method provided by the embodiments of the present invention therefore achieves a good recognition rate.
In summary, the embodiments of the present invention generate a codebook for each user sample in the sample database from the feature vector set extracted from its voice information, extract the feature vector set of the user under test from its voice information, perform closed-set speaker identification against the codebooks to obtain the target user in the sample database that is closest to the user under test, and finally perform speaker verification based on the SVM algorithm to determine whether the user under test is the target user. This provides a new text-independent approach to open-set speaker recognition that avoids a common rejection threshold, effectively solves the prior-art problems of large data volume, long processing time and poor practicality in open-set speaker recognition, and achieves good recognition accuracy.
Fig. 5 shows the structure of the open-set speaker recognition device provided by an embodiment of the present invention. For ease of description, only the parts related to the embodiments of the present invention are shown.
In the embodiments of the present invention, the open-set speaker recognition device implements the open-set speaker recognition method described in the embodiments of Fig. 1 to Fig. 4 above. It may be a software unit, a hardware unit, or a combined software and hardware unit built into a terminal device, which includes but is not limited to computer equipment such as a computer or a server.
Referring to Fig. 5, the open-set speaker recognition device comprises:
A first acquisition module 51, configured to obtain the voice information of each user sample in the sample database, extract the feature vector set of the user sample from the voice information, and generate the codebook of the user sample from the feature vector set.
A second acquisition module 52, configured to obtain the voice information of the user under test and extract the feature vector set of the user under test.
An identification module 53, configured to perform closed-set speaker identification according to the feature vector set of the user under test and the codebook of each user sample in the sample database, to obtain the target user in the sample database that is closest to the user under test.
A verification module 54, configured to perform speaker verification on the user under test based on the SVM algorithm, to determine whether the user under test is the target user.
The embodiment of the present invention simulates a closed-set speaker recognition system: assuming that the user under test exists in the sample library, closed-set speaker identification is performed using the Euclidean distance to obtain the user sample in the sample library that is most similar to the user under test. For ease of description, this user sample is referred to here as the target user. Accordingly, the identification module 53 further includes:
Computing unit 531, configured to use the Euclidean distance to calculate the distance information between the feature vector set of the user under test and the codebook of each user sample in the sample library.
Acquisition module 532, configured to obtain the minimum value in the distance information and take the user sample corresponding to the minimum value as the target user closest to the user under test.
Here, the distance information reflects the degree of similarity between the user under test and a user sample: the smaller the distance, the more similar the two; the larger the distance, the more different they are. After the distance information between the feature vector set of the user under test and the codebook of each user sample in the sample library has been obtained, the distances are compared and the minimum is selected. The user sample corresponding to the minimum is the user most similar to the user under test, i.e., the target user, so that the target user closest to the user under test is found in the sample library. Because the present invention addresses open-set speaker recognition, after the target user closest to the user under test has been obtained, it is still necessary to confirm whether the user under test and the target user are the same person.
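The closed-set identification step described above can be illustrated with a minimal sketch. The text states only that a Euclidean distance between the feature vector set and each codebook is computed and the minimum is chosen; averaging the nearest-codeword distance over all feature vectors is an assumption made here for illustration, and the function names and toy data are hypothetical.

```python
import math

def codebook_distance(features, codebook):
    """Average Euclidean distance from each feature vector to its nearest
    codeword (assumed aggregation; the patent text does not fix one)."""
    total = 0.0
    for vec in features:
        total += min(math.dist(vec, code) for code in codebook)
    return total / len(features)

def closest_user(features, codebooks):
    """Return (user_id, distance) for the user sample whose codebook is nearest."""
    distances = {uid: codebook_distance(features, cb) for uid, cb in codebooks.items()}
    target = min(distances, key=distances.get)
    return target, distances[target]

# Hypothetical two-dimensional codebooks for two enrolled user samples.
codebooks = {
    "alice": [(0.0, 0.0), (1.0, 1.0)],
    "bob": [(5.0, 5.0), (6.0, 6.0)],
}
test_features = [(0.1, 0.0), (0.9, 1.0), (1.0, 1.2)]
target, dist = closest_user(test_features, codebooks)
```

In this sketch the user sample with the smallest average distance plays the role of the target user; whether the user under test really is that person is left to the verification stage.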
Here, the embodiment of the present invention combines the VQ algorithm with the support vector machine (Support Vector Machine, SVM) algorithm to perform speaker verification. The speaker verification process includes an SVM training process and a decision process. The SVM training process uses a kernel function to generate an SVM speaker model between the target user and each impostor user, i.e., a discriminant function between the target user and that impostor user. The decision process inputs the feature vector set of the user under test into the SVM speaker models and counts the output results of the SVM speaker models to determine whether the user under test is the target user.
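For reference, the discriminant function produced by a kernel SVM has the standard form below. The symbols are not taken from the patent text: the x_i are training frames, y_i their labels, alpha_i and b the learned coefficients and bias, and K the kernel function, whose specific choice is not stated in this section.

```latex
f(\mathbf{x}) = \operatorname{sign}\!\left( \sum_{i=1}^{N} \alpha_i \, y_i \, K(\mathbf{x}_i, \mathbf{x}) + b \right),
\qquad y_i \in \{+1, -1\}
```

Under this reading, an output of +1 for a frame would correspond to the target user and an output of -1 to the impostor user used to train that model.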
Further, the confirmation module 54 also includes:
Labeling unit 541, configured to label the frames in the codebook of the target user as a first category and the frames in the codebooks of the impostor users as a second category, where the impostor users are the user samples in the sample library other than the target user.
Here, the first category and the second category are used to distinguish the target user from the impostor users. For example, in practice, all frames in the codebook of the target user may be labeled "+1" and all frames in the codebooks of the impostor users may be labeled "-1".
Training unit 542, configured to perform SVM training on the codebook of each impostor user together with the codebook of the target user, to obtain several SVM speaker models.
Here, each SVM speaker model is used to determine which category each frame of the input parameter belongs to, i.e., the first category (target user) or the second category (impostor user).
First judgment unit 543, configured to generate the codebook of the user under test from the feature vector set of the user under test and input the codebook of the user under test into each SVM speaker model.
Second judgment unit 544, configured to determine, according to the output results of the SVM speaker models, whether the user under test is the target user.
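The labeling performed by unit 541 and the pairing of the target user with each impostor user for unit 542 can be sketched as follows. This only builds the labeled training sets, one per impostor user; the SVM training itself is omitted, and the function name and toy codebooks are hypothetical.

```python
def build_training_sets(codebooks, target_id):
    """Build one labeled training set per impostor user.

    Frames from the target user's codebook are labeled +1 (first category);
    frames from the impostor user's codebook are labeled -1 (second category).
    """
    target_frames = codebooks[target_id]
    training_sets = {}
    for user_id, frames in codebooks.items():
        if user_id == target_id:
            continue  # the target user is never its own impostor
        vectors = list(target_frames) + list(frames)
        labels = [+1] * len(target_frames) + [-1] * len(frames)
        training_sets[user_id] = (vectors, labels)
    return training_sets

# Hypothetical codebooks with two codewords ("frames") per user sample.
codebooks = {
    "alice": [(0.0, 0.0), (1.0, 1.0)],
    "bob": [(5.0, 5.0), (6.0, 6.0)],
    "carol": [(9.0, 0.0), (8.0, 1.0)],
}
sets = build_training_sets(codebooks, "alice")
```

With N user samples in the sample library this yields N - 1 training sets, and therefore N - 1 SVM speaker models once each set is passed to an SVM trainer.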
Here, the VQ algorithm is preferably used to generate the corresponding codebook from the feature vector set of the user under test. One SVM speaker model is taken, and the codebook of the user under test is used as its input parameter to obtain the output result of that model. The output result contains the category assigned by the SVM speaker model to each frame in the codebook of the user under test, i.e., whether each frame belongs to the target user or to an impostor user. The first judgment unit 543 traverses the SVM speaker models obtained by the training unit 542 and inputs the codebook of the user under test into each of them, obtaining several output results.
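As an illustration of generating a codebook from a feature vector set, the sketch below uses plain Lloyd (k-means style) iterations. This is a stand-in: the section says only that a VQ algorithm is preferably used and does not specify the codebook training procedure, so the initialization from the first vectors, the iteration count, and all names are assumptions.

```python
import math

def train_codebook(features, size, iterations=10):
    """Return `size` codewords fitted to `features` by Lloyd iterations.

    Assumes len(features) >= size; the first `size` vectors seed the codebook.
    """
    codebook = [tuple(v) for v in features[:size]]
    for _ in range(iterations):
        # Assign every feature vector to its nearest codeword.
        clusters = [[] for _ in codebook]
        for vec in features:
            nearest = min(range(len(codebook)),
                          key=lambda i: math.dist(vec, codebook[i]))
            clusters[nearest].append(vec)
        # Move each codeword to the centroid of its assigned vectors.
        for i, members in enumerate(clusters):
            if members:
                codebook[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return codebook

# Hypothetical feature vectors forming two well-separated groups.
features = [(0.0, 0.0), (10.0, 10.0), (0.2, 0.0),
            (10.0, 10.2), (0.1, 0.3), (9.8, 10.1)]
codebook = train_codebook(features, size=2)
```

Each codeword of the resulting codebook would then be treated as one frame when the codebook is passed to the SVM speaker models.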
In the embodiment of the present invention, the second judgment unit 544 then uses the several output results obtained by the first judgment unit 543 to count the category to which each frame in the codebook of the user under test belongs, and determines from the statistics whether the user under test is the target user.
The second judgment unit 544 further includes:
Ratio calculation unit 5441, configured to take the output result of one SVM speaker model, obtain from it the category to which each frame in the codebook of the user under test belongs, and calculate the ratio of the number of first-category frames to the total number of frames; the output results of the several SVM speaker models are traversed to obtain several ratios.
Here, each ratio reflects the probability that the user under test is the target user, relative to the impostor user that was trained together with the target user to form that SVM speaker model.
Average calculation unit 5442, configured to calculate the mean of the several ratios.
Comparing unit 5443, configured to compare the mean with a preset threshold; when the mean is greater than the preset threshold, the user under test is determined to be the target user; otherwise, the user under test is determined to be a new user.
Here, in the embodiment of the present invention, the average calculation unit 5442 combines the several ratios by computing their mean, and the mean is taken as the probability that the user under test is the target user, which improves the accuracy of that probability estimate. The comparing unit 5443 then compares the mean with the preset threshold. The preset threshold is set in advance in the embodiment of the present invention and serves as the criterion for deciding whether the user under test is the target user. When the mean is greater than the preset threshold, the user under test is determined to be the target user; otherwise, the user under test and the target user are determined not to be the same person, i.e., the user under test is a new user. This completes open-set speaker recognition, avoids the use of a common rejection threshold, effectively solves the prior-art problems of large data volume, long processing time, and poor practicability in open-set speaker recognition, and improves the accuracy of deciding whether the user under test is the target user.
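The decision rule carried out by units 5441 to 5443 can be sketched as follows. The per-frame outputs are given directly as lists of +1 and -1 labels, one list per SVM speaker model; the threshold value of 0.6 and the function names are hypothetical, since the text says only that the threshold is preset.

```python
def target_ratio(frame_labels):
    """Fraction of frames labeled +1 (first category) by one SVM speaker model."""
    return sum(1 for label in frame_labels if label == 1) / len(frame_labels)

def verify(outputs, threshold):
    """Average the per-model ratios and compare the mean with the threshold.

    Returns (is_target, mean_ratio); is_target is False for a new user.
    """
    ratios = [target_ratio(labels) for labels in outputs]
    mean_ratio = sum(ratios) / len(ratios)
    return mean_ratio > threshold, mean_ratio

# Hypothetical outputs of three SVM speaker models for a four-frame codebook.
outputs = [
    [1, 1, -1, 1],   # ratio 0.75
    [1, -1, -1, 1],  # ratio 0.50
    [1, 1, 1, 1],    # ratio 1.00
]
is_target, mean_ratio = verify(outputs, threshold=0.6)
```

Here the mean ratio is 0.75, which exceeds the assumed threshold, so the user under test would be accepted as the target user; a mean at or below the threshold would mark the user as new.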
Further, the apparatus also includes:
Saving module 55, configured to save the voice information of the user under test when the user under test is a new user.
It should be noted that the apparatus in the embodiment of the present invention can be used to implement all of the technical solutions in the above method embodiments. The functions of its functional modules can be implemented according to the methods in the above method embodiments; for the specific implementation process, refer to the related description in the above embodiments, which is not repeated here.
In summary, the embodiment of the present invention obtains the voice information of each user sample in the sample library, extracts the feature vector set of the user sample from the voice information, and generates the codebook of the user sample from the feature vector set; obtains the voice information of the user under test and extracts the feature vector set of the user under test; then performs closed-set speaker identification according to the feature vector set of the user under test and the codebook of each user sample in the sample library, to obtain the target user in the sample library that is closest to the user under test; and finally performs speaker verification on the user under test based on the SVM algorithm to determine whether the user under test is the target user. This provides a new text-independent open-set speaker recognition approach that avoids the use of a common rejection threshold, effectively solves the prior-art problems of large data volume, long processing time, and poor practicability in open-set speaker recognition, and achieves good recognition accuracy.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are executed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled persons may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the present invention.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, modules, and units described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed open-set speaker recognition method and apparatus may be implemented in other ways. For example, the system embodiments described above are merely illustrative: the division into modules and units is only a division by logical function, and other divisions are possible in an actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections between systems, modules, or units through some interfaces, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units and modules in the embodiments of the present invention may be integrated into one processing unit, each unit or module may exist physically on its own, or two or more units or modules may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or some of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk, or an optical disc.
The above are only specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any change or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (10)
1. An open-set speaker recognition method, characterized in that the method includes:
obtaining the voice information of each user sample in a sample library, extracting the feature vector set of the user sample from the voice information, and generating the codebook of the user sample from the feature vector set;
obtaining the voice information of a user under test, and extracting the feature vector set of the user under test;
performing closed-set speaker identification according to the feature vector set of the user under test and the codebook of each user sample in the sample library, to obtain the target user in the sample library that is closest to the user under test;
performing speaker verification on the user under test based on the SVM algorithm, to determine whether the user under test is the target user.
2. The open-set speaker recognition method as claimed in claim 1, characterized in that performing closed-set speaker identification according to the feature vector set of the user under test and the codebook of each user sample in the sample library, to obtain the target user in the sample library that is closest to the user under test, includes:
using the Euclidean distance to calculate the distance information between the feature vector set of the user under test and the codebook of each user sample in the sample library;
obtaining the minimum value in the distance information, and taking the user sample corresponding to the minimum value as the target user closest to the user under test.
3. The open-set speaker recognition method as claimed in claim 1, characterized in that performing speaker verification on the user under test based on the SVM algorithm, to determine whether the user under test is the target user, includes:
labeling the frames in the codebook of the target user as a first category and labeling the frames in the codebooks of impostor users as a second category, the impostor users being the user samples in the sample library other than the target user;
performing SVM training on the codebook of each impostor user together with the codebook of the target user, to obtain several SVM speaker models;
generating the codebook of the user under test from the feature vector set of the user under test, and inputting the codebook of the user under test into each SVM speaker model;
determining, according to the output results of the SVM speaker models, whether the user under test is the target user.
4. The open-set speaker recognition method as claimed in claim 3, characterized in that determining, according to the output results of the SVM speaker models, whether the user under test is the target user includes:
taking the output result of one SVM speaker model, obtaining from the output result the category to which each frame in the codebook of the user under test belongs, calculating the ratio of the number of first-category frames to the total number of frames, and traversing the output results of the several SVM speaker models to obtain several ratios;
calculating the mean of the several ratios;
comparing the mean with a preset threshold; when the mean is greater than the preset threshold, determining that the user under test is the target user; otherwise, determining that the user under test is a new user.
5. The open-set speaker recognition method as claimed in any one of claims 1 to 4, characterized in that the method further includes:
when the user under test is a new user, saving the voice information of the user under test.
6. An open-set speaker recognition apparatus, characterized in that the apparatus includes:
a first acquisition module, configured to obtain the voice information of each user sample in a sample library, extract the feature vector set of the user sample from the voice information, and generate the codebook of the user sample from the feature vector set;
a second acquisition module, configured to obtain the voice information of a user under test and extract the feature vector set of the user under test;
an identification module, configured to perform closed-set speaker identification according to the feature vector set of the user under test and the codebook of each user sample in the sample library, to obtain the target user in the sample library that is closest to the user under test;
a confirmation module, configured to perform speaker verification on the user under test based on the SVM algorithm, to determine whether the user under test is the target user.
7. The open-set speaker recognition apparatus as claimed in claim 6, characterized in that the identification module includes:
a computing unit, configured to use the Euclidean distance to calculate the distance information between the feature vector set of the user under test and the codebook of each user sample in the sample library;
an acquisition module, configured to obtain the minimum value in the distance information and take the user sample corresponding to the minimum value as the target user closest to the user under test.
8. The open-set speaker recognition apparatus as claimed in claim 7, characterized in that the confirmation module includes:
a labeling unit, configured to label the frames in the codebook of the target user as a first category and the frames in the codebooks of impostor users as a second category, the impostor users being the user samples in the sample library other than the target user;
a training unit, configured to perform SVM training on the codebook of each impostor user together with the codebook of the target user, to obtain several SVM speaker models;
a first judgment unit, configured to generate the codebook of the user under test from the feature vector set of the user under test and input the codebook of the user under test into each SVM speaker model;
a second judgment unit, configured to determine, according to the output results of the SVM speaker models, whether the user under test is the target user.
9. The open-set speaker recognition apparatus as claimed in claim 8, characterized in that the second judgment unit includes:
a ratio calculation unit, configured to take the output result of one SVM speaker model, obtain from the output result the category to which each frame in the codebook of the user under test belongs, calculate the ratio of the number of first-category frames to the total number of frames, and traverse the output results of the several SVM speaker models to obtain several ratios;
an average calculation unit, configured to calculate the mean of the several ratios;
a comparing unit, configured to compare the mean with a preset threshold; when the mean is greater than the preset threshold, determine that the user under test is the target user; otherwise, determine that the user under test is a new user.
10. The open-set speaker recognition apparatus as claimed in any one of claims 6 to 9, characterized in that the apparatus further includes:
a saving module, configured to save the voice information of the user under test when the user under test is a new user.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610819015.XA CN106448682A (en) | 2016-09-13 | 2016-09-13 | Open-set speaker recognition method and apparatus |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106448682A true CN106448682A (en) | 2017-02-22 |
Family
ID=58168849
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610819015.XA Pending CN106448682A (en) | 2016-09-13 | 2016-09-13 | Open-set speaker recognition method and apparatus |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106448682A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109612961A (en) * | 2018-12-13 | 2019-04-12 | 温州大学 | The opener recognition methods of the micro- plastics of coastal environment |
CN112735435A (en) * | 2020-12-25 | 2021-04-30 | 西南电子技术研究所(中国电子科技集团公司第十研究所) | Voiceprint open set identification method with unknown class internal division capability |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6411930B1 (en) * | 1998-11-18 | 2002-06-25 | Lucent Technologies Inc. | Discriminative gaussian mixture models for speaker verification |
CN1787076A (en) * | 2005-12-13 | 2006-06-14 | 浙江大学 | Method for distinguishing speek person based on hybrid supporting vector machine |
CN102509547A (en) * | 2011-12-29 | 2012-06-20 | 辽宁工业大学 | Method and system for voiceprint recognition based on vector quantization based |
CN102664011A (en) * | 2012-05-17 | 2012-09-12 | 吉林大学 | Method for quickly recognizing speaker |
CN102968990A (en) * | 2012-11-15 | 2013-03-13 | 江苏嘉利德电子科技有限公司 | Speaker identifying method and system |
CN103258536A (en) * | 2013-03-08 | 2013-08-21 | 北京理工大学 | Large-scaled speaker identification method |
Non-Patent Citations (2)
Title |
---|
刘兴立: "《硕士学位论文》", 30 January 2002, 大连理工大学 * |
司罗,胡起秀,金琴: "基于码字概率分布(BCDM)的说话人辨识系统", 《第五届全国人机语音通讯学术会议论文集》 * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109612961A (en) * | 2018-12-13 | 2019-04-12 | 温州大学 | The opener recognition methods of the micro- plastics of coastal environment |
CN109612961B (en) * | 2018-12-13 | 2021-06-25 | 温州大学 | Open set identification method of coastal environment micro-plastic |
CN112735435A (en) * | 2020-12-25 | 2021-04-30 | 西南电子技术研究所(中国电子科技集团公司第十研究所) | Voiceprint open set identification method with unknown class internal division capability |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107481720B (en) | Explicit voiceprint recognition method and device | |
CN110263150B (en) | Text generation method, device, computer equipment and storage medium | |
CN107610709B (en) | Method and system for training voiceprint recognition model | |
CN109859772B (en) | Emotion recognition method, emotion recognition device and computer-readable storage medium | |
CN110457432B (en) | Interview scoring method, interview scoring device, interview scoring equipment and interview scoring storage medium | |
CN108090127B (en) | Method and device for establishing question and answer text evaluation model and evaluating question and answer text | |
CN111243602B (en) | Voiceprint recognition method based on gender, nationality and emotion information | |
CN108288468A (en) | Audio recognition method and device | |
CN110443692A (en) | Enterprise's credit authorization method, apparatus, equipment and computer readable storage medium | |
CN108399169A (en) | Dialog process methods, devices and systems based on question answering system and mobile device | |
CN109271493A (en) | A kind of language text processing method, device and storage medium | |
CN101562012B (en) | Method and system for graded measurement of voice | |
CN112533051A (en) | Bullet screen information display method and device, computer equipment and storage medium | |
CN109117480A (en) | Word prediction technique, device, computer equipment and storage medium | |
CN108304373A (en) | Construction method, device, storage medium and the electronic device of semantic dictionary | |
CN109087205A (en) | Prediction technique and device, the computer equipment and readable storage medium storing program for executing of public opinion index | |
CN110704618B (en) | Method and device for determining standard problem corresponding to dialogue data | |
Rill-García et al. | High-level features for multimodal deception detection in videos | |
CN106709804A (en) | Interactive wealth planning consulting robot system | |
CN107731234A (en) | A kind of method and device of authentication | |
CN109800309A (en) | Classroom Discourse genre classification methods and device | |
CN113051923A (en) | Data verification method and device, computer equipment and storage medium | |
CN110223678A (en) | Audio recognition method and system | |
CN113535925A (en) | Voice broadcasting method, device, equipment and storage medium | |
CN108228950A (en) | A kind of information processing method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20170222 |