CN106448682A - Open-set speaker recognition method and apparatus - Google Patents

Open-set speaker recognition method and apparatus

Publication number
CN106448682A
Authority
CN
China
Prior art keywords
user
measured
sample
speaker
code book
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610819015.XA
Other languages
Chinese (zh)
Inventor
韩云秀
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
TCL Corp
Original Assignee
TCL Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by TCL Corp
Priority to CN201610819015.XA
Publication of CN106448682A
Legal status: Pending

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/02 Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • G10L17/06 Decision making techniques; Pattern matching strategies
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L2019/0001 Codebooks
    • G10L2019/0007 Codebook element generation
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/24 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Game Theory and Decision Science (AREA)
  • Business, Economics & Management (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The invention applies to the field of information technology and provides an open-set speaker recognition method and apparatus. The method comprises: obtaining the voice information of each user sample in a sample database, extracting the feature vector set of the user sample from the voice information, and generating a codebook for the user sample from the feature vector set; obtaining the voice information of a user under test and extracting the feature vector set of the user under test; performing closed-set speaker identification according to the feature vector set of the user under test and the codebook of each user sample in the sample database, to obtain from the sample database the target user that is closest to the user under test; and, based on the SVM algorithm, performing speaker verification on the user under test to determine whether the user under test is the target user. The method and apparatus address the heavy computation, long processing time, and poor practicality of existing open-set speaker recognition, and achieve a good recognition rate.

Description

Method and apparatus for open-set speaker recognition
Technical Field
The invention belongs to the field of information technology, and in particular relates to a method and apparatus for open-set speaker recognition.
Background
In an open-set speaker recognition system, new users may appear, so speaker identification (SI) is generally performed first, followed by speaker verification (SV). The prior art mainly uses the vector quantization (VQ) algorithm to build a personal codebook for each user sample in the open-set speaker recognition system, and establishes one common rejection threshold for all user samples. However, a personal codebook does not contain the features of the other user samples in the sample database, so the absolute scores obtained during recognition are not comparable and must be normalized. The common rejection threshold, in turn, is difficult to determine; it is usually set to an experimental value and adapts poorly to different conditions. In summary, the prior art suffers from a large data volume, long processing time, and poor practicality when performing open-set speaker recognition.
Summary of the Invention
In view of this, embodiments of the present invention provide a method and apparatus for open-set speaker recognition, to solve the prior-art problems of large data volume, long processing time, and poor practicality when performing open-set speaker recognition.
In a first aspect, a method for open-set speaker recognition is provided, the method comprising:
obtaining the voice information of each user sample in a sample database, extracting the feature vector set of the user sample from the voice information, and generating the codebook of the user sample from the feature vector set;
obtaining the voice information of a user under test, and extracting the feature vector set of the user under test;
performing closed-set speaker identification according to the feature vector set of the user under test and the codebook of each user sample in the sample database, to obtain the target user in the sample database that is closest to the user under test;
based on the SVM algorithm, performing speaker verification on the user under test to determine whether the user under test is the target user.
In a second aspect, an apparatus for open-set speaker recognition is provided, the apparatus comprising:
a first acquisition module, configured to obtain the voice information of each user sample in a sample database, extract the feature vector set of the user sample from the voice information, and generate the codebook of the user sample from the feature vector set;
a second acquisition module, configured to obtain the voice information of a user under test and extract the feature vector set of the user under test;
an identification module, configured to perform closed-set speaker identification according to the feature vector set of the user under test and the codebook of each user sample in the sample database, to obtain the target user in the sample database that is closest to the user under test;
a verification module, configured to perform, based on the SVM algorithm, speaker verification on the user under test to determine whether the user under test is the target user.
Compared with the prior art, the embodiments of the present invention obtain the voice information of each user sample in a sample database, extract the feature vector set of the user sample from the voice information, and generate the codebook of the user sample from the feature vector set; obtain the voice information of a user under test and extract the feature vector set of the user under test; then perform closed-set speaker identification according to the feature vector set of the user under test and the codebook of each user sample in the sample database, obtaining the target user in the sample database that is closest to the user under test; and finally, based on the SVM algorithm, perform speaker verification on the user under test to determine whether the user under test is the target user. This provides a new text-independent open-set speaker recognition approach that avoids using a common rejection threshold, effectively solves the prior-art problems of large data volume, long processing time, and poor practicality in open-set speaker recognition, and achieves good recognition accuracy.
Brief Description of the Drawings
To illustrate the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are obviously only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of the open-set speaker recognition method provided by an embodiment of the present invention;
Fig. 2 is a flowchart of a specific implementation of step S103 in the open-set speaker recognition method provided by an embodiment of the present invention;
Fig. 3 is a flowchart of a specific implementation of step S104 in the open-set speaker recognition method provided by another embodiment of the present invention;
Fig. 4 is a flowchart, provided by an embodiment of the present invention, of determining from the output results of the SVM speaker models whether the user under test is the target user;
Fig. 5 is a structural diagram of the open-set speaker recognition apparatus provided by an embodiment of the present invention.
Detailed Description
To make the objects, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the present invention and are not intended to limit it.
The embodiments of the present invention obtain the voice information of each user sample in a sample database, extract the feature vector set of the user sample from the voice information, and generate the codebook of the user sample from the feature vector set; obtain the voice information of a user under test and extract the feature vector set of the user under test; then perform closed-set speaker identification according to the feature vector set of the user under test and the codebook of each user sample in the sample database, obtaining the target user in the sample database that is closest to the user under test; and finally, based on the SVM algorithm, perform speaker verification on the user under test to determine whether the user under test is the target user. This provides a new text-independent open-set speaker recognition approach that avoids using a common rejection threshold, effectively solves the prior-art problems of large data volume, long processing time, and poor practicality in open-set speaker recognition, and achieves good recognition accuracy. The embodiments of the present invention also provide a corresponding apparatus. Each is described in detail below.
Fig. 1 shows the flow of the open-set speaker recognition method provided by an embodiment of the present invention.
In this embodiment, the open-set speaker recognition method is applied to a terminal device, which includes, but is not limited to, computer devices such as servers and computers.
Referring to Fig. 1, the open-set speaker recognition method includes:
In step S101, the voice information of each user sample in the sample database is obtained, the feature vector set of the user sample is extracted from the voice information, and the codebook of the user sample is generated from the feature vector set.
In this embodiment, the sample database contains several user samples. For ease of description, a user sample is denoted U, and the n user samples are denoted U_i (1 ≤ i ≤ n); each user sample has a corresponding voice signal. Optionally, this embodiment uses Mel-frequency cepstral coefficients (MFCCs) to extract the feature vector set of each user sample from that user sample's voice information in the sample database. After the feature vector set of a user sample is obtained, the vector quantization (VQ) algorithm is used to generate the codebook of each user sample, and the codebooks of all user samples are stored to form a codebook library. Note that computing a feature vector set from Mel-frequency cepstral coefficients and generating a user sample's codebook with the vector quantization algorithm are both prior art and are not described further here.
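The codebook generation in step S101 can be sketched as follows. The patent treats VQ codebook training as prior art and does not specify the training procedure, so this sketch assumes plain k-means clustering; the function names `squared_distance` and `train_codebook`, the codebook size, and the iteration count are all illustrative assumptions. The MFCC frames are assumed to have been computed already.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_codebook(frames, codebook_size, iterations=20, seed=0):
    """Build a VQ codebook from feature frames with plain k-means.

    frames: list of feature vectors (e.g. MFCC frames, assumed precomputed).
    Returns a list of `codebook_size` centroid vectors (the codewords).
    """
    rng = random.Random(seed)
    # Initialise the codewords with randomly chosen frames.
    codebook = [list(f) for f in rng.sample(frames, codebook_size)]
    for _ in range(iterations):
        # Assignment step: group frames by their nearest codeword.
        clusters = [[] for _ in codebook]
        for frame in frames:
            nearest = min(range(len(codebook)),
                          key=lambda k: squared_distance(frame, codebook[k]))
            clusters[nearest].append(frame)
        # Update step: move each codeword to the mean of its cluster.
        for k, members in enumerate(clusters):
            if members:  # keep the old codeword if its cluster is empty
                dim = len(members[0])
                codebook[k] = [sum(m[d] for m in members) / len(members)
                               for d in range(dim)]
    return codebook
```

Running `train_codebook` once per user sample and storing the results keyed by user would give the codebook library described above.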
In step S102, the voice information of the user under test is obtained, and the feature vector set of the user under test is extracted.
For example, this embodiment collects a segment of voice information from the user under test S (i.e., the speaker) as the test utterance, and extracts the feature vector set of the user under test S using Mel-frequency cepstral coefficients.
In step S103, closed-set speaker identification is performed according to the feature vector set of the user under test and the codebook of each user sample in the sample database, to obtain the target user in the sample database that is closest to the user under test.
Here, this embodiment simulates a closed-set speaker recognition system: it assumes that the user under test exists in the sample database and performs closed-set speaker identification using the Euclidean distance, to obtain the user sample in the sample database that is most similar to the user under test. For ease of description, this user sample is called the target user and is denoted U_ob. As a preferred example of the present invention, Fig. 2 shows a specific implementation flow of step S103 in the open-set speaker recognition method provided by an embodiment of the present invention.
Referring to Fig. 2, step S103 includes:
In step S201, the Euclidean distance is used to compute the distance information between the feature vector set of the user under test and the codebook of each user sample in the sample database.
For example, for the n user samples U_i (1 ≤ i ≤ n) in the sample database, each user sample U_i corresponds to one codebook. This embodiment takes one user sample U_i and computes the Euclidean distance between the feature vector set of the user under test S and the codebook of the user sample U_i, obtaining the distance information L_i of the user under test S relative to the user sample U_i. Traversing the n user samples U_i (1 ≤ i ≤ n) yields n distance values L_i (1 ≤ i ≤ n). The distance information L_i reflects the degree of similarity between the user under test S and the user sample U_i: the smaller L_i is, the more similar the two are; the larger L_i is, the more they differ.
In step S202, the minimum of the distance information is obtained, and the user sample corresponding to the minimum is taken as the target user closest to the user under test.
For example, after the distance information L_i (1 ≤ i ≤ n) between the feature vector set of the user under test S and the codebook of each user sample U_i (1 ≤ i ≤ n) in the sample database has been obtained, the n distance values L_i (1 ≤ i ≤ n) are compared and the minimum is selected. The user sample U_i corresponding to the minimum is the user most similar to the user under test S, i.e., the target user U_ob.
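Steps S201 and S202 can be sketched as below. The patent states only that the Euclidean distance is used; how the per-frame distances are combined into a single value L_i is not specified, so this sketch assumes the mean distance from each test frame to its nearest codeword. The names `average_distortion` and `identify_closest` are illustrative.

```python
import math


def average_distortion(frames, codebook):
    """Mean Euclidean distance from each frame to its nearest codeword."""
    total = 0.0
    for frame in frames:
        total += min(math.dist(frame, codeword) for codeword in codebook)
    return total / len(frames)


def identify_closest(frames, codebooks):
    """Closed-set identification over a dict mapping user id to codebook.

    Returns the id with the smallest distance L_i, plus all distances.
    """
    distances = {user_id: average_distortion(frames, codebook)
                 for user_id, codebook in codebooks.items()}
    target_id = min(distances, key=distances.get)
    return target_id, distances
```

The returned id plays the role of U_ob; the distances are kept only for inspection, since the method relies on the SVM stage rather than on a rejection threshold over L_i.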
Steps S201 to S202 above find, in the sample database, the target user closest to the user under test. Because the present invention addresses open-set speaker recognition, the user under test may not be a user present in the sample database, and the user under test and the target user are not necessarily the same person. It is therefore necessary to verify whether the user under test and the target user are the same person.
In step S104, based on the SVM algorithm, speaker verification is performed on the user under test to determine whether the user under test is the target user.
Here, this embodiment combines the VQ algorithm with the support vector machine (SVM) algorithm to perform speaker verification. The speaker verification process includes an SVM training process and a decision process. The SVM training process uses a kernel function to generate an SVM speaker model between the target user and each impostor user, i.e., a discriminant function between the target user and that impostor user. The decision process feeds the feature vector set of the user under test into the SVM speaker models and tallies their output results to determine whether the user under test is the target user. For example, Fig. 3 shows a specific implementation flow of step S104 in the open-set speaker recognition method provided by an embodiment of the present invention.
Referring to Fig. 3, step S104 includes:
In step S301, the frames in the codebook of the target user are labeled as the first category and the frames in the codebooks of the impostor users are labeled as the second category, where the impostor users are the user samples in the sample database other than the target user.
In this embodiment, because the SVM algorithm is used, the codebooks of all user samples in the sample database must first be labeled. The labels comprise a first category and a second category, which distinguish the target user from the impostor users. This embodiment labels every frame in the codebook of the target user as the first category, treats the user samples in the sample database other than the target user as impostor users, and labels every frame in the codebooks of the impostor users as the second category.
For example, for the n user samples U_i (1 ≤ i ≤ n) in the sample database, where U_ob is the target user, the n-1 user samples other than the target user U_ob are impostor users. In practice, the frames in the codebook of the target user can all be labeled "+1" and the frames in the codebooks of the impostor users can all be labeled "-1".
In step S302, the codebook of each impostor user is used together with the codebook of the target user for SVM training, yielding several SVM speaker models.
For example, during SVM training, one of the n-1 impostor users is selected, and SVM training is performed on the codebook of that impostor user and the codebook of the target user U_ob, yielding the SVM speaker model between the target user U_ob and that impostor user. The SVM speaker model is used to decide which category each frame of the input belongs to: the first category (target user) or the second category (impostor user). Traversing all n-1 impostor users yields the SVM speaker model between the target user U_ob and each impostor user, giving n-1 SVM speaker models. Each of the n-1 SVM speaker models is trained from a different impostor user together with the target user, so the models are pairwise distinct.
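The labeling and training in steps S301 and S302 can be sketched as follows. The patent trains its models with a kernel function but does not name the kernel or the solver. To stay dependency-free, this sketch substitutes a linear SVM trained by sub-gradient descent on the regularized hinge loss, which is a simplification and not the patent's stated method. The function names and hyperparameter values are illustrative assumptions.

```python
def train_linear_svm(frames, labels, epochs=50, learning_rate=0.1, lam=0.01):
    """Linear SVM via sub-gradient descent on the regularised hinge loss.

    A stand-in for the kernel SVM in the text. Returns (weights, bias).
    """
    weights = [0.0] * len(frames[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(frames, labels):
            margin = y * (sum(w * v for w, v in zip(weights, x)) + bias)
            # Regularisation: shrink the weights slightly on every step.
            weights = [w * (1.0 - learning_rate * lam) for w in weights]
            if margin < 1.0:
                # Hinge-loss sub-gradient step for a margin violation.
                weights = [w + learning_rate * y * v
                           for w, v in zip(weights, x)]
                bias += learning_rate * y
    return weights, bias


def train_speaker_models(target_id, codebooks):
    """Train one model per impostor: target frames +1, impostor frames -1."""
    target_frames = codebooks[target_id]
    models = {}
    for user_id, impostor_frames in codebooks.items():
        if user_id == target_id:
            continue  # the target is never its own impostor
        frames = target_frames + impostor_frames
        labels = [1] * len(target_frames) + [-1] * len(impostor_frames)
        models[user_id] = train_linear_svm(frames, labels)
    return models


def classify_frame(model, frame):
    """Return +1 (target user) or -1 (impostor user) for one frame."""
    weights, bias = model
    score = sum(w * v for w, v in zip(weights, frame)) + bias
    return 1 if score >= 0 else -1
```

With n codebooks in the library, `train_speaker_models` returns n-1 models, one per impostor user, matching the count given in the text.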
In step S303, the codebook of the user under test is generated from the feature vector set of the user under test, and the codebook of the user under test is input into each SVM speaker model.
In step S304, whether the user under test is the target user is determined according to the output results of the SVM speaker models.
Here, the VQ algorithm is preferably used to generate the corresponding codebook from the feature vector set of the user under test. One of the n-1 SVM speaker models is selected, the codebook of the user under test is supplied as the input of that SVM speaker model, and the output result of the SVM speaker model is obtained. After the decision made by the SVM speaker model, its output result contains the category of each frame in the codebook of the user under test, i.e., whether each frame in the codebook belongs to the target user or to the impostor user.
The n-1 SVM speaker models obtained in step S302 are traversed, and the codebook of the user under test is input into each of the n-1 SVM speaker models, yielding n-1 output results. Based on the n-1 output results, this embodiment tallies the category of each frame in the codebook of the user under test and determines from the tally whether the user under test is the target user. For example, Fig. 4 shows the flow, provided by an embodiment of the present invention, of determining from the output results of the SVM speaker models whether the user under test is the target user.
Referring to Fig. 4, determining from the output results of the SVM speaker models whether the user under test is the target user includes:
In step S401, the output result of one SVM speaker model is taken, the category of each frame in the codebook of the user under test is obtained from the output result, and the ratio of the number of first-category frames to the total number of frames in the output result is computed; the output results of the several SVM speaker models are traversed to obtain several ratios.
For the output result of one SVM speaker model, this embodiment counts the frames in the output result that belong to the first category and computes the ratio of that count to the total number of frames in the output result. This ratio reflects the probability that the user under test is the target user, relative to the impostor user whose codebook was trained against the target user's codebook to form that SVM speaker model. Traversing the output results of the n-1 SVM speaker models and computing the ratio of first-category frames to total frames for each yields n-1 ratios.
In step S403, the average of the several ratios is computed.
In step S404, the average is compared with a preset threshold; when the average is greater than the preset threshold, the user under test is judged to be the target user; otherwise, the user under test is judged to be a new user.
Here, this embodiment combines the n-1 ratios obtained by taking their average, and uses the average as the probability that the user under test is the target user, which improves the accuracy of that probability estimate. The average is then compared with the preset threshold. The preset threshold is set in advance by this embodiment and serves as the criterion for deciding whether the user under test is the target user. When the average is greater than the preset threshold, the user under test is judged to be the target user; otherwise, the user under test is judged not to be the same person as the target user, i.e., the user under test is a new user. This completes the open-set speaker recognition. The open-set speaker recognition method provided by this embodiment thus avoids using a common rejection threshold, effectively solves the prior-art problems of large data volume, long processing time, and poor practicality in open-set speaker recognition, and improves the accuracy of deciding whether the user under test is the target user.
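The decision rule of Fig. 4 can be sketched as below, assuming each SVM speaker model's output is a list of per-frame labels with +1 for the first category and -1 for the second, as in the labeling example above. The patent does not state a value for the preset threshold, so the 0.5 used in the usage note is purely illustrative, as are the function names.

```python
def first_class_ratio(frame_labels):
    """Fraction of frames labelled +1 (target user) in one model's output."""
    hits = sum(1 for label in frame_labels if label == 1)
    return hits / len(frame_labels)


def verify_speaker(outputs, threshold):
    """Average the per-model ratios and compare with the preset threshold.

    outputs: one list of per-frame labels (+1 / -1) per SVM speaker model.
    Returns (is_target, mean_ratio).
    """
    ratios = [first_class_ratio(labels) for labels in outputs]
    mean_ratio = sum(ratios) / len(ratios)
    return mean_ratio > threshold, mean_ratio
```

For example, two model outputs `[1, 1, 1, -1]` and `[1, 1, -1, -1]` give ratios 0.75 and 0.5, so the average is 0.625; with an assumed threshold of 0.5 the user under test would be accepted as the target user.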
Further, when the user to be measured is not same person with the targeted customer, i.e., the user to be measured is New user, the open set speaker knows method for distinguishing can also be included:
When the user to be measured is new user, the voice messaging of the user to be measured is preserved.
Here, the embodiment of the present invention passes through to preserve the voice messaging of the user to be measured, to add new user's sample, The Sample Storehouse of the open set speaker identifying system is enriched further.
Open set speaker provided in an embodiment of the present invention given below knows the experimental result of method for distinguishing.For 6 bit tests Personnel, everyone arbitrarily says 12 words, totally 72 word, by the open set speaker recognition method, recognizes that correct number of times is 57 times, accuracy is 79.2%.Therefore, open set speaker knowledge method for distinguishing provided in an embodiment of the present invention has good identification Rate.
The embodiment of the present invention passes through to obtain the voice messaging of each user's sample in Sample Storehouse, according to the voice messaging Extract the characteristic vector group of user's sample, and the code book that user's sample is generated according to the characteristic vector group;Obtain The voice messaging of user to be measured is taken, extracts the characteristic vector group of the user to be measured;Then according to the feature of the user to be measured In Vector Groups and Sample Storehouse, the code book of each user's sample carries out closed set speaker's identification, obtain in the Sample Storehouse with described The immediate targeted customer of user to be measured;SVM algorithm is finally based on, speaker's confirmation is carried out to the user to be measured, to determine Whether the user to be measured is the targeted customer;The open set speaker identification side unrelated so as to provide a kind of new text Formula, it is to avoid use public rejection threshold value, efficiently solve what prior art was present when the identification of open set speaker is carried out Data volume is big, the problem of time length, poor practicability, and with good recognition correct rate.
Fig. 5 shows the composition structure of the device of open set speaker provided in an embodiment of the present invention identification, for the ease of saying Bright, illustrate only the part related to the embodiment of the present invention.
In embodiments of the present invention, the device of the open set speaker identification is used for realizing in above-mentioned Fig. 1 to Fig. 4 embodiment Described open set speaker knows method for distinguishing, can be built in the software unit of terminal unit, hardware cell, software and hardware combining Unit, the terminal unit include but is not limited to the computer equipment such as computer, server.
Refering to Fig. 5, the device of the open set speaker identification includes:
First acquisition module 51, for obtaining the voice messaging of each user's sample in Sample Storehouse, according to the voice The characteristic vector group of user's sample described in information retrieval, generates the code book of user's sample according to the characteristic vector group.
Second acquisition module 52, for obtaining the voice messaging of user to be measured, extracts the characteristic vector of the user to be measured Group.
Identification module 53, for according to each user's sample in the characteristic vector group of the user to be measured and Sample Storehouse Code book carries out closed set speaker's identification, obtain in the Sample Storehouse with the immediate targeted customer of the user to be measured.
Confirm module 54, for SVM algorithm is based on, speaker's confirmation is carried out to the user to be measured, to determine described treating Survey whether user is the targeted customer.
The embodiment of the present invention simulates a closed-set speaker recognition system: it assumes that the user under test exists in the sample library and uses the Euclidean distance to perform closed-set speaker identification, so as to obtain the user sample in the sample library that is most similar to the user under test. For ease of description, this user sample is referred to here as the target user. Accordingly, the identification module 53 further includes:
A computing unit 531, configured to use the Euclidean distance to calculate the distance information between the feature vector set of the user under test and the codebook of each user sample in the sample library.
An acquisition module 532, configured to obtain the minimum value in the distance information and take the user sample corresponding to the minimum value as the target user closest to the user under test.
Here, the distance information reflects the degree of similarity between the user under test and a user sample: the smaller the distance, the more similar the two; the larger the distance, the more they differ. After the distance information between the feature vector set of the user under test and the codebook of each user sample in the sample library has been obtained, the distances are compared and the minimum is chosen. The user sample corresponding to the minimum is the user most similar to the user under test, that is, the target user, which completes the search of the sample library for the target user closest to the user under test. Because the present invention addresses open-set speaker recognition, once the target user closest to the user under test has been obtained, it is still necessary to confirm whether the user under test and the target user are the same person.
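The closed-set identification step can be sketched as follows. The text states only that a Euclidean distance between the feature vector set and each codebook is computed and the minimum is taken; averaging the nearest-codeword distance over all frames is an assumption of this sketch (it is the usual VQ distortion measure), and all names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_distortion(frames, codebook):
    """Mean distance from each frame to its nearest codeword (assumed measure)."""
    return sum(min(euclidean(f, c) for c in codebook) for f in frames) / len(frames)


def closest_user(test_frames, codebooks):
    """Return (target_user, distances) for a dict mapping user id to codebook."""
    distances = {user: average_distortion(test_frames, codebook)
                 for user, codebook in codebooks.items()}
    # The user sample with the smallest distance becomes the target user.
    target = min(distances, key=distances.get)
    return target, distances
```

Note that this step always returns some user, even when the speaker is not enrolled; that is why the verification stage described next is required.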
Here, the embodiment of the present invention combines the VQ algorithm with a support vector machine (Support Vector Machine, SVM) algorithm to perform speaker verification. The speaker verification process comprises an SVM training process and a decision process. The SVM training process uses a kernel function to generate an SVM speaker model between the target user and each impostor user, that is, a discriminant function between the target user and that impostor user. The decision process inputs the feature vector set of the user under test into the SVM speaker models and counts the output results of the SVM speaker models to determine whether the user under test is the target user.
Further, the confirmation module 54 includes:
A labeling unit 541, configured to label the frames in the codebook of the target user as a first category and to label the frames in the codebooks of the impostor users as a second category, the impostor users being the user samples in the sample library other than the target user.
Here, the first category and the second category are used to distinguish the target user from the impostor users. For example, in practice, all frames in the codebook of the target user may be labeled "+1" and all frames in the codebooks of the impostor users may be labeled "-1".
A training unit 542, configured to perform SVM training on the codebook of each impostor user together with the codebook of the target user, obtaining several SVM speaker models.
Here, each SVM speaker model is used to decide which category each frame of its input belongs to, that is, the first category (target user) or the second category (impostor user).
A first decision unit 543, configured to generate the codebook of the user under test from the feature vector set of the user under test and to input the codebook of the user under test into each SVM speaker model.
A second decision unit 544, configured to determine, from the output results of the SVM speaker models, whether the user under test is the target user.
Here, the VQ algorithm is preferably used to generate the corresponding codebook from the feature vector set of the user under test. One SVM speaker model is taken and the codebook of the user under test is used as its input parameter; substituting the codebook into the selected SVM speaker model yields the output result of that model. After the decision of the SVM speaker model, the output result contains the category of each frame in the codebook of the user under test, that is, whether each frame belongs to the target user or to the impostor user. The first decision unit 543 traverses the SVM speaker models obtained by the training unit 542 and inputs the codebook of the user under test into each of them, obtaining several output results.
In the embodiment of the present invention, the second decision unit 544 then uses the several output results obtained by the first decision unit 543 to count the category to which each frame in the codebook of the user under test belongs, and determines from the statistics whether the user under test is the target user.
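The labeling, training, and per-frame decision steps can be sketched as below. This is a simplification under stated assumptions: the text describes kernel-based SVM training, whereas this stdlib-only example substitutes a linear soft-margin SVM fitted by subgradient descent on the hinge loss. The function names and the hyperparameters (`epochs`, `lr`, `reg`) are hypothetical, and data far from the origin may need more epochs to fit the bias term.

```python
def train_linear_svm(samples, labels, epochs=200, lr=0.01, reg=0.01):
    """Fit w, b by subgradient descent on the regularised hinge loss.

    Assumption: a linear model stands in for the kernel SVM of the text.
    """
    w, b = [0.0] * len(samples[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:  # frame violates the margin: hinge term is active
                w = [wi - lr * (reg * wi - y * xi) for wi, xi in zip(w, x)]
                b += lr * y
            else:           # only the regulariser contributes
                w = [wi - lr * reg * wi for wi in w]
    return w, b


def predict(model, frame):
    """Return +1 (first category, target) or -1 (second category, impostor)."""
    w, b = model
    return 1 if sum(wi * xi for wi, xi in zip(w, frame)) + b >= 0 else -1


def train_speaker_models(target_codebook, impostor_codebooks):
    """Train one target-versus-impostor model per impostor user."""
    models = {}
    for impostor, codebook in impostor_codebooks.items():
        samples = target_codebook + codebook
        # Target frames are labeled +1 and impostor frames -1.
        labels = [1] * len(target_codebook) + [-1] * len(codebook)
        models[impostor] = train_linear_svm(samples, labels)
    return models
```

Each returned model is then applied frame by frame to the codebook of the user under test with `predict`, giving one list of +1/-1 outputs per impostor user.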
The second decision unit 544 further includes:
A ratio calculation unit 5441, configured to take the output result of one SVM speaker model, obtain from the output result the category of each frame in the codebook of the user under test, and calculate the ratio of the number of first-category frames to the total number of frames; the output results of the several SVM speaker models are traversed in this way to obtain several ratios.
Here, each ratio reflects the probability that the user under test is the target user, relative to the impostor user that was trained together with the target user to form that SVM speaker model.
An average calculation unit 5442, configured to calculate the mean of the several ratios.
A comparing unit 5443, configured to compare the mean with a preset threshold; when the mean is greater than the preset threshold, the user under test is judged to be the target user; otherwise, the user under test is judged to be a new user.
Here, in the embodiment of the present invention, the average calculation unit 5442 combines the several ratios obtained by computing their mean, and the mean is taken as the probability that the user under test is the target user, which improves the accuracy of that probability estimate. The comparing unit 5443 then compares the mean with the preset threshold. The preset threshold is set in advance in the embodiment of the present invention and serves as the criterion for deciding whether the user under test is the target user. When the mean is greater than the preset threshold, the user under test is judged to be the target user; otherwise, the user under test is judged not to be the same person as the target user, that is, the user under test is a new user. This completes the open-set speaker recognition, avoids the use of a common rejection threshold, effectively addresses the large data volume, long processing time, and poor practicality that the prior art exhibits when performing open-set speaker recognition, and improves the accuracy of deciding whether the user under test is the target user.
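The ratio, averaging, and comparison steps can be sketched as follows. The classifiers are passed in as plain callables returning +1 or -1 per frame so that the example stays self-contained; the function names and the threshold value of 0.5 are assumptions, since the text does not state a concrete preset threshold.

```python
def target_frame_ratio(classify, test_codebook):
    """Fraction of frames in the test codebook assigned to the first category."""
    votes = [classify(frame) for frame in test_codebook]
    return sum(1 for vote in votes if vote == 1) / len(votes)


def verify(classifiers, test_codebook, threshold=0.5):
    """Return (is_target, mean_ratio, ratios).

    One ratio is computed per SVM speaker model; their mean is compared
    with the preset threshold (0.5 here is purely illustrative).
    """
    ratios = [target_frame_ratio(c, test_codebook) for c in classifiers]
    mean_ratio = sum(ratios) / len(ratios)
    # Strictly greater than the threshold means "target user";
    # anything else is treated as a new user.
    return mean_ratio > threshold, mean_ratio, ratios
```

Averaging over all target-versus-impostor models means that a single model with an unusual decision boundary cannot decide the outcome on its own.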
Further, the apparatus also includes:
A saving module 55, configured to save the voice information of the user under test when the user under test is a new user.
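A minimal sketch of what the saving module might do with the verification result is given below. The text says only that the voice information of a new user is saved; the in-memory dictionary used as the sample library, the function name, and the generated identifier format are all hypothetical.

```python
def resolve_identity(is_target, target_user, test_voice, sample_library):
    """Return the recognised user id, enrolling the speaker if new.

    `sample_library` is assumed to be a dict mapping user id to stored
    voice information; the id scheme below is illustrative only.
    """
    if is_target:
        return target_user
    # New user: keep the voice information for later enrollment or training.
    new_id = f"new_user_{len(sample_library) + 1}"
    sample_library[new_id] = test_voice
    return new_id
```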
It should be noted that the system in the embodiment of the present invention can be used to implement all of the technical solutions in the method embodiments above. The functions of its functional modules can be implemented according to the method in those method embodiments, and the specific implementation process can be found in the related description above, so it is not repeated here.
In summary, the embodiment of the present invention obtains the voice information of each user sample in the sample library, extracts the feature vector set of the user sample from the voice information, and generates the codebook of the user sample from the feature vector set; obtains the voice information of the user under test and extracts the feature vector set of the user under test; then performs closed-set speaker identification based on the feature vector set of the user under test and the codebook of each user sample in the sample library, obtaining the target user in the sample library that is closest to the user under test; and finally performs speaker verification on the user under test based on the SVM algorithm to determine whether the user under test is the target user. This provides a new text-independent open-set speaker recognition method that avoids the use of a common rejection threshold, effectively addresses the large data volume, long processing time, and poor practicality that the prior art exhibits when performing open-set speaker recognition, and achieves a good recognition accuracy.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are executed in hardware or in software depends on the specific application and the design constraints of the technical solution. Skilled persons may use different methods to implement the described functions for each specific application, but such implementations should not be considered to go beyond the scope of the present invention.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the system, modules, and units described above may be found in the corresponding processes of the foregoing method embodiments and are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed open-set speaker recognition method and apparatus may be implemented in other ways. For example, the system embodiments described above are merely illustrative. The division into modules and units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection between systems, modules, or units through some interfaces, and may be electrical, mechanical, or of another form.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units and modules in the embodiments of the present invention may be integrated into one processing unit, each unit or module may exist physically on its own, or two or more units or modules may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk, or an optical disc.
The above are only specific embodiments of the present invention, but the protection scope of the present invention is not limited to them. Any change or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A method for open-set speaker recognition, characterized in that the method comprises:
obtaining the voice information of each user sample in a sample library, extracting the feature vector set of the user sample from the voice information, and generating the codebook of the user sample from the feature vector set;
obtaining the voice information of a user under test and extracting the feature vector set of the user under test;
performing closed-set speaker identification based on the feature vector set of the user under test and the codebook of each user sample in the sample library, to obtain the target user in the sample library that is closest to the user under test;
performing speaker verification on the user under test based on an SVM algorithm, to determine whether the user under test is the target user.
2. The method for open-set speaker recognition according to claim 1, characterized in that performing closed-set speaker identification based on the feature vector set of the user under test and the codebook of each user sample in the sample library, to obtain the target user in the sample library that is closest to the user under test, comprises:
using the Euclidean distance to calculate the distance information between the feature vector set of the user under test and the codebook of each user sample in the sample library;
obtaining the minimum value in the distance information and taking the user sample corresponding to the minimum value as the target user closest to the user under test.
3. The method for open-set speaker recognition according to claim 1, characterized in that performing speaker verification on the user under test based on an SVM algorithm, to determine whether the user under test is the target user, comprises:
labeling the frames in the codebook of the target user as a first category and labeling the frames in the codebooks of impostor users as a second category, the impostor users being the user samples in the sample library other than the target user;
performing SVM training on the codebook of each impostor user together with the codebook of the target user, to obtain several SVM speaker models;
generating the codebook of the user under test from the feature vector set of the user under test, and inputting the codebook of the user under test into each SVM speaker model;
determining, from the output results of the SVM speaker models, whether the user under test is the target user.
4. The method for open-set speaker recognition according to claim 3, characterized in that determining, from the output results of the SVM speaker models, whether the user under test is the target user comprises:
taking the output result of one SVM speaker model, obtaining from the output result the category of each frame in the codebook of the user under test, calculating the ratio of the number of first-category frames to the total number of frames, and traversing the output results of the several SVM speaker models to obtain several ratios;
calculating the mean of the several ratios;
comparing the mean with a preset threshold; when the mean is greater than the preset threshold, judging the user under test to be the target user; otherwise, judging the user under test to be a new user.
5. The method for open-set speaker recognition according to any one of claims 1 to 4, characterized in that the method further comprises:
saving the voice information of the user under test when the user under test is a new user.
6. An apparatus for open-set speaker recognition, characterized in that the apparatus comprises:
a first acquisition module, configured to obtain the voice information of each user sample in a sample library, extract the feature vector set of the user sample from the voice information, and generate the codebook of the user sample from the feature vector set;
a second acquisition module, configured to obtain the voice information of a user under test and extract the feature vector set of the user under test;
an identification module, configured to perform closed-set speaker identification based on the feature vector set of the user under test and the codebook of each user sample in the sample library, to obtain the target user in the sample library that is closest to the user under test;
a confirmation module, configured to perform speaker verification on the user under test based on an SVM algorithm, to determine whether the user under test is the target user.
7. The apparatus for open-set speaker recognition according to claim 6, characterized in that the identification module comprises:
a computing unit, configured to use the Euclidean distance to calculate the distance information between the feature vector set of the user under test and the codebook of each user sample in the sample library;
an acquisition module, configured to obtain the minimum value in the distance information and take the user sample corresponding to the minimum value as the target user closest to the user under test.
8. The apparatus for open-set speaker recognition according to claim 7, characterized in that the confirmation module comprises:
a labeling unit, configured to label the frames in the codebook of the target user as a first category and to label the frames in the codebooks of impostor users as a second category, the impostor users being the user samples in the sample library other than the target user;
a training unit, configured to perform SVM training on the codebook of each impostor user together with the codebook of the target user, to obtain several SVM speaker models;
a first decision unit, configured to generate the codebook of the user under test from the feature vector set of the user under test and to input the codebook of the user under test into each SVM speaker model;
a second decision unit, configured to determine, from the output results of the SVM speaker models, whether the user under test is the target user.
9. The apparatus for open-set speaker recognition according to claim 8, characterized in that the second decision unit comprises:
a ratio calculation unit, configured to take the output result of one SVM speaker model, obtain from the output result the category of each frame in the codebook of the user under test, calculate the ratio of the number of first-category frames to the total number of frames, and traverse the output results of the several SVM speaker models to obtain several ratios;
an average calculation unit, configured to calculate the mean of the several ratios;
a comparing unit, configured to compare the mean with a preset threshold; when the mean is greater than the preset threshold, the user under test is judged to be the target user; otherwise, the user under test is judged to be a new user.
10. The apparatus for open-set speaker recognition according to any one of claims 6 to 9, characterized in that the apparatus further comprises:
a saving module, configured to save the voice information of the user under test when the user under test is a new user.
CN201610819015.XA 2016-09-13 2016-09-13 Open-set speaker recognition method and apparatus Pending CN106448682A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610819015.XA CN106448682A (en) 2016-09-13 2016-09-13 Open-set speaker recognition method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610819015.XA CN106448682A (en) 2016-09-13 2016-09-13 Open-set speaker recognition method and apparatus

Publications (1)

Publication Number Publication Date
CN106448682A true CN106448682A (en) 2017-02-22

Family

ID=58168849

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610819015.XA Pending CN106448682A (en) 2016-09-13 2016-09-13 Open-set speaker recognition method and apparatus

Country Status (1)

Country Link
CN (1) CN106448682A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6411930B1 (en) * 1998-11-18 2002-06-25 Lucent Technologies Inc. Discriminative gaussian mixture models for speaker verification
CN1787076A (en) * 2005-12-13 2006-06-14 浙江大学 Method for distinguishing speek person based on hybrid supporting vector machine
CN102509547A (en) * 2011-12-29 2012-06-20 辽宁工业大学 Method and system for voiceprint recognition based on vector quantization based
CN102664011A (en) * 2012-05-17 2012-09-12 吉林大学 Method for quickly recognizing speaker
CN102968990A (en) * 2012-11-15 2013-03-13 江苏嘉利德电子科技有限公司 Speaker identifying method and system
CN103258536A (en) * 2013-03-08 2013-08-21 北京理工大学 Large-scaled speaker identification method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
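Liu Xingli (刘兴立): "Master's Thesis" (《硕士学位论文》), 30 January 2002, Dalian University of Technology (大连理工大学) *
Si Luo, Hu Qixiu, Jin Qin (司罗,胡起秀,金琴): "Speaker identification system based on codeword probability distribution (BCDM)" (基于码字概率分布(BCDM)的说话人辨识系统), Proceedings of the 5th National Conference on Man-Machine Speech Communication (《第五届全国人机语音通讯学术会议论文集》) *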

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109612961A (en) * 2018-12-13 2019-04-12 温州大学 The opener recognition methods of the micro- plastics of coastal environment
CN109612961B (en) * 2018-12-13 2021-06-25 温州大学 Open set identification method of coastal environment micro-plastic
CN112735435A (en) * 2020-12-25 2021-04-30 西南电子技术研究所(中国电子科技集团公司第十研究所) Voiceprint open set identification method with unknown class internal division capability

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170222