CN102623008A - Voiceprint identification method - Google Patents


Info

Publication number
CN102623008A
Authority
CN
China
Prior art keywords
sequence number
data
characteristic
voice
statement
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2011101671461A
Other languages
Chinese (zh)
Inventor
吴丽丽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Institute of Nano Tech and Nano Bionics of CAS
Original Assignee
Suzhou Institute of Nano Tech and Nano Bionics of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Institute of Nano Tech and Nano Bionics of CAS filed Critical Suzhou Institute of Nano Tech and Nano Bionics of CAS
Priority to CN2011101671461A
Publication of CN102623008A
Pending legal-status Current

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a voiceprint recognition method comprising the following steps: 1) applying a feature transformation to a number of speech recordings to obtain feature points, which together form a speech feature space; 2) dividing the feature space into several subspaces, numbering each subspace, and recording the subspace serial numbers and data descriptions; 3) applying the feature transformation to a training utterance to obtain a time-ordered set of feature points, each feature point receiving the serial number of its subspace according to the nearest-neighbor rule, so that a segment of speech is converted into a digital sequence; 4) obtaining the digital sequence of a test utterance through the same feature transformation; 5) comparing the training speech feature with the test speech feature. The invention overcomes the disadvantages of the prior art and provides a voiceprint recognition method with a small amount of computation, a high recognition rate and a small data volume.

Description

Voiceprint recognition method
Technical field
The present invention relates to the field of voiceprint recognition.
Background technology
Speaker recognition, like fingerprint, iris and face recognition, is a form of biometric identification; it is considered the most natural biometric identity-authentication method and is also called "voiceprint" recognition. Speaker recognition has the advantages of simple acquisition equipment, low system cost and ready user acceptance. It is used for access-rights control in door-entry systems, safes and personal devices (cars, computers, mobile phones, PDAs, etc.). Text-dependent speaker recognition verifies both the speaker's biometric voice characteristics and the spoken content, and with short utterances enrollment and testing can be carried out at the same time, so it has outstanding application advantages.
The basic stages of speaker recognition are speech acquisition, feature extraction and classification modeling. The common feature-extraction method exploits the short-time stationarity of speech and uses the Mel-frequency cepstral coefficient (MFCC) transform to convert speech into a set of feature points. A learning process then models the speaker's voice to obtain the speaker's classification model. The hidden Markov model (HMM) is generally acknowledged to be the most effective modeling method for text-dependent speaker recognition at present. On one hand, HMM uses hidden states to represent the relatively stable pronunciation units of the acoustic layer and describes pronunciation variation through state transitions and state dwell; on the other hand, it introduces a probabilistic statistical model, computes the output probability of the speech parameters against the HMM with probability density functions, searches for the optimal state sequence, and takes the maximum a posteriori probability as the criterion for the recognition result. However, it has several problems: (1) it requires many training samples; (2) its computational complexity is high; (3) the resulting model data volume is large. For resource-limited embedded systems these problems restrict the use of the algorithm, so a new method is needed to solve them.
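For orientation, the following is a minimal sketch of the MFCC front end mentioned above, assuming the third-party librosa library and a 16 kHz mono recording; the 20 ms frame length, 10 ms shift and 12 retained coefficients follow the values used later in this description, while the function name and parameters are illustrative only, not part of the patent.

```python
# Minimal MFCC feature-extraction sketch (assumes librosa is installed).
import librosa
import numpy as np

def extract_mfcc(path, sr=16000, n_mfcc=12):
    y, sr = librosa.load(path, sr=sr)           # read and resample the recording
    mfcc = librosa.feature.mfcc(
        y=y, sr=sr, n_mfcc=n_mfcc,
        n_fft=int(0.020 * sr),                  # 20 ms analysis window
        hop_length=int(0.010 * sr),             # 10 ms frame shift
    )
    return mfcc.T                               # one 12-dimensional feature point per frame
```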
Summary of the invention
To overcome the shortcomings of the prior art, the object of the present invention is to provide a voiceprint recognition method with a small amount of computation, a good recognition rate and a small data volume.
To achieve the above object, the invention provides a voiceprint recognition method comprising the following steps:
1) A speech feature space creation step: speech of different backgrounds and different voices is divided into speech segments of a specific length; each speech segment yields a speech feature point after a feature transformation, and the speech feature points of all segments form the speech feature space;
2) A subspace partition step: the speech feature space is divided into a number of subspaces; each divided subspace is represented by a data description and is numbered, and the data description of each subspace and its corresponding serial number are recorded;
3) A training sentence feature extraction step: the sentence is converted into a time-ordered feature point set by the feature transformation; each feature point is assigned to a subspace according to the nearest-neighbor rule and the serial number of the corresponding subspace is recorded; the resulting sequence of serial numbers is recorded as the training sentence feature (a sketch of this nearest-neighbor encoding is given after this list);
4) A test sentence feature extraction step: the sentence is converted into a time-ordered feature point set by the feature transformation; each feature point is assigned to a subspace according to the nearest-neighbor rule and the serial number of the corresponding subspace is recorded; the resulting sequence of serial numbers is recorded as the test sentence feature;
5) A voiceprint recognition step: the training sentence feature and the test sentence feature are compared to determine whether they are similar.
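A minimal sketch of the nearest-neighbor encoding used in steps 3) and 4), assuming that the data description of each subspace is a single center vector (as in the K-means variant described next); all names are illustrative.

```python
# Map each feature point of a sentence to the serial number of its nearest
# subspace, turning the sentence into a digital sequence.
import numpy as np

def encode_sentence(feature_points, subspace_centers):
    """feature_points: (n_frames, 12) MFCC points of one sentence.
    subspace_centers: (n_subspaces, 12) data descriptions of the subspaces.
    Returns the serial-number sequence of the sentence."""
    # squared Euclidean distance from every feature point to every center
    d = ((feature_points[:, None, :] - subspace_centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1).tolist()   # nearest-neighbor rule, one serial number per frame
```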
In a further improvement of the present invention, the feature transformation is the Mel cepstrum transform.
In a further improvement of the present invention, in the Mel cepstrum transform the speech is divided into 20 ms frames with a 10 ms frame shift to obtain speech frames; silence is removed frame by frame; after the Mel cepstrum transform is applied to each speech frame, 12 coefficients are retained per frame, and these 12 coefficients constitute a speech feature point.
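The description removes silent frames before the Mel cepstrum transform but does not say how silence is detected; the sketch below assumes a simple short-time-energy threshold, which is a common choice, and follows the 20 ms frame / 10 ms shift layout given above. Function names and the threshold value are illustrative only.

```python
# Frame splitting and a simple energy-based silence removal (assumed, not
# specified by the patent).
import numpy as np

def split_frames(signal, sr=16000, frame_ms=20, shift_ms=10):
    frame_len, shift = int(sr * frame_ms / 1000), int(sr * shift_ms / 1000)
    n = 1 + max(0, (len(signal) - frame_len) // shift)
    return np.stack([signal[i * shift:i * shift + frame_len] for i in range(n)])

def drop_silent_frames(frames, rel_threshold=0.05):
    energy = (frames.astype(float) ** 2).mean(axis=1)      # short-time energy per frame
    return frames[energy > rel_threshold * energy.max()]   # keep frames above the threshold
```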
In a further improvement of the present invention, in step 2) the "K-means" algorithm is used to divide the speech feature space into several subspaces, and for each of the divided subspaces the "K-means" center point is recorded as the data description of that subspace.
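A minimal sketch of this subspace partition, assuming scikit-learn's KMeans as the "K-means" algorithm: the cluster centers serve as the data descriptions of the subspaces and the cluster indices as their serial numbers. The number of subspaces is not fixed by the patent; 64 is used here purely as an example.

```python
# Partition the pooled speech feature space with K-means (scikit-learn assumed).
import numpy as np
from sklearn.cluster import KMeans

def partition_feature_space(feature_space, n_subspaces=64):
    """feature_space: (n_points, 12) array of all background feature points."""
    km = KMeans(n_clusters=n_subspaces, n_init=10, random_state=0).fit(feature_space)
    # serial number i  <->  data description km.cluster_centers_[i]
    return km.cluster_centers_
```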
In a further improvement of the present invention, step 3) and step 4) further comprise a sentence feature compression step in which the data of the training sentence feature and the test sentence feature are compressed.
In a further improvement of the present invention, the sentence feature compression step is as follows: the serial number of each subspace and the count of identical consecutive serial numbers are recorded, each serial number together with its count forming one group of data. When the count of a group is 1, that group is removed. If, after the group is removed, the serial number of the group immediately before it is the same as the serial number of the group immediately after it, those two groups are also merged: in the newly formed group the serial number is the same serial number as before the merge, and the count is the sum of the counts of the preceding group and the following group.
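This compression rule can be read as a run-length encoding followed by a pruning and merging pass. The sketch below is one way to implement it; the worked example from the embodiment, (2, 2, 4, 8, 8, 8, 5, 5, 5, 5, 5), is shown as a usage comment.

```python
# Run-length encode the serial numbers, drop groups with a count of 1,
# and merge neighboring groups that then share the same serial number.
from itertools import groupby

def compress(sequence):
    runs = [(k, len(list(g))) for k, g in groupby(sequence)]   # (serial number, count)
    out = []
    for num, cnt in runs:
        if cnt == 1:
            continue                                # remove groups whose count is 1
        if out and out[-1][0] == num:
            out[-1] = (num, out[-1][1] + cnt)       # merge with the preceding group
        else:
            out.append((num, cnt))
    return out

# Example sequence from the embodiment below:
# compress([2, 2, 4, 8, 8, 8, 5, 5, 5, 5, 5])  ->  [(2, 2), (8, 3), (5, 5)]
```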
The beneficial effects of the invention are as follows: with the above method the continuous information of the speech is mapped to a sequence of spatial indices, and this compact label sequence is used as the speaker's similarity feature. The feature obtained by this method has a small data volume and is representative, and a decision can be obtained simply by comparing digital sequences. The method therefore has the advantages of low computational cost and low storage demand, overcomes the problems of modeling methods based on probabilistic statistics, and is suitable for embedded systems with limited resources.
Description of drawings
Fig. 1 is a schematic flow chart of speech feature space creation in a voiceprint recognition method according to the present invention;
Fig. 2 is a schematic flow chart of feature space partition in a voiceprint recognition method according to the present invention;
Fig. 3 is a schematic flow chart of sentence feature extraction in a voiceprint recognition method according to the present invention;
Fig. 4 is a schematic diagram of the digital-sequence compression process in a voiceprint recognition method according to the present invention;
Fig. 5 is a schematic flow chart of a voiceprint recognition method according to the present invention.
Embodiment
Preferred embodiments of the present invention are set forth in detail below, so that the advantages and features of the invention can be more easily understood by those skilled in the art and the scope of protection of the invention can be defined more clearly.
Referring to Fig. 5, a voiceprint recognition method comprises the following steps:
1) Referring to Fig. 1, the speech feature space creation step: speech of different backgrounds and different voices is divided into 20 ms frames with a 10 ms frame shift to obtain speech frames (speech segments); silence is removed frame by frame; the Mel cepstrum transform is applied to each speech frame and 12 coefficients are retained per frame, these 12 coefficients constituting a speech feature point. The speech feature points of all speech segments form the speech feature point set, that is, the speech feature space.
2) Referring to Fig. 2, the subspace partition step: the "K-means" algorithm is used to divide the speech feature space into several subspaces; for each of the divided subspaces the "K-means" center point is recorded as the data description of that subspace; each subspace is numbered, and the data description of each subspace and its corresponding serial number are recorded;
3) Referring to Fig. 3, the training sentence feature extraction step: the sentence is converted into a time-ordered feature point set by the Mel cepstrum transform; each feature point is assigned to a subspace according to the nearest-neighbor rule and the serial number of the corresponding subspace is recorded; the resulting serial-number sequence, here (2, 2, 4, 8, 8, 8, 5, 5, 5, 5, 5), is recorded as the training sentence feature;
4) Referring to Fig. 4, the sentence feature compression step: the serial number of each subspace and the count of identical consecutive serial numbers are recorded, each serial number together with its count forming one group of data. When the count of a group is 1, that group is removed; in this embodiment only one feature point has the serial number 4, so that group of data is deleted during compression.
If, after a group is removed, the serial number of the group immediately before it is the same as the serial number of the group immediately after it, those two groups are also merged: in the newly formed group the serial number is the same as before the merge, and the count is the sum of the counts of the preceding group and the following group. In this embodiment, after the group with serial number 4 is removed, the preceding group has serial number 2 and the following group has serial number 8; since 2 and 8 differ, the remaining groups are kept unchanged.
5) The test sentence feature extraction step: the sentence is converted into a time-ordered feature point set by the Mel cepstrum transform; each feature point is assigned to a subspace according to the nearest-neighbor rule; the serial number of the corresponding subspace is recorded for each feature point, and the resulting serial-number sequence is recorded as the test sentence feature.
The same sentence feature compression step is applied to the test sentence feature: the serial number of each subspace and the count of identical consecutive serial numbers form one group of data; groups with a count of 1 are removed, and if this leaves the groups immediately before and after with the same serial number, those two groups are merged, the merged group keeping that serial number with a count equal to the sum of the counts of the preceding and following groups.
6) The voiceprint recognition step: the training sentence feature and the test sentence feature are compared to determine whether they are similar.
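The patent leaves the similarity comparison itself open; the sketch below uses a normalized edit (Levenshtein) distance over the serial-number sequences (compressed or uncompressed) as one plausible, purely illustrative choice, with a score of 1.0 for identical sequences.

```python
# Illustrative sequence comparison for step 6); the measure is an assumption,
# not prescribed by the patent.
def sequence_similarity(train_seq, test_seq):
    m, n = len(train_seq), len(test_seq)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if train_seq[i - 1] == test_seq[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    return 1.0 - d[m][n] / max(m, n, 1)   # 1.0 means identical sequences
```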
It can be seen from the above embodiment that the present invention provides a voiceprint recognition method with a small amount of computation, a good recognition rate and a small data volume.
The above embodiment only illustrates the technical concept and features of the present invention; its purpose is to enable those familiar with the art to understand the content of the invention and implement it, and it does not limit the scope of protection of the invention. All equivalent changes or modifications made according to the spirit of the present invention shall fall within the scope of protection of the invention.

Claims (6)

1. A voiceprint recognition method, characterized by comprising the following steps:
1) a speech feature space creation step: speech of different backgrounds and different voices is divided into speech segments of a specific length; each speech segment yields a speech feature point after a feature transformation, and the speech feature points of all segments form the speech feature space;
2) a subspace partition step: the speech feature space is divided into a number of subspaces; each divided subspace is represented by a data description and is numbered, and the data description of each subspace and its corresponding serial number are recorded;
3) a training sentence feature extraction step: the sentence is converted into a time-ordered feature point set by the feature transformation; each feature point is assigned to a subspace according to the nearest-neighbor rule and the serial number of the corresponding subspace is recorded; the resulting sequence of serial numbers is recorded as the training sentence feature;
4) a test sentence feature extraction step: the sentence is converted into a time-ordered feature point set by the feature transformation; each feature point is assigned to a subspace according to the nearest-neighbor rule and the serial number of the corresponding subspace is recorded; the resulting sequence of serial numbers is recorded as the test sentence feature;
5) a voiceprint recognition step: the training sentence feature and the test sentence feature are compared to determine whether they are similar.
2. The voiceprint recognition method as claimed in claim 1, characterized in that the feature transformation is the Mel cepstrum transform.
3. The voiceprint recognition method as claimed in claim 2, characterized in that in the Mel cepstrum transform the speech is divided into 20 ms frames with a 10 ms frame shift to obtain speech frames; silence is removed frame by frame; after the Mel cepstrum transform is applied to each speech frame, 12 coefficients are retained per frame, and these 12 coefficients constitute a speech feature point.
4. The voiceprint recognition method as claimed in claim 1, characterized in that in step 2) the "K-means" algorithm is used to divide the speech feature space into several subspaces, and for each of the divided subspaces the "K-means" center point is recorded as the data description of that subspace.
5. The voiceprint recognition method as claimed in claim 1, characterized in that step 3) and step 4) further comprise a sentence feature compression step in which the data of the training sentence feature and the test sentence feature are compressed.
6. The voiceprint recognition method as claimed in claim 4, characterized in that the sentence feature compression step is: the serial number of each subspace and the count of identical consecutive serial numbers are recorded, each serial number together with its count forming one group of data; when the count of a group is 1, that group is removed; if, after the group is removed, the serial number of the group immediately before it is the same as the serial number of the group immediately after it, those two groups are merged, the merged group keeping the same serial number as before the merge with a count equal to the sum of the counts of the preceding and following groups.
CN2011101671461A 2011-06-21 2011-06-21 Voiceprint identification method Pending CN102623008A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2011101671461A CN102623008A (en) 2011-06-21 2011-06-21 Voiceprint identification method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2011101671461A CN102623008A (en) 2011-06-21 2011-06-21 Voiceprint identification method

Publications (1)

Publication Number Publication Date
CN102623008A true CN102623008A (en) 2012-08-01

Family

ID=46562888

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2011101671461A Pending CN102623008A (en) 2011-06-21 2011-06-21 Voiceprint identification method

Country Status (1)

Country Link
CN (1) CN102623008A (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6735563B1 (en) * 2000-07-13 2004-05-11 Qualcomm, Inc. Method and apparatus for constructing voice templates for a speaker-independent voice recognition system
CN1455388A (en) * 2002-09-30 2003-11-12 中国科学院声学研究所 Voice identifying system and compression method of characteristic vector set for voice identifying system
CN101004913A (en) * 2006-01-18 2007-07-25 中国科学院半导体研究所 Method for identifying speaker based on identification principle of bionic mode
CN101540170A (en) * 2008-03-19 2009-09-23 中国科学院半导体研究所 Voiceprint recognition method based on biomimetic pattern recognition
CN101452488A (en) * 2008-10-11 2009-06-10 大连大学 Human body motion capturing data retrieval method based on bionic pattern recognition

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
WU Yan et al.: "Speaker recognition method using higher-order neural networks based on biomimetic pattern recognition theory", Computer Engineering (《计算机工程》), vol. 32, no. 12, 30 June 2006 (2006-06-30), pages 184-186 *
DENG Haojiang et al.: "Research on text-independent speaker recognition based on clustering statistics", Journal of Circuits and Systems (《电路与系统学报》), vol. 6, no. 3, 30 September 2001 (2001-09-30), pages 77-80 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106887230A (en) * 2015-12-16 2017-06-23 芋头科技(杭州)有限公司 A kind of method for recognizing sound-groove in feature based space
CN106971730A (en) * 2016-01-14 2017-07-21 芋头科技(杭州)有限公司 A kind of method for recognizing sound-groove based on channel compensation
CN106971731A (en) * 2016-01-14 2017-07-21 芋头科技(杭州)有限公司 A kind of modification method of Application on Voiceprint Recognition
CN106971727A (en) * 2016-01-14 2017-07-21 芋头科技(杭州)有限公司 A kind of verification method of Application on Voiceprint Recognition
CN106971737A (en) * 2016-01-14 2017-07-21 芋头科技(杭州)有限公司 A kind of method for recognizing sound-groove spoken based on many people
CN106971731B (en) * 2016-01-14 2020-10-23 芋头科技(杭州)有限公司 Correction method for voiceprint recognition
CN108320752A (en) * 2018-01-26 2018-07-24 青岛易方德物联科技有限公司 Cloud Voiceprint Recognition System and its method applied to community gate inhibition
CN108320752B (en) * 2018-01-26 2020-12-15 青岛易方德物联科技有限公司 Cloud voiceprint recognition system and method applied to community access control

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20120801