CN102623008A - Voiceprint identification method
- Publication number: CN102623008A
- Application number: CN201110167146A (CN2011101671461A)
- Authority: CN (China)
- Prior art keywords: sequence number, data, characteristic, voice, statement
- Prior art date: 2011-06-21
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a voiceprint identification method which comprises the following steps: 1) performing feature transformation on a plurality of voices to obtain feature points, which together form a voice feature space; 2) dividing the feature space into a plurality of subspaces, numbering each subspace, and recording the space serial numbers and data descriptions; 3) performing feature transformation on a training utterance to obtain a time-ordered set of feature points, assigning each feature point a subspace serial number according to the nearest-neighbor principle, and thereby converting a segment of speech into a digital sequence; 4) obtaining the digital sequence of a test utterance through the same feature transformation; 5) comparing the training utterance feature with the test utterance feature. The method overcomes the disadvantages of the prior art and provides voiceprint identification with a small amount of calculation, a high identification rate, and a small data volume.
Description
Technical field
The present invention relates to the field of voiceprint recognition.
Background art
Speaker recognition, like fingerprint, iris, and face recognition, is a form of biometric identification and is considered the most natural biometric identity authentication method; it is also called "voiceprint" recognition. Speaker recognition has the advantages of simple acquisition equipment, low system cost, and easy acceptance by users. In application scenarios such as access control systems, safes, and usage-rights control for personal devices (automobiles, computers, mobile phones, PDAs, etc.), text-dependent speaker recognition can verify both the speaker's biometric voice characteristics and the spoken content, and short utterances suffice for both the learning and the testing process, which gives it outstanding application advantages.
The basic process of speaker recognition comprises voice acquisition, feature extraction, and classification modeling. A common feature extraction method exploits the short-term quasi-stationarity of speech and uses the Mel-frequency cepstral coefficient (MFCC) transform to convert speech into a set of feature points. A learning process then models the speaker's voice to obtain the speaker's classification model. The hidden Markov model (HMM) is currently regarded as the most effective modeling method for text-dependent speaker recognition. On the one hand, the HMM uses hidden states to represent relatively stable pronunciation units at the acoustic level and describes pronunciation variation through state transitions and state dwell; on the other hand, it introduces a probabilistic statistical model, computes the output probability of the speech parameters with respect to the HMM using probability density functions, searches for the optimal state sequence, and takes the maximum a posteriori probability as the criterion for the recognition result. However, it has several problems: (1) it requires many learning samples; (2) its computational complexity is high; (3) the resulting model data volume is large. For resource-limited embedded systems, these problems restrict the use of the algorithm, so a new method is needed to solve them.
Summary of the invention
In order to overcome the deficiencies of the prior art, the object of the present invention is to provide a voiceprint recognition method with a small amount of calculation, a high recognition rate, and a small data volume.
To achieve the above object, the invention provides a voiceprint recognition method comprising the following steps:
1) a voice feature space establishment step: voices of different backgrounds and different voices are divided into voice segments of a specific length; each voice segment yields a voice feature point after feature transformation, and the voice feature points of all voice segments constitute the voice feature space;
2) a subspace partition step: the voice feature space is divided into a plurality of subspaces; each subspace after partition is described by a data description and assigned a number, and the data description of each subspace and its corresponding serial number are recorded;
3) a training utterance feature extraction step: the utterance is converted into a time-ordered set of feature points through feature transformation; each feature point is assigned to a subspace according to the nearest-neighbor rule, the serial number of the subspace corresponding to each feature point is recorded, and the sequence formed by these serial numbers is recorded as the training utterance feature;
4) a test utterance feature extraction step: the utterance is converted into a time-ordered set of feature points through feature transformation; each feature point is assigned to a subspace according to the nearest-neighbor rule, the serial number of the subspace corresponding to each feature point is recorded, and the sequence formed by these serial numbers is recorded as the test utterance feature;
5) a voiceprint recognition step: the training utterance feature and the test utterance feature are compared to determine whether they are similar.
A further improvement of the present invention is that the feature transformation is the Mel-frequency cepstral (MFCC) transform.
A further improvement of the present invention is that, in the Mel-frequency cepstral transform, the voice is divided into 20 ms frames with a 10 ms frame shift to obtain speech frames; silence is removed frame by frame; after the Mel-frequency cepstral transform is applied to each speech frame, 12 coefficients are retained per frame, and these 12 coefficients constitute a voice feature point.
A further improvement of the present invention is that, in step 2), the K-means algorithm is used to divide the voice feature space into several subspaces, and the K-means cluster center of each resulting subspace is recorded as the data description of that subspace.
A further improvement of the present invention is that step 3) and step 4) further comprise an utterance feature compression step, in which the data of the training utterance feature and the test utterance feature are compressed.
A further improvement of the present invention is that the utterance feature compression step is as follows: the subspace serial number and the count of identical consecutive serial numbers are recorded, and each serial number together with its count is arranged as one data group; when the count of a serial number is 1, that data group is removed; if, after removal, the serial numbers of the data groups immediately before and after it are identical, those two groups are also merged; in the newly formed data group, the serial number is the same as before the merge, and its count is the sum of the counts of the preceding and following groups.
The beneficial effects of the invention are as follows: with the above method, the continuous information of speech is mapped to a sequence of space indices, and this short index sequence, which requires little data, serves as the speaker's similarity feature. The feature obtained in this way is compact yet representative, and a verification result is obtained simply by comparing digital sequences. The method therefore requires little computation, saves storage resources, overcomes the problems of modeling methods based on probability statistics, and is suitable for embedded systems with limited resources.
Description of drawings
Fig. 1 is a schematic flowchart of voice feature space establishment in a voiceprint recognition method of the present invention;
Fig. 2 is a schematic flowchart of feature space partition in a voiceprint recognition method of the present invention;
Fig. 3 is a schematic flowchart of utterance feature extraction in a voiceprint recognition method of the present invention;
Fig. 4 is a schematic diagram of the digital sequence compression process in a voiceprint recognition method of the present invention;
Fig. 5 is a schematic flowchart of a voiceprint recognition method of the present invention.
Embodiment
Preferred embodiments of the present invention are set forth in detail below, so that the advantages and features of the invention can be more readily understood by those skilled in the art and the protection scope of the invention is thereby defined more clearly.
Referring to Fig. 5, a voiceprint recognition method comprises the following steps:
1) Referring to Fig. 1, the voice feature space establishment step: voices of different backgrounds and different voices are divided into 20 ms frames with a 10 ms frame shift to obtain speech frames (voice segments); silence is removed frame by frame; the Mel-frequency cepstral transform is applied to each speech frame and 12 coefficients are retained per frame, and these 12 coefficients constitute a voice feature point. The voice feature points of all voice segments constitute the voice feature point set, i.e. the voice feature space.
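For illustration only, a minimal Python sketch of this framing and Mel-cepstral step, assuming the librosa library and a simple per-frame energy threshold for silence removal (the patent specifies 20 ms frames, a 10 ms shift, and 12 coefficients per frame, but names no library and no particular silence-removal rule):

```python
import numpy as np
import librosa

def extract_feature_points(wav_path, sr=16000, n_mfcc=12):
    """Frame speech into 20 ms frames with a 10 ms shift, drop silent frames,
    and keep 12 Mel-cepstral coefficients per frame (one feature point per frame)."""
    y, sr = librosa.load(wav_path, sr=sr)
    frame_len = int(0.020 * sr)                      # 20 ms frame
    hop_len = int(0.010 * sr)                        # 10 ms frame shift
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc,
                                n_fft=frame_len, hop_length=hop_len)
    feats = mfcc.T                                   # shape: (num_frames, 12)
    # crude energy gate as a stand-in for the per-frame silence removal
    rms = librosa.feature.rms(y=y, frame_length=frame_len, hop_length=hop_len)[0]
    n = min(len(rms), len(feats))
    keep = rms[:n] > 0.02 * rms.max()
    return feats[:n][keep]
```

Pooling the feature points returned for many recordings of different backgrounds and speakers yields the voice feature space used in the next step.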
2) Referring to Fig. 2, the subspace partition step: the K-means algorithm is used to divide the voice feature space into several subspaces; the K-means cluster center of each resulting subspace is recorded as the data description of that subspace, each subspace is numbered, and the data description of each subspace and its corresponding serial number are recorded.
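A sketch of the subspace partition, using scikit-learn's KMeans as one possible implementation of the K-means step; the number of subspaces k is not fixed by the patent and is chosen here only as an example:

```python
import numpy as np
from sklearn.cluster import KMeans

def build_subspaces(all_feature_points, k=64, seed=0):
    """Partition the pooled feature space into k numbered subspaces.
    Row i of the returned array is the cluster centre, i.e. the
    'data description' of the subspace whose serial number is i."""
    km = KMeans(n_clusters=k, random_state=seed, n_init=10)
    km.fit(np.vstack(all_feature_points))   # pool the feature points of all segments
    return km.cluster_centers_
```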
3) Referring to Fig. 3, the training utterance feature extraction step: the utterance is converted into a time-ordered set of feature points through the Mel-frequency cepstral transform; each feature point is assigned to a subspace according to the nearest-neighbor rule, the serial number of the subspace corresponding to each feature point is recorded, and the sequence formed by these serial numbers, e.g. (2, 2, 4, 8, 8, 8, 5, 5, 5, 5, 5), is recorded as the training utterance feature.
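A sketch of the nearest-neighbor assignment that turns a time-ordered set of feature points into the serial-number sequence described above (plain Euclidean distance is assumed; the patent does not name a distance measure):

```python
import numpy as np

def to_index_sequence(feature_points, centers):
    """Assign each feature point, in time order, to the nearest subspace
    centre and return the resulting serial-number sequence."""
    # pairwise squared Euclidean distances, shape (num_points, k)
    d = ((feature_points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1).tolist()   # e.g. [2, 2, 4, 8, 8, 8, 5, 5, 5, 5, 5]
```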
4) Referring to Fig. 4, the utterance feature compression step: the subspace serial number and the count of identical consecutive serial numbers are recorded, and each serial number together with its count is arranged as one data group; when the count of a serial number is 1, that data group is removed. In this embodiment, only one feature point has serial number 4, so that data group is deleted during compression.
If, after a data group is removed, the serial numbers of the data groups immediately before and after it are identical, those two groups are also merged; in the newly formed data group, the serial number is the same as before the merge, and its count is the sum of the counts of the preceding and following groups. In this embodiment, after the group with serial number 4 is removed, the preceding group has serial number 2 and the following group has serial number 8; since 2 and 8 are not equal, the original groups are kept.
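A sketch of this compression rule in Python, reproducing the worked example above: the sequence is run-length encoded, groups with a count of 1 are dropped, and neighbouring groups that then share a serial number are merged:

```python
from itertools import groupby

def compress_sequence(seq):
    """Run-length encode the serial-number sequence, drop groups whose count
    is 1, then merge neighbouring groups left with the same serial number."""
    groups = [(num, len(list(run))) for num, run in groupby(seq)]
    groups = [(num, cnt) for num, cnt in groups if cnt > 1]   # remove single-count groups
    merged = []
    for num, cnt in groups:
        if merged and merged[-1][0] == num:
            merged[-1] = (num, merged[-1][1] + cnt)           # sum the counts of merged groups
        else:
            merged.append((num, cnt))
    return merged

# The embodiment's sequence: the lone 4 is dropped; its neighbours 2 and 8
# differ, so no further merging takes place.
print(compress_sequence([2, 2, 4, 8, 8, 8, 5, 5, 5, 5, 5]))
# -> [(2, 2), (8, 3), (5, 5)]
```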
5) The test utterance feature extraction step: the utterance is converted into a time-ordered set of feature points through the Mel-frequency cepstral transform; each feature point is assigned to a subspace according to the nearest-neighbor rule, the serial number of the subspace corresponding to each feature point is recorded, and the sequence formed by these serial numbers is recorded as the test utterance feature.
The utterance feature compression step is the same as above: the subspace serial number and the count of identical consecutive serial numbers are recorded, and each serial number together with its count is arranged as one data group; when the count of a serial number is 1, that data group is removed; if, after removal, the serial numbers of the data groups immediately before and after it are identical, those two groups are also merged, with the serial number unchanged and the count equal to the sum of the counts of the preceding and following groups.
6) The voiceprint recognition step: the training utterance feature and the test utterance feature are compared to determine whether they are similar.
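The patent does not specify how the two serial-number sequences are compared; as one plausible choice, a normalised edit (Levenshtein) distance with an example threshold is sketched below (the threshold value is an assumption):

```python
def edit_distance(a, b):
    """Levenshtein distance between two serial-number sequences."""
    dp = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, y in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (x != y))
    return dp[-1]

def is_same_speaker(train_seq, test_seq, threshold=0.4):
    """Accept the test utterance when the normalised distance between the
    training and test sequences falls below the threshold."""
    dist = edit_distance(train_seq, test_seq)
    return dist / max(len(train_seq), len(test_seq), 1) < threshold
```

The same comparison can be applied either to the raw serial-number sequences or to the compressed (serial number, count) groups.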
It can be seen from the above embodiment that the present invention provides a voiceprint recognition method with a small amount of calculation, a high recognition rate, and a small data volume.
The above embodiment only illustrates the technical concept and features of the present invention; its purpose is to enable those familiar with the art to understand and implement the invention, and it does not limit the protection scope of the invention. All equivalent changes or modifications made according to the spirit of the present invention shall fall within the protection scope of the present invention.
Claims (6)
1. A voiceprint recognition method, characterized by comprising the following steps:
1) a voice feature space establishment step: voices of different backgrounds and different voices are divided into voice segments of a specific length; each voice segment yields a voice feature point after feature transformation, and the voice feature points of all voice segments constitute the voice feature space;
2) a subspace partition step: the voice feature space is divided into a plurality of subspaces; each subspace after partition is described by a data description and assigned a number, and the data description of each subspace and its corresponding serial number are recorded;
3) a training utterance feature extraction step: the utterance is converted into a time-ordered set of feature points through feature transformation; each feature point is assigned to a subspace according to the nearest-neighbor rule, the serial number of the subspace corresponding to each feature point is recorded, and the sequence formed by these serial numbers is recorded as the training utterance feature;
4) a test utterance feature extraction step: the utterance is converted into a time-ordered set of feature points through feature transformation; each feature point is assigned to a subspace according to the nearest-neighbor rule, the serial number of the subspace corresponding to each feature point is recorded, and the sequence formed by these serial numbers is recorded as the test utterance feature;
5) a voiceprint recognition step: the training utterance feature and the test utterance feature are compared to determine whether they are similar.
2. The voiceprint recognition method according to claim 1, characterized in that the feature transformation is the Mel-frequency cepstral (MFCC) transform.
3. The voiceprint recognition method according to claim 2, characterized in that, in the Mel-frequency cepstral transform, the voice is divided into 20 ms frames with a 10 ms frame shift to obtain speech frames; silence is removed frame by frame; after the Mel-frequency cepstral transform is applied to each speech frame, 12 coefficients are retained per frame, and these 12 coefficients constitute a voice feature point.
4. The voiceprint recognition method according to claim 1, characterized in that, in step 2), the K-means algorithm is used to divide the voice feature space into several subspaces, and the K-means cluster center of each resulting subspace is recorded as the data description of that subspace.
5. The voiceprint recognition method according to claim 1, characterized in that step 3) and step 4) further comprise an utterance feature compression step, in which the data of the training utterance feature and the test utterance feature are compressed.
6. The voiceprint recognition method according to claim 4, characterized in that the utterance feature compression step is as follows: the subspace serial number and the count of identical consecutive serial numbers are recorded, and each serial number together with its count forms one data group; when the count of a serial number is 1, that data group is removed; if, after removal, the serial numbers of the data groups immediately before and after it are identical, those two groups are also merged; in the newly formed data group, the serial number is the same as before the merge, and its count is the sum of the counts of the preceding and following groups.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2011101671461A CN102623008A (en) | 2011-06-21 | 2011-06-21 | Voiceprint identification method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN102623008A (en) | 2012-08-01 |
Family
ID=46562888
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN2011101671461A (CN102623008A, pending) | Voiceprint identification method | 2011-06-21 | 2011-06-21 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102623008A (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6735563B1 (en) * | 2000-07-13 | 2004-05-11 | Qualcomm, Inc. | Method and apparatus for constructing voice templates for a speaker-independent voice recognition system |
CN1455388A (en) * | 2002-09-30 | 2003-11-12 | 中国科学院声学研究所 | Voice identifying system and compression method of characteristic vector set for voice identifying system |
CN101004913A (en) * | 2006-01-18 | 2007-07-25 | 中国科学院半导体研究所 | Method for identifying speaker based on identification principle of bionic mode |
CN101540170A (en) * | 2008-03-19 | 2009-09-23 | 中国科学院半导体研究所 | Voiceprint recognition method based on biomimetic pattern recognition |
CN101452488A (en) * | 2008-10-11 | 2009-06-10 | 大连大学 | Human body motion capturing data retrieval method based on bionic pattern recognition |
Non-Patent Citations (2)
Title |
---|
WU Yan et al., "Speaker Recognition Method Using Higher-Order Neural Networks Based on Biomimetic Pattern Recognition Theory", Computer Engineering, vol. 32, no. 12, 30 June 2006 (2006-06-30), pages 184-186 *
DENG Haojiang et al., "Research on Text-Independent Speaker Recognition Based on Clustering Statistics", Journal of Circuits and Systems, vol. 6, no. 3, 30 September 2001 (2001-09-30), pages 77-80 *
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106887230A (en) * | 2015-12-16 | 2017-06-23 | 芋头科技(杭州)有限公司 | A kind of method for recognizing sound-groove in feature based space |
CN106971730A (en) * | 2016-01-14 | 2017-07-21 | 芋头科技(杭州)有限公司 | A kind of method for recognizing sound-groove based on channel compensation |
CN106971731A (en) * | 2016-01-14 | 2017-07-21 | 芋头科技(杭州)有限公司 | A kind of modification method of Application on Voiceprint Recognition |
CN106971727A (en) * | 2016-01-14 | 2017-07-21 | 芋头科技(杭州)有限公司 | A kind of verification method of Application on Voiceprint Recognition |
CN106971737A (en) * | 2016-01-14 | 2017-07-21 | 芋头科技(杭州)有限公司 | A kind of method for recognizing sound-groove spoken based on many people |
CN106971731B (en) * | 2016-01-14 | 2020-10-23 | 芋头科技(杭州)有限公司 | Correction method for voiceprint recognition |
CN108320752A (en) * | 2018-01-26 | 2018-07-24 | 青岛易方德物联科技有限公司 | Cloud Voiceprint Recognition System and its method applied to community gate inhibition |
CN108320752B (en) * | 2018-01-26 | 2020-12-15 | 青岛易方德物联科技有限公司 | Cloud voiceprint recognition system and method applied to community access control |
Similar Documents
Publication | Title |
---|---|
CN110364143B (en) | Voice awakening method and device and intelligent electronic equipment | |
CN106940998B (en) | Execution method and device for setting operation | |
CN110136727B (en) | Speaker identification method, device and storage medium based on speaking content | |
KR101056511B1 (en) | Speech Segment Detection and Continuous Speech Recognition System in Noisy Environment Using Real-Time Call Command Recognition | |
CN107731233B (en) | Voiceprint recognition method based on RNN | |
Carlin et al. | Rapid evaluation of speech representations for spoken term discovery | |
CN105869624A (en) | Method and apparatus for constructing speech decoding network in digital speech recognition | |
CN102623008A (en) | Voiceprint identification method | |
CN102800316A (en) | Optimal codebook design method for voiceprint recognition system based on nerve network | |
CN104036774A (en) | Method and system for recognizing Tibetan dialects | |
CN111402891A (en) | Speech recognition method, apparatus, device and storage medium | |
CN108877769B (en) | Method and device for identifying dialect type | |
CN109545226B (en) | Voice recognition method, device and computer readable storage medium | |
CN113053410B (en) | Voice recognition method, voice recognition device, computer equipment and storage medium | |
CN109448732B (en) | Digital string voice processing method and device | |
WO2021098318A1 (en) | Response method, terminal, and storage medium | |
CN113851136A (en) | Clustering-based speaker recognition method, device, equipment and storage medium | |
CA2596126A1 (en) | Speech recognition by statistical language using square-root discounting | |
Ali et al. | Fake audio detection using hierarchical representations learning and spectrogram features | |
Hassanzadeh et al. | Deep learning for speaker recognition: A comparative analysis of 1D-CNN and LSTM models using diverse datasets | |
CN103247316B (en) | The method and system of index building in a kind of audio retrieval | |
KR101229108B1 (en) | Apparatus for utterance verification based on word specific confidence threshold | |
CN111326161B (en) | Voiceprint determining method and device | |
CN102522086A (en) | Voiceprint recognition application of ordered sequence similarity comparison method | |
CN115424616A (en) | Audio data screening method, device, equipment and computer readable medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| C06 | Publication | |
| PB01 | Publication | |
| C10 | Entry into substantive examination | |
| SE01 | Entry into force of request for substantive examination | |
| C12 | Rejection of a patent application after its publication | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20120801 |