CN102623008A - Voiceprint identification method
- Publication number: CN102623008A
- Application number: CN201110167146A (CN2011101671461A)
- Authority: CN (China)
- Prior art keywords: sequence number, data, characteristic, voice, statement
- Prior art date: 2011-06-21
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a voiceprint identification method which comprises the following steps: 1) performing feature transformation on a plurality of voices to obtain feature points, which together form a voice feature space; 2) dividing the feature space into a plurality of subspaces, numbering each subspace, and recording the space serial numbers and data descriptions; 3) performing feature transformation on a training utterance to obtain a time-ordered set of feature points, assigning each feature point a subspace serial number according to the nearest-neighbor principle, and thereby converting a segment of speech into a digital sequence; 4) obtaining the digital sequence of a test utterance through the same feature transformation; 5) comparing the training utterance feature with the test utterance feature. The method overcomes the disadvantages of the prior art and provides voiceprint identification with a small amount of calculation, a high identification rate, and a small data volume.
Description
Technical field
The present invention relates to the field of voiceprint recognition.
Background art
Speaker recognition, like fingerprint, iris, and face recognition, is a form of biometric identification and is considered the most natural biometric identity authentication method; it is also called "voiceprint" recognition. Speaker recognition has the advantages of simple acquisition equipment, low system cost, and easy acceptance by users. In application scenarios such as access control systems, safes, and usage-rights control for personal devices (automobiles, computers, mobile phones, PDAs, etc.), text-dependent speaker recognition can verify both the speaker's biometric voice characteristics and the spoken content, and short utterances suffice for both the learning and the testing process, which gives it outstanding application advantages.
The basic process of speaker recognition comprises voice acquisition, feature extraction, and classification modeling. A common feature extraction method exploits the short-term quasi-stationarity of speech and uses the Mel-frequency cepstral coefficient (MFCC) transform to convert speech into a set of feature points. A learning process then models the speaker's voice to obtain the speaker's classification model. The hidden Markov model (HMM) is currently regarded as the most effective modeling method for text-dependent speaker recognition. On the one hand, the HMM uses hidden states to represent relatively stable pronunciation units at the acoustic level and describes pronunciation variation through state transitions and state dwell; on the other hand, it introduces a probabilistic statistical model, computes the output probability of the speech parameters with respect to the HMM using probability density functions, searches for the optimal state sequence, and takes the maximum a posteriori probability as the criterion for the recognition result. However, it has several problems: (1) it requires many learning samples; (2) its computational complexity is high; (3) the resulting model data volume is large. For resource-limited embedded systems, these problems restrict the use of the algorithm, so a new method is needed to solve them.
Summary of the invention
In order to overcome the deficiencies of the prior art, the object of the present invention is to provide a voiceprint recognition method with a small amount of calculation, a high recognition rate, and a small data volume.
To achieve the above object, the invention provides a voiceprint recognition method comprising the following steps:
1) a voice feature space establishment step: voices of different backgrounds and different voices are divided into voice segments of a specific length; each voice segment yields a voice feature point after feature transformation, and the voice feature points of all voice segments constitute the voice feature space;
2) a subspace partition step: the voice feature space is divided into a plurality of subspaces; each subspace after partition is described by a data description and assigned a number, and the data description of each subspace and its corresponding serial number are recorded;
3) a training utterance feature extraction step: the utterance is converted into a time-ordered set of feature points through feature transformation; each feature point is assigned to a subspace according to the nearest-neighbor rule, the serial number of the subspace corresponding to each feature point is recorded, and the sequence formed by these serial numbers is recorded as the training utterance feature;
4) a test utterance feature extraction step: the utterance is converted into a time-ordered set of feature points through feature transformation; each feature point is assigned to a subspace according to the nearest-neighbor rule, the serial number of the subspace corresponding to each feature point is recorded, and the sequence formed by these serial numbers is recorded as the test utterance feature;
5) a voiceprint recognition step: the training utterance feature and the test utterance feature are compared to determine whether they are similar.
A further improvement of the present invention is that the feature transformation is the Mel-frequency cepstral (MFCC) transform.
A further improvement of the present invention is that, in the Mel-frequency cepstral transform, the voice is divided into 20 ms frames with a 10 ms frame shift to obtain speech frames; silence is removed frame by frame; after the Mel-frequency cepstral transform is applied to each speech frame, 12 coefficients are retained per frame, and these 12 coefficients constitute a voice feature point.
A further improvement of the present invention is that, in step 2), the K-means algorithm is used to divide the voice feature space into several subspaces, and the K-means cluster center of each resulting subspace is recorded as the data description of that subspace.
A further improvement of the present invention is that step 3) and step 4) further comprise an utterance feature compression step, in which the data of the training utterance feature and the test utterance feature are compressed.
A further improvement of the present invention is that the utterance feature compression step is as follows: the subspace serial number and the count of identical consecutive serial numbers are recorded, and each serial number together with its count is arranged as one data group; when the count of a serial number is 1, that data group is removed; if, after removal, the serial numbers of the data groups immediately before and after it are identical, those two groups are also merged; in the newly formed data group, the serial number is the same as before the merge, and its count is the sum of the counts of the preceding and following groups.
The beneficial effects of the invention are as follows: with the above method, the continuous information of speech is mapped to a sequence of space indices, and this short index sequence, which requires little data, serves as the speaker's similarity feature. The feature obtained in this way is compact yet representative, and a verification result is obtained simply by comparing digital sequences. The method therefore requires little computation, saves storage resources, overcomes the problems of modeling methods based on probability statistics, and is suitable for embedded systems with limited resources.
Description of drawings
Fig. 1 is a schematic flowchart of voice feature space establishment in a voiceprint recognition method of the present invention;
Fig. 2 is a schematic flowchart of feature space partition in a voiceprint recognition method of the present invention;
Fig. 3 is a schematic flowchart of utterance feature extraction in a voiceprint recognition method of the present invention;
Fig. 4 is a schematic diagram of the digital sequence compression process in a voiceprint recognition method of the present invention;
Fig. 5 is a schematic flowchart of a voiceprint recognition method of the present invention.
Embodiment
Preferred embodiments of the present invention are set forth in detail below, so that the advantages and features of the invention can be more readily understood by those skilled in the art and the protection scope of the invention is thereby defined more clearly.
Referring to Fig. 5, a voiceprint recognition method comprises the following steps:
1) Referring to Fig. 1, the voice feature space establishment step: voices of different backgrounds and different voices are divided into 20 ms frames with a 10 ms frame shift to obtain speech frames (voice segments); silence is removed frame by frame; the Mel-frequency cepstral transform is applied to each speech frame and 12 coefficients are retained per frame, and these 12 coefficients constitute a voice feature point. The voice feature points of all voice segments constitute the voice feature point set, i.e. the voice feature space.
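For illustration only, a minimal Python sketch of this framing and Mel-cepstral step, assuming the librosa library and a simple per-frame energy threshold for silence removal (the patent specifies 20 ms frames, a 10 ms shift, and 12 coefficients per frame, but names no library and no particular silence-removal rule):

```python
import numpy as np
import librosa

def extract_feature_points(wav_path, sr=16000, n_mfcc=12):
    """Frame speech into 20 ms frames with a 10 ms shift, drop silent frames,
    and keep 12 Mel-cepstral coefficients per frame (one feature point per frame)."""
    y, sr = librosa.load(wav_path, sr=sr)
    frame_len = int(0.020 * sr)                      # 20 ms frame
    hop_len = int(0.010 * sr)                        # 10 ms frame shift
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc,
                                n_fft=frame_len, hop_length=hop_len)
    feats = mfcc.T                                   # shape: (num_frames, 12)
    # crude energy gate as a stand-in for the per-frame silence removal
    rms = librosa.feature.rms(y=y, frame_length=frame_len, hop_length=hop_len)[0]
    n = min(len(rms), len(feats))
    keep = rms[:n] > 0.02 * rms.max()
    return feats[:n][keep]
```

Pooling the feature points returned for many recordings of different backgrounds and speakers yields the voice feature space used in the next step.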
2) Referring to Fig. 2, the subspace partition step: the K-means algorithm is used to divide the voice feature space into several subspaces; the K-means cluster center of each resulting subspace is recorded as the data description of that subspace, each subspace is numbered, and the data description of each subspace and its corresponding serial number are recorded.
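A sketch of the subspace partition, using scikit-learn's KMeans as one possible implementation of the K-means step; the number of subspaces k is not fixed by the patent and is chosen here only as an example:

```python
import numpy as np
from sklearn.cluster import KMeans

def build_subspaces(all_feature_points, k=64, seed=0):
    """Partition the pooled feature space into k numbered subspaces.
    Row i of the returned array is the cluster centre, i.e. the
    'data description' of the subspace whose serial number is i."""
    km = KMeans(n_clusters=k, random_state=seed, n_init=10)
    km.fit(np.vstack(all_feature_points))   # pool the feature points of all segments
    return km.cluster_centers_
```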
3) Referring to Fig. 3, the training utterance feature extraction step: the utterance is converted into a time-ordered set of feature points through the Mel-frequency cepstral transform; each feature point is assigned to a subspace according to the nearest-neighbor rule, the serial number of the subspace corresponding to each feature point is recorded, and the sequence formed by these serial numbers, e.g. (2, 2, 4, 8, 8, 8, 5, 5, 5, 5, 5), is recorded as the training utterance feature.
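A sketch of the nearest-neighbor assignment that turns a time-ordered set of feature points into the serial-number sequence described above (plain Euclidean distance is assumed; the patent does not name a distance measure):

```python
import numpy as np

def to_index_sequence(feature_points, centers):
    """Assign each feature point, in time order, to the nearest subspace
    centre and return the resulting serial-number sequence."""
    # pairwise squared Euclidean distances, shape (num_points, k)
    d = ((feature_points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1).tolist()   # e.g. [2, 2, 4, 8, 8, 8, 5, 5, 5, 5, 5]
```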
4) Referring to Fig. 4, the utterance feature compression step: the subspace serial number and the count of identical consecutive serial numbers are recorded, and each serial number together with its count is arranged as one data group; when the count of a serial number is 1, that data group is removed. In this embodiment, only one feature point has serial number 4, so that data group is deleted during compression.
If, after a data group is removed, the serial numbers of the data groups immediately before and after it are identical, those two groups are also merged; in the newly formed data group, the serial number is the same as before the merge, and its count is the sum of the counts of the preceding and following groups. In this embodiment, after the group with serial number 4 is removed, the preceding group has serial number 2 and the following group has serial number 8; since 2 and 8 are not equal, the original groups are kept.
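A sketch of this compression rule in Python, reproducing the worked example above: the sequence is run-length encoded, groups with a count of 1 are dropped, and neighbouring groups that then share a serial number are merged:

```python
from itertools import groupby

def compress_sequence(seq):
    """Run-length encode the serial-number sequence, drop groups whose count
    is 1, then merge neighbouring groups left with the same serial number."""
    groups = [(num, len(list(run))) for num, run in groupby(seq)]
    groups = [(num, cnt) for num, cnt in groups if cnt > 1]   # remove single-count groups
    merged = []
    for num, cnt in groups:
        if merged and merged[-1][0] == num:
            merged[-1] = (num, merged[-1][1] + cnt)           # sum the counts of merged groups
        else:
            merged.append((num, cnt))
    return merged

# The embodiment's sequence: the lone 4 is dropped; its neighbours 2 and 8
# differ, so no further merging takes place.
print(compress_sequence([2, 2, 4, 8, 8, 8, 5, 5, 5, 5, 5]))
# -> [(2, 2), (8, 3), (5, 5)]
```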
5) The test utterance feature extraction step: the utterance is converted into a time-ordered set of feature points through the Mel-frequency cepstral transform; each feature point is assigned to a subspace according to the nearest-neighbor rule, the serial number of the subspace corresponding to each feature point is recorded, and the sequence formed by these serial numbers is recorded as the test utterance feature.
The utterance feature compression step is the same as above: the subspace serial number and the count of identical consecutive serial numbers are recorded, and each serial number together with its count is arranged as one data group; when the count of a serial number is 1, that data group is removed; if, after removal, the serial numbers of the data groups immediately before and after it are identical, those two groups are also merged, with the serial number unchanged and the count equal to the sum of the counts of the preceding and following groups.
6) The voiceprint recognition step: the training utterance feature and the test utterance feature are compared to determine whether they are similar.
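The patent does not specify how the two serial-number sequences are compared; as one plausible choice, a normalised edit (Levenshtein) distance with an example threshold is sketched below (the threshold value is an assumption):

```python
def edit_distance(a, b):
    """Levenshtein distance between two serial-number sequences."""
    dp = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, y in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (x != y))
    return dp[-1]

def is_same_speaker(train_seq, test_seq, threshold=0.4):
    """Accept the test utterance when the normalised distance between the
    training and test sequences falls below the threshold."""
    dist = edit_distance(train_seq, test_seq)
    return dist / max(len(train_seq), len(test_seq), 1) < threshold
```

The same comparison can be applied either to the raw serial-number sequences or to the compressed (serial number, count) groups.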
It can be seen from the above embodiment that the present invention provides a voiceprint recognition method with a small amount of calculation, a high recognition rate, and a small data volume.
The above embodiment only illustrates the technical concept and features of the present invention; its purpose is to enable those familiar with the art to understand and implement the invention, and it does not limit the protection scope of the invention. All equivalent changes or modifications made according to the spirit of the present invention shall fall within the protection scope of the present invention.
Claims (6)
1. A voiceprint recognition method, characterized by comprising the following steps:
1) a voice feature space establishment step: voices of different backgrounds and different voices are divided into voice segments of a specific length; each voice segment yields a voice feature point after feature transformation, and the voice feature points of all voice segments constitute the voice feature space;
2) a subspace partition step: the voice feature space is divided into a plurality of subspaces; each subspace after partition is described by a data description and assigned a number, and the data description of each subspace and its corresponding serial number are recorded;
3) a training utterance feature extraction step: the utterance is converted into a time-ordered set of feature points through feature transformation; each feature point is assigned to a subspace according to the nearest-neighbor rule, the serial number of the subspace corresponding to each feature point is recorded, and the sequence formed by these serial numbers is recorded as the training utterance feature;
4) a test utterance feature extraction step: the utterance is converted into a time-ordered set of feature points through feature transformation; each feature point is assigned to a subspace according to the nearest-neighbor rule, the serial number of the subspace corresponding to each feature point is recorded, and the sequence formed by these serial numbers is recorded as the test utterance feature;
5) a voiceprint recognition step: the training utterance feature and the test utterance feature are compared to determine whether they are similar.
2. The voiceprint recognition method according to claim 1, characterized in that the feature transformation is the Mel-frequency cepstral (MFCC) transform.
3. The voiceprint recognition method according to claim 2, characterized in that, in the Mel-frequency cepstral transform, the voice is divided into 20 ms frames with a 10 ms frame shift to obtain speech frames; silence is removed frame by frame; after the Mel-frequency cepstral transform is applied to each speech frame, 12 coefficients are retained per frame, and these 12 coefficients constitute a voice feature point.
4. The voiceprint recognition method according to claim 1, characterized in that, in step 2), the K-means algorithm is used to divide the voice feature space into several subspaces, and the K-means cluster center of each resulting subspace is recorded as the data description of that subspace.
5. The voiceprint recognition method according to claim 1, characterized in that step 3) and step 4) further comprise an utterance feature compression step, in which the data of the training utterance feature and the test utterance feature are compressed.
6. The voiceprint recognition method according to claim 4, characterized in that the utterance feature compression step is as follows: the subspace serial number and the count of identical consecutive serial numbers are recorded, and each serial number together with its count forms one data group; when the count of a serial number is 1, that data group is removed; if, after removal, the serial numbers of the data groups immediately before and after it are identical, those two groups are also merged; in the newly formed data group, the serial number is the same as before the merge, and its count is the sum of the counts of the preceding and following groups.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2011101671461A CN102623008A (en) | 2011-06-21 | 2011-06-21 | Voiceprint identification method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN102623008A (en) | 2012-08-01 |
Family
ID=46562888
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN2011101671461A (CN102623008A, pending) | Voiceprint identification method | 2011-06-21 | 2011-06-21 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102623008A (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6735563B1 (en) * | 2000-07-13 | 2004-05-11 | Qualcomm, Inc. | Method and apparatus for constructing voice templates for a speaker-independent voice recognition system |
CN1455388A (en) * | 2002-09-30 | 2003-11-12 | 中国科学院声学研究所 | Voice identifying system and compression method of characteristic vector set for voice identifying system |
CN101004913A (en) * | 2006-01-18 | 2007-07-25 | 中国科学院半导体研究所 | Method for identifying speaker based on identification principle of bionic mode |
CN101540170A (en) * | 2008-03-19 | 2009-09-23 | 中国科学院半导体研究所 | Voiceprint recognition method based on biomimetic pattern recognition |
CN101452488A (en) * | 2008-10-11 | 2009-06-10 | 大连大学 | Human body motion capturing data retrieval method based on bionic pattern recognition |
Non-Patent Citations (2)
Title |
---|
WU Yan et al., "Speaker Recognition Method Using Higher-Order Neural Networks Based on Biomimetic Pattern Recognition Theory", Computer Engineering, vol. 32, no. 12, 30 June 2006 (2006-06-30), pages 184-186 *
DENG Haojiang et al., "Research on Text-Independent Speaker Recognition Based on Clustering Statistics", Journal of Circuits and Systems, vol. 6, no. 3, 30 September 2001 (2001-09-30), pages 77-80 *
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106887230A (en) * | 2015-12-16 | 2017-06-23 | 芋头科技(杭州)有限公司 | A kind of method for recognizing sound-groove in feature based space |
CN106971730A (en) * | 2016-01-14 | 2017-07-21 | 芋头科技(杭州)有限公司 | A kind of method for recognizing sound-groove based on channel compensation |
CN106971731A (en) * | 2016-01-14 | 2017-07-21 | 芋头科技(杭州)有限公司 | A kind of modification method of Application on Voiceprint Recognition |
CN106971727A (en) * | 2016-01-14 | 2017-07-21 | 芋头科技(杭州)有限公司 | A kind of verification method of Application on Voiceprint Recognition |
CN106971737A (en) * | 2016-01-14 | 2017-07-21 | 芋头科技(杭州)有限公司 | A kind of method for recognizing sound-groove spoken based on many people |
CN106971731B (en) * | 2016-01-14 | 2020-10-23 | 芋头科技(杭州)有限公司 | Correction method for voiceprint recognition |
CN108320752A (en) * | 2018-01-26 | 2018-07-24 | 青岛易方德物联科技有限公司 | Cloud Voiceprint Recognition System and its method applied to community gate inhibition |
CN108320752B (en) * | 2018-01-26 | 2020-12-15 | 青岛易方德物联科技有限公司 | Cloud voiceprint recognition system and method applied to community access control |
Similar Documents
Publication | Title |
---|---|
CN110364143B (en) | Voice awakening method and device and intelligent electronic equipment | |
CN106940998B (en) | Execution method and device for setting operation | |
CN110136727B (en) | Speaker identification method, device and storage medium based on speaking content | |
KR101056511B1 (en) | Speech Segment Detection and Continuous Speech Recognition System in Noisy Environment Using Real-Time Call Command Recognition | |
CN107731233B (en) | Voiceprint recognition method based on RNN | |
Carlin et al. | Rapid evaluation of speech representations for spoken term discovery | |
CN105869624A (en) | Method and apparatus for constructing speech decoding network in digital speech recognition | |
CN102623008A (en) | Voiceprint identification method | |
CN102800316A (en) | Optimal codebook design method for voiceprint recognition system based on nerve network | |
CN104036774A (en) | Method and system for recognizing Tibetan dialects | |
CN111402891A (en) | Speech recognition method, apparatus, device and storage medium | |
CN108877769B (en) | Method and device for identifying dialect type | |
CN109545226B (en) | Voice recognition method, device and computer readable storage medium | |
CN113053410B (en) | Voice recognition method, voice recognition device, computer equipment and storage medium | |
CN109448732B (en) | Digital string voice processing method and device | |
WO2021098318A1 (en) | Response method, terminal, and storage medium | |
CN113851136A (en) | Clustering-based speaker recognition method, device, equipment and storage medium | |
CA2596126A1 (en) | Speech recognition by statistical language using square-root discounting | |
Ali et al. | Fake audio detection using hierarchical representations learning and spectrogram features | |
Hassanzadeh et al. | Deep learning for speaker recognition: A comparative analysis of 1D-CNN and LSTM models using diverse datasets | |
CN103247316B (en) | The method and system of index building in a kind of audio retrieval | |
KR101229108B1 (en) | Apparatus for utterance verification based on word specific confidence threshold | |
CN111326161B (en) | Voiceprint determining method and device | |
CN102522086A (en) | Voiceprint recognition application of ordered sequence similarity comparison method | |
CN115424616A (en) | Audio data screening method, device, equipment and computer readable medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| C06 | Publication | |
| PB01 | Publication | |
| C10 | Entry into substantive examination | |
| SE01 | Entry into force of request for substantive examination | |
| C12 | Rejection of a patent application after its publication | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20120801 |