CN106971729A

CN106971729A - Method and system for improving voiceprint recognition speed based on sound feature ranges

Info

Publication number: CN106971729A
Application number: CN201610025132.9A
Authority: CN
Inventors: 祝铭明
Original assignee: Yutou Technology Hangzhou Co Ltd
Current assignee: Yutou Technology Hangzhou Co Ltd
Priority date: 2016-01-14
Filing date: 2016-01-14
Publication date: 2017-07-21

Abstract

The invention belongs to the field of voice signal processing, and in particular relates to a method and system for improving voiceprint recognition speed based on sound features, applied to a household robot, comprising: S1: collecting a voice signal; S2: preprocessing the voice signal; S3: extracting speech feature parameters from the preprocessed voice signal; S4: establishing an acoustic model for each family member; S5: obtaining in advance, through training, the sound feature ranges corresponding to children, adults and the elderly, and dividing the acoustic model into a first acoustic model and a second acoustic model according to the sound feature ranges, wherein the first acoustic model includes the training utterances within the sound feature range corresponding to adults, the second acoustic model includes the training utterances within the sound feature ranges corresponding to children and the elderly, and the first acoustic model is loaded into a cache at power-on; S6: performing pattern matching on a voice signal under test according to the first acoustic model and the second acoustic model to obtain a recognition result.

Description

Method and system for improving voiceprint recognition speed based on sound feature ranges

Technical field

The invention belongs to the field of voice signal processing, and in particular relates to a method and system for improving voiceprint recognition speed based on sound feature ranges.

Background art

Household service robots are one of the most active areas of current high-technology research. They can perform services that benefit people, such as housework, entertainment and leisure, education, and security monitoring, and they have a broad potential customer base and market. Existing household service robots widely use speech recognition to realize human-machine interaction, letting the robot understand human speech and perform the corresponding actions. However, existing robots cannot accurately identify the speaker and therefore cannot meet users' demand for personalization.

Voiceprint recognition technology, which emerged with the development of computer technology and digital signal processing theory, extracts from a segment of a speaker's voice the speech feature parameters that reflect the speaker's physiology and psychology, and performs analysis, modeling and pattern matching on those parameters in order to identify or verify the identity of an unknown speaker. However, existing voiceprint recognition systems are often designed for one specific application scenario. When the application scenario changes, their adaptability is weak and free human-machine communication cannot be achieved. Moreover, voiceprint recognition is too slow, which results in a poor user experience. This is something those skilled in the art do not wish to see.

Summary of the invention

To solve the above technical problems, a system for improving voiceprint recognition speed based on sound feature ranges is provided, which overcomes the defects of existing recognition methods.

The specific technical solution is as follows:

A method for improving voiceprint recognition speed based on sound feature ranges, applied to a household robot, wherein the specific working steps include:

S1: collecting a voice signal;

S2: preprocessing the voice signal;

S3: extracting speech feature parameters from the preprocessed voice signal, the speech feature parameters including first-type feature parameters obtained by linear prediction and second-type feature parameters extracted by simulating the human ear's perception of sound frequency;

S4: establishing a codebook for each family member and storing it in a voice database as that family member's voice template, all codebooks of the family members constituting an acoustic model;

S5: obtaining in advance, through training, the sound feature ranges corresponding to children, adults and the elderly, and dividing the acoustic model into a first acoustic model and a second acoustic model according to the sound feature ranges, wherein the first acoustic model includes the training utterances within the sound feature range corresponding to adults, the second acoustic model includes the training utterances within the sound feature ranges corresponding to children and the elderly, and the first acoustic model is loaded into a cache at power-on;

S6: performing pattern matching on a voice signal under test according to the first acoustic model and the second acoustic model to obtain a recognition result.

In the above method for improving voiceprint recognition speed based on sound feature ranges, the preprocessing in step S2 comprises the following steps in sequence:

Step S21: sampling and quantizing the preprocessed voice signal to obtain a digital voice signal;

Step S22: passing the digital voice signal through a filter bank to boost the high-frequency components of the digital signal;

Step S23: framing and windowing the voice signal obtained in step S22 to obtain a windowed voice signal.
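Steps S22 and S23 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the high-frequency boost is modeled as a first-order pre-emphasis filter, the window is a Hamming window, and the names `pre_emphasize`, `frame_signal`, `hamming` and `preprocess`, as well as the default values of `alpha`, `frame_len` and `hop`, are hypothetical. Sampling and quantization (step S21) are assumed to have been done already, so the input is a list of digital samples.

```python
import math


def pre_emphasize(samples, alpha=0.97):
    """S22 (assumed form): y[n] = x[n] - alpha * x[n-1] boosts high frequencies."""
    if not samples:
        return []
    out = [samples[0]]
    for n in range(1, len(samples)):
        out.append(samples[n] - alpha * samples[n - 1])
    return out


def frame_signal(samples, frame_len, hop):
    """S23, framing: overlapping frames; a trailing partial frame is dropped."""
    frames = []
    start = 0
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += hop
    return frames


def hamming(frame_len):
    """Hamming window coefficients (an assumed choice of window)."""
    if frame_len == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
            for n in range(frame_len)]


def preprocess(samples, frame_len=256, hop=128, alpha=0.97):
    """Pre-emphasis followed by framing and windowing."""
    emphasized = pre_emphasize(samples, alpha)
    window = hamming(frame_len)
    return [[s * w for s, w in zip(frame, window)]
            for frame in frame_signal(emphasized, frame_len, hop)]
```

Each returned frame is a windowed list of `frame_len` samples that the feature extraction of step S3 could consume.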

In the above method for improving voiceprint recognition speed based on sound feature ranges, the first-type feature parameters extracted in step S3 are linear prediction coefficients, and the extraction steps are as follows:

Step S31a: defining a short-time speech signal and an error signal;

Step S32a: computing the sum of squared errors from the short-time speech signal and the error signal;

Step S33a: differentiating the sum of squared errors and solving the resulting system of equations to obtain the first-type feature parameters.
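Steps S31a to S33a correspond to the standard least-squares derivation of linear prediction, which can be written out as below. The notation is an assumption introduced here for illustration: s(n) is the short-time speech signal, p the predictor order, a_k the linear prediction coefficients, and e(n) the error signal.

```latex
% S31a: short-time speech signal s(n) and prediction error e(n)
\[ e(n) = s(n) - \sum_{k=1}^{p} a_k\, s(n-k) \]

% S32a: sum of squared errors over the frame
\[ E = \sum_{n} e(n)^2 = \sum_{n} \Bigl[ s(n) - \sum_{k=1}^{p} a_k\, s(n-k) \Bigr]^2 \]

% S33a: set each partial derivative to zero and solve the linear system for a_1, ..., a_p
\[ \frac{\partial E}{\partial a_i} = 0
   \;\Longrightarrow\;
   \sum_{k=1}^{p} a_k \sum_{n} s(n-k)\, s(n-i) = \sum_{n} s(n)\, s(n-i),
   \qquad i = 1, \dots, p \]
```

The p equations in the last line form the system that step S33a solves.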

In the above method for improving voiceprint recognition speed based on sound feature ranges, the steps for extracting the second-type feature parameters in step S3 include:

Step S31b: performing a Fourier transform on the preprocessed voice signal to obtain a linear spectrum;

Step S32b: passing the linear spectrum through a bank of triangular band-pass filters to obtain the corresponding Mel spectrum;

Step S33b: computing the logarithmic spectrum of the Mel spectrum;

Step S34b: performing a discrete cosine transform on the logarithmic spectrum to obtain the second-type feature parameters.
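Steps S31b to S34b follow the usual Mel-frequency cepstral coefficient pipeline, sketched below in plain Python for a single windowed frame. Several details are assumptions not stated in the source: the naive O(N^2) DFT (chosen only to keep the sketch dependency-free), the common 2595 * log10(1 + f/700) Mel mapping, the use of the power spectrum, the DCT-II form, and all function names and default parameter values.

```python
import cmath
import math


def power_spectrum(frame):
    """S31b: naive DFT; returns the power of bins 0 .. N/2."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(num_filters, n_fft, sample_rate):
    """Triangular filters whose centres are equally spaced on the Mel scale."""
    num_bins = n_fft // 2 + 1
    top = hz_to_mel(sample_rate / 2.0)
    edges = [mel_to_hz(top * i / (num_filters + 1)) for i in range(num_filters + 2)]
    bank = []
    for m in range(1, num_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for b in range(num_bins):
            f = b * sample_rate / n_fft
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))   # rising edge
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))   # falling edge
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def mfcc(frame, sample_rate, num_filters=20, num_coeffs=12):
    spec = power_spectrum(frame)                                   # S31b
    bank = mel_filterbank(num_filters, len(frame), sample_rate)
    mel = [sum(w * p for w, p in zip(row, spec)) for row in bank]  # S32b
    log_mel = [math.log(e + 1e-10) for e in mel]                   # S33b
    return [sum(log_mel[m] * math.cos(math.pi * c * (m + 0.5) / num_filters)
                for m in range(num_filters))
            for c in range(1, num_coeffs + 1)]                     # S34b (DCT-II)
```

A practical system would replace the naive DFT with an FFT; the structure of the four steps stays the same.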

In the above method for improving voiceprint recognition speed based on sound feature ranges, the specific steps of step S4 are as follows:

Step S41: extracting N feature vectors from the voice signal, and classifying the feature vectors by a clustering method to obtain M codebooks;

Step S42: obtaining the codebook vector corresponding to each class;

Step S43: forming the acoustic model from the set of codebook vectors established for each family member.
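Steps S41 to S43 describe vector quantization. The sketch below is one possible reading, under assumptions: the unspecified clustering method is taken to be plain k-means, the M classes are represented by M centroid vectors per family member, and the names `train_codebook` and `build_acoustic_model` and the `iterations` parameter are hypothetical.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_codebook(vectors, m, iterations=20):
    """Cluster N feature vectors into M codebook vectors (plain k-means)."""
    if m < 1 or m > len(vectors):
        raise ValueError("m must be between 1 and the number of vectors")
    # Deterministic initialisation: M vectors spread evenly over the input.
    step = len(vectors) / m
    codebook = [list(vectors[int(i * step)]) for i in range(m)]
    for _ in range(iterations):
        clusters = [[] for _ in range(m)]
        for v in vectors:  # S41: assign each vector to its nearest class
            nearest = min(range(m), key=lambda j: squared_distance(v, codebook[j]))
            clusters[nearest].append(v)
        for j, members in enumerate(clusters):  # S42: class centroid
            if members:
                dim = len(members[0])
                codebook[j] = [sum(v[d] for v in members) / len(members)
                               for d in range(dim)]
    return codebook


def build_acoustic_model(features_by_member, m):
    """S43: the per-member codebooks together form the acoustic model."""
    return {name: train_codebook(vecs, m)
            for name, vecs in features_by_member.items()}
```

For example, `build_acoustic_model({"member_a": vectors_a, "member_b": vectors_b}, 16)` would return a dictionary with one 16-vector codebook per family member.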

In the above method for improving voiceprint recognition speed based on sound feature ranges, step S6 is specifically as follows:

Step S61: matching the voice signal to be identified for similarity against the first acoustic model and the second acoustic model in turn, and making the decision according to a weighted Euclidean distance measure;

Step S62: selecting an appropriate distance measure as the threshold;

Step S63: taking the result that falls within the threshold range as the recognition result.
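Steps S61 to S63 can be illustrated with the sketch below. It assumes each acoustic model is a dictionary mapping a family member to a codebook (a list of vectors), scores a candidate by the average weighted Euclidean distance from each test vector to its nearest codebook vector, and returns as soon as the first model yields a match within the threshold. The early return, the scoring rule, and the names `identify`, `best_match` and `average_distortion` are assumptions; the source only states that the two models are matched in turn.

```python
import math


def weighted_euclidean(a, b, weights):
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def average_distortion(features, codebook, weights):
    """Mean distance from each feature vector to its nearest codebook vector."""
    total = 0.0
    for v in features:
        total += min(weighted_euclidean(v, c, weights) for c in codebook)
    return total / len(features)


def best_match(features, model, weights):
    """Return (member, distortion) for the closest codebook in one model."""
    scored = [(average_distortion(features, cb, weights), name)
              for name, cb in model.items()]
    if not scored:
        return None, math.inf
    dist, name = min(scored)
    return name, dist


def identify(features, first_model, second_model, weights, threshold):
    """S61-S63: try the cached first model, then the second model;
    accept a candidate only if its distortion is within the threshold."""
    for model in (first_model, second_model):
        name, dist = best_match(features, model, weights)
        if name is not None and dist <= threshold:
            return name
    return None
```

Under this reading, a speaker found in the first model is recognized without the second model being consulted at all, which is where a speed gain would come from.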

Also provided is a system for improving voiceprint recognition speed based on sound feature ranges, including:

a voice input module, for capturing a voice signal;

a preprocessing module, connected with the voice input module, for preprocessing the voice signal;

a first feature parameter extraction module, connected with the preprocessing module, for obtaining the first feature parameters in the voice signal;

a second feature parameter extraction module, connected with the preprocessing module, for obtaining the second feature parameters in the voice signal;

a training module, connected with the first feature parameter extraction module and the second feature parameter extraction module, for establishing the voice template of each family member, all codebooks of the family members constituting an acoustic model;

an acquisition processing module, which obtains in advance, through training, the sound feature ranges corresponding to children, adults and the elderly, and divides the acoustic model into a first acoustic model and a second acoustic model according to the sound feature ranges, wherein the first acoustic model includes the training utterances within the sound feature range corresponding to adults, the second acoustic model includes the training utterances within the sound feature ranges corresponding to children and the elderly, and the first acoustic model is loaded into a cache at power-on;

a template matching module, connected with the acquisition processing module, for performing pattern matching on a voice signal under test according to the first acoustic model and the second acoustic model to obtain a recognition result.

Beneficial effects: the above technical solution can realize voiceprint recognition adaptively, effectively increases the speed of voiceprint recognition, handles human-machine communication under different application scenarios, and helps improve the user experience.

Brief description of the drawings

Fig. 1 is a flow chart of the method of the present invention;

Fig. 2 is a flow chart of step S2 of the present invention;

Fig. 3 is a structural diagram of the system of the present invention.

Embodiments

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments of the present invention without creative effort fall within the scope of protection of the invention.

It should be noted that, where no conflict arises, the embodiments of the present invention and the features in the embodiments may be combined with one another.

The invention is further described below with reference to the accompanying drawings and specific embodiments, which are not to be taken as limiting the invention.

Referring to Fig. 1, a method for improving voiceprint recognition speed based on sound feature ranges is applied to a household robot, and the specific working steps include:

S1: collecting a voice signal;

S2: preprocessing the voice signal;

S3: extracting speech feature parameters from the preprocessed voice signal, the speech feature parameters including first-type feature parameters obtained by linear prediction and second-type feature parameters extracted by simulating the human ear's perception of sound frequency;

S4: establishing a codebook for each family member and storing it in a voice database as that family member's voice template, all codebooks of the family members constituting an acoustic model;

S5: obtaining in advance, through training, the sound feature ranges (for example, frequency features) corresponding to children, adults and the elderly, and dividing the acoustic model into a first acoustic model and a second acoustic model according to the sound feature ranges, wherein the first acoustic model includes the training utterances within the sound feature range corresponding to adults, the second acoustic model includes the training utterances within the sound feature ranges corresponding to children and the elderly, the first acoustic model is loaded into a cache at power-on, and the second acoustic model remains stored in the voice database;

S6: performing pattern matching on a voice signal under test according to the first acoustic model and the second acoustic model to obtain a recognition result.
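The partitioning and caching described in step S5 can be sketched as follows. This is a simplified illustration under assumptions: each family member is summarized by a single scalar sound feature (for example a mean pitch in Hz), the adult range is passed in as a `(low, high)` pair obtained by prior training, and a plain dictionary stands in for the voice database. The names `split_acoustic_model` and `ModelStore` are hypothetical.

```python
def split_acoustic_model(model, mean_feature, adult_range):
    """Partition member codebooks into (first_model, second_model).

    model:        member name -> codebook
    mean_feature: member name -> scalar sound feature (assumed summary)
    adult_range:  (low, high) range learned in advance for adults
    """
    low, high = adult_range
    first, second = {}, {}
    for member, codebook in model.items():
        in_adult_range = low <= mean_feature[member] <= high
        (first if in_adult_range else second)[member] = codebook
    return first, second


class ModelStore:
    """First model is cached in memory; second stays in the database."""

    def __init__(self, database):
        self.database = database  # dict standing in for the voice database
        self.cache = {}

    def power_on(self):
        # Only the first (adult) acoustic model is loaded at power-on.
        self.cache = dict(self.database["first"])

    def first_model(self):
        return self.cache

    def second_model(self):
        # Read from the database each time it is needed.
        return self.database["second"]
```

With this layout, matching against `first_model()` touches only cached data, and the database is read only when the first model produces no match.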

Because of physiological differences in the vocal organs, every person has a different manner of articulation and different speaking habits. The present invention combines the first-type feature parameters obtained by linear prediction with the second-type feature parameters extracted by simulating the human ear's perception of sound frequency to obtain the acoustic model, so as to improve on existing voiceprint recognition and enhance the user experience.

In the above method for improving voiceprint recognition speed based on sound feature ranges, referring to Fig. 2, the preprocessing in step S2 comprises the following steps in sequence:

Step S21: sampling and quantizing the preprocessed voice signal to obtain a digital voice signal;

Step S22: passing the digital voice signal through a filter bank to boost the high-frequency components of the digital signal;

In the above method for improving voiceprint recognition speed based on sound feature ranges, the first-type feature parameters extracted in step S3 may be linear prediction coefficients, and the extraction steps are as follows:

Step S31a: defining a short-time speech signal and an error signal;

Step S32a: computing the sum of squared errors from the short-time speech signal and the error signal;

Step S33a: differentiating the sum of squared errors and solving the resulting system of equations to obtain the first-type feature parameters.

Because adjacent speech samples are correlated, linear prediction can be used to predict the present or a future sample value from past speech sample values; that is, several past speech samples, or a linear combination of them, are used to approximate the present sample value.
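The prediction idea described above can be made concrete with a small sketch that estimates the predictor coefficients for one frame. It assumes the autocorrelation formulation and solves the resulting equations with the Levinson-Durbin recursion; the source does not say which solver is used, and the function names `autocorrelation` and `lpc` are hypothetical.

```python
def autocorrelation(frame, max_lag):
    """r[lag] = sum over t of frame[t] * frame[t + lag]."""
    n = len(frame)
    return [sum(frame[t] * frame[t + lag] for t in range(n - lag))
            for lag in range(max_lag + 1)]


def lpc(frame, order):
    """Return a[1..order] such that s[n] is approximated by
    sum(a[k] * s[n - k] for k in 1..order)."""
    r = autocorrelation(frame, order)
    if r[0] == 0.0:
        return [0.0] * order          # silent frame: nothing to predict
    a = [0.0] * (order + 1)           # a[0] is unused
    error = r[0]                      # prediction error energy so far
    for i in range(1, order + 1):
        acc = r[i] - sum(a[k] * r[i - k] for k in range(1, i))
        reflection = acc / error
        updated = a[:]
        updated[i] = reflection
        for k in range(1, i):
            updated[k] = a[k] - reflection * a[i - k]
        a = updated
        error *= 1.0 - reflection * reflection
        if error <= 0.0:              # frame is perfectly predictable
            break
    return a[1:]
```

The returned coefficients are the weights of the linear combination of past samples that best approximates the present sample in the least-squares sense for that frame.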

In the above method for improving voiceprint recognition speed based on sound feature ranges, the steps for extracting the second-type feature parameters in step S3 include:

Step S31b: performing a Fourier transform on the preprocessed voice signal to obtain a linear spectrum;

Step S32b: passing the linear spectrum through a bank of triangular band-pass filters to obtain the corresponding Mel spectrum;

Step S33b: computing the logarithmic spectrum of the Mel spectrum;

Step S34b: performing a discrete cosine transform on the logarithmic spectrum to obtain the second-type feature parameters.

In the above method for improving voiceprint recognition speed based on sound feature ranges, the specific steps of step S4 are as follows:

Step S41: extracting N feature vectors from the first-type feature parameters and the second-type feature parameters, and classifying the feature vectors by a clustering method to obtain M codebooks;

Step S42: obtaining the codebook vector corresponding to each class;

In the above method for improving voiceprint recognition speed based on sound feature ranges, step S6 is specifically as follows:

Step S61: matching the voice signal to be identified for similarity against the first acoustic model and the second acoustic model in turn, and making the decision according to a weighted Euclidean distance measure;

Also provided is a system for improving voiceprint recognition speed based on sound feature ranges which, referring to Fig. 3, includes:

a voice input module 1, for capturing a voice signal;

a preprocessing module 2, connected with the voice input module 1, for preprocessing the voice signal;

a first feature parameter extraction module 3, connected with the preprocessing module 2, for obtaining the first feature parameters in the voice signal;

a second feature parameter extraction module 4, connected with the preprocessing module 2, for obtaining the second feature parameters in the voice signal;

a training module 5, connected with the first feature parameter extraction module and the second feature parameter extraction module, for establishing the voice template of each family member, all codebooks of the family members constituting an acoustic model;

an acquisition processing module 6, connected with the training module 5, which obtains in advance, through training, the sound feature ranges corresponding to children, adults and the elderly, and divides the acoustic model into a first acoustic model and a second acoustic model according to the sound feature ranges, wherein the first acoustic model includes the training utterances within the sound feature range corresponding to adults, the second acoustic model includes the training utterances within the sound feature ranges corresponding to children and the elderly, and the first acoustic model is loaded into a cache at power-on;

a template matching module 7, connected with the acquisition processing module 6, for performing pattern matching on a voice signal under test according to the first acoustic model and the second acoustic model in turn, to obtain a recognition result.

The above are only preferred embodiments of the present invention and do not thereby limit its embodiments or scope of protection. Those skilled in the art should appreciate that all solutions obtained through equivalent substitutions and obvious changes made using the description and drawings of the present invention fall within the scope of protection of the present invention.

Claims

1. A method for improving voiceprint recognition speed based on sound feature ranges, characterized in that it is applied to a household robot, and the specific working steps include:

S1: collecting a voice signal;

S2: preprocessing the voice signal;

S3: extracting speech feature parameters from the preprocessed voice signal, the speech feature parameters including first-type feature parameters obtained by linear prediction and second-type feature parameters extracted by simulating the human ear's perception of sound frequency;

S4: establishing a codebook for each family member and storing it in a voice database as that family member's voice template, all codebooks of the family members constituting an acoustic model;

S5: obtaining in advance, through training, the sound feature ranges corresponding to children, adults and the elderly, and dividing the acoustic model into a first acoustic model and a second acoustic model according to the sound feature ranges, wherein the first acoustic model includes the training utterances within the sound feature range corresponding to adults, the second acoustic model includes the training utterances within the sound feature ranges corresponding to children and the elderly, and the first acoustic model is loaded into a cache at power-on;

S6: performing pattern matching on a voice signal under test according to the first acoustic model and the second acoustic model to obtain a recognition result.

2. The method for improving voiceprint recognition speed based on sound feature ranges according to claim 1, characterized in that the preprocessing in step S2 comprises the following steps in sequence:

Step S21: sampling and quantizing the preprocessed voice signal to obtain a digital voice signal;

Step S22: passing the digital voice signal through a filter bank to boost the high-frequency components of the digital signal;

Step S23: framing and windowing the voice signal obtained in step S22 to obtain a windowed voice signal.

3. The method for improving voiceprint recognition speed based on sound feature ranges according to claim 1, characterized in that the first-type feature parameters extracted in step S3 are linear prediction coefficients, and the extraction steps are as follows:

Step S31a: defining a short-time speech signal and an error signal;

Step S33a: differentiating the sum of squared errors and solving the resulting system of equations to obtain the first-type feature parameters.

4. The method for improving voiceprint recognition speed based on sound feature ranges according to claim 1, characterized in that the steps for extracting the second-type feature parameters in step S3 include:

Step S32b: passing the linear spectrum through a bank of triangular band-pass filters to obtain the corresponding Mel spectrum;

Step S33b: computing the logarithmic spectrum of the Mel spectrum;

5. The method for improving voiceprint recognition speed based on sound feature ranges according to claim 1, characterized in that the specific steps of step S4 are as follows:

Step S41: extracting N feature vectors from the first-type feature parameters and the second-type feature parameters, and classifying the feature vectors by a clustering method to obtain M codebooks;

Step S42: obtaining the codebook vector corresponding to each class;

6. The method for improving voiceprint recognition speed based on sound feature ranges according to claim 1, characterized in that step S6 is specifically as follows:

Step S61: matching the voice signal to be identified for similarity against the first acoustic model and the second acoustic model in turn, and making the decision according to a weighted Euclidean distance measure;

7. A system for improving voiceprint recognition speed based on sound feature ranges, characterized in that it includes:

a voice input module, for capturing a voice signal;

a preprocessing module, connected with the voice input module, for preprocessing the voice signal;

a first feature parameter extraction module, connected with the preprocessing module, for obtaining the first feature parameters in the voice signal;

a second feature parameter extraction module, connected with the preprocessing module, for obtaining the second feature parameters in the voice signal;

a training module, connected with the first feature parameter extraction module and the second feature parameter extraction module, for establishing the voice template of each family member, all codebooks of the family members constituting an acoustic model;

an acquisition processing module, which obtains in advance, through training, the sound feature ranges corresponding to children, adults and the elderly, and divides the acoustic model into a first acoustic model and a second acoustic model according to the sound feature ranges, wherein the first acoustic model includes the training utterances within the sound feature range corresponding to adults, the second acoustic model includes the training utterances within the sound feature ranges corresponding to children and the elderly, and the first acoustic model is loaded into a cache at power-on;

a template matching module, connected with the acquisition processing module, for performing pattern matching on a voice signal under test according to the first acoustic model and the second acoustic model to obtain a recognition result.