Summary of the Invention
In view of the above, the invention provides a voice information recognition method and system to address a problem of prior-art speech recognition systems, which can only convert speech into text and therefore extract relatively little information from the user's voice. The technical scheme is as follows:
A voice information recognition method comprises:
extracting, from sample speech data, sample speech feature parameters corresponding to personalized information;
training a Gaussian mixture model with the sample speech feature parameters to obtain a personalized model;
extracting speech feature parameters to be recognized from the speech data to be recognized;
matching the speech feature parameters to be recognized against the personalized model; and
determining the personalized information based on how well the speech feature parameters match the personalized model.
Optionally, the above voice information recognition method further comprises:
determining text information corresponding to the speech data to be recognized from the speech feature parameters.
Wherein the personalized information is the speaker's gender;
the process of extracting, from the sample speech data, sample speech feature parameters corresponding to the personalized information comprises: extracting speech feature parameters from male sample speech data to obtain male speech feature parameters, and extracting speech feature parameters from female sample speech data to obtain female speech feature parameters;
the process of training a Gaussian mixture model with the sample speech feature parameters to obtain a personalized model comprises: training a Gaussian mixture model with the male speech feature parameters to obtain a male Gaussian mixture model, and training a Gaussian mixture model with the female speech feature parameters to obtain a female Gaussian mixture model.
Preferably, the process of matching the speech feature parameters to be recognized against the personalized model comprises: computing the log-likelihood of the speech feature parameters to be recognized under the male Gaussian mixture model to obtain a first log-likelihood, and computing the log-likelihood of the speech feature parameters to be recognized under the female Gaussian mixture model to obtain a second log-likelihood;
the process of determining the personalized information based on the match comprises: determining that the speaker is male when the first log-likelihood is greater than the second log-likelihood, and that the speaker is female when the first log-likelihood is less than the second log-likelihood.
Wherein the personalized information is the speaker's age bracket;
the process of extracting, from the sample speech data, sample speech feature parameters corresponding to the personalized information comprises: extracting speech feature parameters from the sample speech data corresponding to each age bracket, obtaining speech feature parameters for each age bracket;
the process of training a Gaussian mixture model with the sample speech feature parameters to obtain a personalized model comprises: training a Gaussian mixture model with the speech feature parameters of each age bracket, obtaining a Gaussian mixture model for each age bracket.
Preferably, the process of matching the speech feature parameters to be recognized against the personalized model comprises: computing the log-likelihood of the speech feature parameters to be recognized under the Gaussian mixture model of each age bracket, obtaining a log-likelihood corresponding to each age bracket;
the process of determining the personalized information based on the match comprises: determining the maximum among the log-likelihoods corresponding to the age brackets, and taking the age bracket corresponding to the maximum log-likelihood as the speaker's age bracket.
Wherein the personalized information is the language type;
the method further comprises: extracting sample speech feature parameters from all of the sample speech data, and training a Gaussian mixture model with these sample speech feature parameters to obtain a universal Gaussian mixture model;
the process of extracting, from the sample speech data, sample speech feature parameters corresponding to the personalized information comprises: extracting speech feature parameters from the sample speech data corresponding to each province, obtaining speech feature parameters for each province;
the process of training a Gaussian mixture model with the sample speech feature parameters to obtain a personalized model comprises: training a Gaussian mixture model with the speech feature parameters of each province, obtaining a Gaussian mixture model for each province.
Preferably, the process of matching the speech feature parameters to be recognized against the personalized model comprises: computing the log-likelihood of the speech feature parameters to be recognized under the Gaussian mixture model of each province, obtaining a log-likelihood corresponding to each province;
the process of determining the personalized information based on the match comprises:
computing the log-likelihood of the speech feature parameters to be recognized under the universal Gaussian mixture model, obtaining a third log-likelihood;
determining the maximum among the log-likelihoods corresponding to the provinces; and
judging whether the difference between the maximum log-likelihood and the third log-likelihood is greater than a first preset value; if so, determining that the language type is a dialect, namely the dialect of the province corresponding to the maximum log-likelihood; otherwise, determining that the language type is Mandarin.
Wherein the personalized information is the speaker's identity;
the method further comprises: extracting sample speech feature parameters from all of the sample speech data, and training a Gaussian mixture model with these sample speech feature parameters to obtain a universal Gaussian mixture model;
the process of extracting, from the sample speech data, sample speech feature parameters corresponding to the personalized information comprises: extracting the speaker's own speech feature parameters from the speaker's historical speech data;
the process of training a Gaussian mixture model with the sample speech feature parameters to obtain a personalized model comprises: training a Gaussian mixture model with the speaker's own speech feature parameters, obtaining the speaker's own Gaussian mixture model.
Preferably, the process of matching the speech feature parameters to be recognized against the personalized model comprises: computing the log-likelihood of the speech feature parameters to be recognized under the speaker's own Gaussian mixture model, obtaining a fourth log-likelihood;
the process of determining the personalized information based on the match comprises: computing the log-likelihood of the speech feature parameters to be recognized under the universal Gaussian mixture model, obtaining a fifth log-likelihood; and
judging whether the difference between the fourth log-likelihood and the fifth log-likelihood is greater than a second preset value; if so, determining that the speaker is the enrolled user himself or herself; otherwise, determining that the speaker is someone else.
A voice information recognition system comprises a feature extraction module and a personalized analysis module;
the feature extraction module is configured to extract, from sample speech data, sample speech feature parameters corresponding to personalized information, and to extract speech feature parameters to be recognized from speech data to be recognized;
the personalized analysis module is configured to train a Gaussian mixture model with the sample speech feature parameters to obtain a personalized model, to match the speech feature parameters to be recognized against the personalized model, and to determine the personalized information based on how well the speech feature parameters match the personalized model.
The above voice information recognition system can further comprise a text recognition module;
the text recognition module is configured to determine the text information corresponding to the speech data from the speech feature parameters.
The above technical scheme has the following beneficial effects:
The voice information recognition method and system provided by the invention extract, from sample speech data, sample speech feature parameters corresponding to personalized information; train a Gaussian mixture model with those parameters to obtain a personalized model; extract speech feature parameters to be recognized from the speech data to be recognized; match those parameters against the personalized model; and determine the personalized information from the match result. The method and system can thus recognize personalized information such as the speaker's gender and age from the speech data to be recognized, and the recognized personalized information leaves considerable room for follow-up operations by applications such as voice assistants and voice dialogue. In addition, the method and system can also recognize text information from the speech data to be recognized; the recognition of personalized information and the recognition of text information share one set of speech feature parameters, and the computation required for recognizing the personalized information is small compared with that for recognizing the text, so the impact on text recognition speed is small.
Embodiments
To make the purpose, technical scheme and advantages of the embodiments of the invention clearer, the technical schemes in the embodiments are described below clearly and completely with reference to the accompanying drawings. The described embodiments are obviously only some, not all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art from the embodiments of the invention without creative work fall within the protection scope of the invention.
An embodiment of the invention provides a voice information recognition method. Fig. 1 shows a schematic flowchart of the method, which can comprise:
S11: extracting, from sample speech data, sample speech feature parameters corresponding to personalized information.
S12: training a Gaussian mixture model with the sample speech feature parameters to obtain a personalized model.
S13: extracting speech feature parameters to be recognized from the speech data to be recognized.
S14: matching the speech feature parameters to be recognized against the personalized model.
S15: determining the personalized information based on how well the speech feature parameters match the personalized model.
The personalized information can comprise the speaker's gender, the speaker's age bracket, the language type and/or the speaker's identity, where the speaker's identity indicates whether the speaker is the enrolled user or someone else.
The voice information recognition method provided by this embodiment can recognize personalized information such as the speaker's gender and age from the speech data to be recognized. The recognized personalized information can be combined with the text information recognized by a prior-art speech recognition method, leaving considerable room for applications such as voice assistants and voice dialogue.
Below, the voice information recognition method provided by the embodiments of the invention is described in detail with the personalized information being, in turn, the speaker's gender, the speaker's age bracket, the language type and the speaker's identity. It should be noted that the method can recognize all of these from the speech data to be recognized at the same time; the recognition of each kind of information is described separately here only for clarity.
Referring to Fig. 2, Fig. 2 is a schematic flowchart of the voice information recognition method when the personalized information is the speaker's gender. The method can comprise:
Step S101: extracting speech feature parameters from male sample speech data to obtain male speech feature parameters, and extracting speech feature parameters from female sample speech data to obtain female speech feature parameters.
In this embodiment, a large amount of sample speech data collected in advance by a voice capture device can be stored in a sample library; when speech recognition is performed, the male sample speech data and the female sample speech data are obtained from the sample library.
The speech feature parameters in this embodiment can be Mel-frequency cepstral coefficients (MFCC), which offer good recognition performance and closely model the perceptual characteristics of the human auditory system.
Fig. 3 shows a schematic flowchart of extracting MFCC from speech data. The process can comprise:
Step S1011: pre-processing the speech data. The pre-processing can comprise framing, windowing and pre-emphasis.
Because a speech signal is time-varying, it must be processed over short segments, so the signal is divided into frames and windowed. In addition, the average power spectrum of a speech signal, shaped by the glottal excitation and by lip and nostril radiation, falls off by about 6 dB per octave at the high-frequency end; pre-emphasis is therefore applied to boost the high-frequency part of the signal and flatten its spectrum.
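The pre-processing of step S1011 can be sketched as follows. The frame length, hop size and pre-emphasis coefficient below are common illustrative values, not ones fixed by the text:

```python
import numpy as np

def preprocess(signal, frame_len=400, hop=160, alpha=0.97):
    """Step S1011: pre-emphasis, framing and windowing of raw speech samples."""
    # Pre-emphasis boosts the high-frequency part that the vocal apparatus
    # attenuates, flattening the spectrum: y[n] = x[n] - alpha * x[n-1]
    emphasized = np.append(signal[0], signal[1:] - alpha * signal[:-1])
    # Framing: short overlapping segments over which speech is quasi-stationary
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    frames = np.stack([emphasized[i * hop : i * hop + frame_len]
                       for i in range(n_frames)])
    # Windowing: taper each frame with a Hamming window to reduce spectral leakage
    return frames * np.hamming(frame_len)

frames = preprocess(np.random.default_rng(0).standard_normal(16000))  # 1 s at 16 kHz
```

With a 25 ms frame and a 10 ms hop at 16 kHz, one second of audio yields 98 windowed frames of 400 samples each.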
Step S1012: performing a short-time Fourier transform on the pre-processed speech signal to obtain its spectrum. For each frame of N samples, the discrete Fourier transform is:
X(k) = Σ_{n=0}^{N-1} x(n) e^{-j2πnk/N}, 0 ≤ k ≤ N-1
where x(n) is the pre-processed speech signal and X(k) is its spectrum.
Step S1013: squaring the spectrum to obtain the energy spectrum, and then smoothing the spectrum of the speech signal with a Mel filter bank H_q(k), which removes harmonics and highlights the formants of the original speech. The Mel filter bank is a set of triangular band-pass filters. With f(q), q = 1, 2, ..., Q, denoting the filter center frequencies (Q being the number of triangular band-pass filters), each filter H_q(k) can be expressed as:
H_q(k) = 0 for k < f(q-1) or k > f(q+1);
H_q(k) = (k - f(q-1)) / (f(q) - f(q-1)) for f(q-1) ≤ k ≤ f(q);
H_q(k) = (f(q+1) - k) / (f(q+1) - f(q)) for f(q) ≤ k ≤ f(q+1).
Step S1014: taking the logarithm of the filter bank outputs. This compresses the dynamic range of the speech spectrum and turns multiplicative noise components in the frequency domain into additive ones, yielding the log Mel spectrum S(q):
S(q) = ln( Σ_{k=0}^{N-1} |X(k)|² H_q(k) ), q = 1, 2, ..., Q
Step S1015: applying a discrete cosine transform to bring the log Mel spectrum S(q) back to the time domain, obtaining the Mel-frequency cepstral coefficients (MFCC). The n-th coefficient C(n) is computed as:
C(n) = Σ_{q=1}^{Q} S(q) cos( πn(q - 0.5) / Q ), n = 1, 2, ..., L
where L is the MFCC order and Q is the number of Mel filters.
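Steps S1012–S1015 can be sketched as below, operating on windowed frames such as those produced by step S1011. The FFT size, filter count and coefficient count are illustrative choices, not values fixed by the text:

```python
import numpy as np

def mel_filterbank(n_filters=26, n_fft=512, sr=16000):
    """Triangular band-pass filters with centers evenly spaced on the Mel scale."""
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    inv_mel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    pts = inv_mel(np.linspace(0.0, mel(sr / 2.0), n_filters + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for q in range(1, n_filters + 1):
        # Rising and falling slopes of the q-th triangular filter H_q(k)
        fb[q - 1, bins[q - 1]:bins[q]] = np.linspace(0, 1, bins[q] - bins[q - 1],
                                                     endpoint=False)
        fb[q - 1, bins[q]:bins[q + 1]] = np.linspace(1, 0, bins[q + 1] - bins[q],
                                                     endpoint=False)
    return fb

def mfcc(frames, n_fft=512, n_filters=26, n_ceps=13):
    """Steps S1012-S1015: spectrum, energy spectrum, log Mel spectrum, DCT."""
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2                       # S1012-S1013
    log_mel = np.log(power @ mel_filterbank(n_filters, n_fft).T + 1e-10)  # S1014
    n = np.arange(n_ceps)[:, None]
    q = np.arange(n_filters)[None, :]
    dct = np.cos(np.pi * n * (2 * q + 1) / (2 * n_filters))               # S1015: DCT-II
    return log_mel @ dct.T

coefs = mfcc(np.random.default_rng(0).standard_normal((10, 400)))
```

Each 400-sample frame is reduced to 13 cepstral coefficients; the small constant added before the logarithm guards against log(0) for silent frames.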
Step S102: training a Gaussian mixture model with the male speech feature parameters to obtain a male Gaussian mixture model, and training a Gaussian mixture model with the female speech feature parameters to obtain a female Gaussian mixture model.
The extracted speech feature parameters can be used to generate the corresponding Gaussian mixture models, for example with the LBG algorithm.
The method provided by this embodiment adopts Gaussian mixture models because the Gaussian mixture model is the most successful model in the field of speaker recognition, where it is mainly used for text-independent speaker recognition. It directly fits the statistical distribution of the speaker's personal characteristics in the speech: it does not model the temporal structure of the speech, only the static distribution of the speech feature parameters. Since different speakers' speech features have different static distributions, different speakers can be distinguished by comparing their Gaussian mixture models. Using the Gaussian mixture model as a text-independent speaker recognition model rests mainly on two points:
First, a speaker's acoustic feature parameters occupy a region of feature space formed by the feature vectors of his or her different sounds; for text-independent speaker recognition, each Gaussian component of the mixture can be regarded as modeling the acoustic features of a different, unidentified phoneme of the same speaker, so the components describe the distributions of different phonemes. Second, statistical theory shows that a linear combination of enough Gaussian probability density functions can approximate any distribution, so a Gaussian mixture model can accurately describe an arbitrary distribution of speech features.
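As an illustration of this training step, the sketch below fits a diagonal-covariance GMM to toy 13-dimensional "MFCC" frames using scikit-learn (an assumed dependency; the text itself mentions the LBG algorithm for model generation, whereas scikit-learn initialises its EM fit with k-means by default):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Toy 13-dimensional "MFCC" frames for one class of sample speech
# (two clusters, so the mixture has structure to model)
features = np.concatenate([rng.normal(-2, 1, (300, 13)),
                           rng.normal(2, 1, (300, 13))])

# Fit a diagonal-covariance GMM to the class's feature parameters
gmm = GaussianMixture(n_components=8, covariance_type="diag",
                      random_state=0).fit(features)

# Average per-frame log-likelihood: the quantity the matching steps compare
avg_ll = gmm.score(features)
```

One such model would be trained per class (male/female, per age bracket, per province, per speaker), and an utterance is then scored against each.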
Step S103: extracting speech feature parameters to be recognized from the speech data to be recognized.
The process of extracting the speech feature parameters to be recognized is the same as the feature extraction process described above and is not repeated here.
Step S104: computing the log-likelihood of the speech feature parameters to be recognized under the male Gaussian mixture model to obtain a first log-likelihood, and computing the log-likelihood under the female Gaussian mixture model to obtain a second log-likelihood.
Step S105: judging whether the first log-likelihood is greater than the second log-likelihood. When the first log-likelihood is greater, the speech feature parameters to be recognized are more similar to the male Gaussian mixture model and the speaker is determined to be male; when the first log-likelihood is smaller, the speaker is determined to be female.
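Steps S104–S105 can be sketched with single diagonal Gaussians standing in for the two GMMs; the class means, variances and feature values below are toy data, not values from the text:

```python
import numpy as np

def fit_gaussian(X):
    """Fit a single diagonal Gaussian (a one-component stand-in for a GMM)."""
    return X.mean(axis=0), X.var(axis=0) + 1e-6

def avg_log_likelihood(X, mean, var):
    """Mean per-frame log density of the frames X under the model."""
    z = (X - mean) ** 2 / var + np.log(2 * np.pi * var)
    return float((-0.5 * z.sum(axis=1)).mean())

rng = np.random.default_rng(0)
male_train = rng.normal(-1.0, 1.0, (500, 13))    # toy "male" feature frames
female_train = rng.normal(1.0, 1.0, (500, 13))   # toy "female" feature frames
male_model = fit_gaussian(male_train)
female_model = fit_gaussian(female_train)

utt = rng.normal(-1.0, 1.0, (200, 13))           # unknown utterance (male-like)
ll_first = avg_log_likelihood(utt, *male_model)   # first log-likelihood (S104)
ll_second = avg_log_likelihood(utt, *female_model)
gender = "male" if ll_first > ll_second else "female"  # decision rule of S105
```

Because the test utterance is drawn from the same toy distribution as the male training data, the first log-likelihood exceeds the second and the decision is "male".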
The voice information recognition method of this embodiment trains one Gaussian mixture model on male speech data and another on female speech data, matches the speech feature parameters extracted from the speech data to be recognized against both models, and determines from the match results whether the speaker is male or female. The method thus recognizes the speaker's gender from the speech data to be recognized.
Referring to Fig. 4, Fig. 4 is a schematic flowchart of the voice information recognition method when the personalized information is the speaker's age bracket. The method can comprise:
Step S201: extracting speech feature parameters from the sample speech data corresponding to each age bracket, obtaining speech feature parameters for each age bracket.
The feature extraction process in this embodiment can follow steps S1011–S1015 and is not repeated here.
In this embodiment, five age brackets can be set: childhood (0–6 years), juvenile (7–17 years), youth (18–40 years), middle age (41–65 years) and old age (66 years and above). That is, speech feature parameters are extracted from the speech data of users aged 0–6 to obtain childhood speech feature parameters, from users aged 7–17 to obtain juvenile speech feature parameters, from users aged 18–40 to obtain youth speech feature parameters, from users aged 41–65 to obtain middle-age speech feature parameters, and from users aged 66 and above to obtain old-age speech feature parameters, so that speech feature parameters are extracted for every age bracket.
Step S202: training a Gaussian mixture model with the speech feature parameters of each age bracket, obtaining a Gaussian mixture model for each age bracket.
This embodiment thus obtains a childhood Gaussian mixture model, a juvenile Gaussian mixture model, a youth Gaussian mixture model, a middle-age Gaussian mixture model and an old-age Gaussian mixture model.
Step S203: extracting speech feature parameters to be recognized from the speech data to be recognized.
Step S204: computing the log-likelihood of the speech feature parameters to be recognized under the Gaussian mixture model of each age bracket, obtaining a log-likelihood corresponding to each age bracket.
Each log-likelihood reflects how closely the speaker's age approaches the corresponding age bracket: the larger the log-likelihood, the closer the speaker's age is to that bracket.
Step S205: determining the maximum among the log-likelihoods corresponding to the age brackets, and taking the age bracket corresponding to the maximum log-likelihood as the speaker's age bracket.
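Steps S204–S205 reduce to an argmax over the per-bracket scores. The log-likelihood values below are invented for illustration:

```python
# Hypothetical per-bracket log-likelihoods from step S204 (values invented)
ll_by_bracket = {
    "childhood (0-6)": -61.2,
    "juvenile (7-17)": -54.8,
    "youth (18-40)": -49.3,
    "middle age (41-65)": -52.1,
    "old age (66+)": -58.7,
}
# Step S205: the bracket whose GMM scores highest is the speaker's age bracket
age_bracket = max(ll_by_bracket, key=ll_by_bracket.get)  # -> "youth (18-40)"
```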
The voice information recognition method of this embodiment trains a Gaussian mixture model on the speech data of each age bracket, matches the speech feature parameters extracted from the speech data to be recognized against each of these models, and determines the speaker's age bracket from the match results. The method thus recognizes the speaker's age bracket from the speech data to be recognized.
Referring to Fig. 5, Fig. 5 is a schematic flowchart of the voice information recognition method when the personalized information is the language type. The method can comprise:
Step S301: extracting speech feature parameters from the sample speech data corresponding to each province, obtaining speech feature parameters for each province.
The feature extraction process in this embodiment can follow steps S1011–S1015 and is not repeated here.
Step S302: training a Gaussian mixture model with the speech feature parameters of each province, obtaining a Gaussian mixture model for each province.
Step S303: extracting speech feature parameters from all of the sample speech data.
Step S304: training a Gaussian mixture model with the speech feature parameters extracted from all of the sample speech data, obtaining a universal Gaussian mixture model.
Step S305: extracting speech feature parameters to be recognized from the speech data to be recognized.
Step S306: computing the log-likelihood of the speech feature parameters to be recognized under the Gaussian mixture model of each province, obtaining a log-likelihood corresponding to each province.
Step S307: computing the log-likelihood of the speech feature parameters to be recognized under the universal Gaussian mixture model, obtaining a third log-likelihood.
Step S308: determining the maximum among the log-likelihoods corresponding to the provinces.
Steps S309–S311: judging whether the difference between the maximum log-likelihood and the third log-likelihood is greater than a first preset value. If so, the language type is determined to be a dialect, namely the dialect of the province corresponding to the maximum log-likelihood; otherwise, the language type is determined to be Mandarin.
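The decision of steps S306–S311 can be sketched as follows; the scores and the preset value are invented for the sketch:

```python
# Hypothetical scores: per-province log-likelihoods (S306) and the universal
# model's third log-likelihood (S307); all values invented for the sketch
ll_by_province = {"Guangdong": -47.0, "Sichuan": -44.2, "Hunan": -46.1}
ll_universal = -45.5
first_preset_value = 1.0

best = max(ll_by_province, key=ll_by_province.get)            # S308: best province
if ll_by_province[best] - ll_universal > first_preset_value:  # S309-S311
    language_type = "dialect of " + best
else:
    language_type = "Mandarin"
```

Here the Sichuan model beats the universal model by more than the preset margin, so the speech is classified as Sichuan dialect; a smaller margin would yield "Mandarin".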
The voice information recognition method of this embodiment trains a Gaussian mixture model on the speech data of each province and a universal Gaussian mixture model on all of the sample speech data, matches the speech feature parameters extracted from the speech data to be recognized against the per-province models and against the universal model, and determines the speaker's language type from the match results. The method thus recognizes the speaker's language type from the speech data to be recognized.
Referring to Fig. 6, Fig. 6 is a schematic flowchart of the voice information recognition method when the personalized information is the speaker's identity. The method can comprise:
Step S401: extracting speech feature parameters from the speaker's historical speech data, obtaining the speaker's own speech feature parameters.
In this embodiment, the sample speech data in the sample library include the historical speech data of the speaker of the speech data to be recognized.
In addition, the feature extraction process in this embodiment can follow steps S1011–S1015 and is not repeated here.
Step S402: training a Gaussian mixture model with the speaker's own speech feature parameters, obtaining the speaker's own Gaussian mixture model.
Step S403: extracting speech feature parameters from all of the sample speech data.
Step S404: training a Gaussian mixture model with the speech feature parameters extracted from all of the sample speech data, obtaining a universal Gaussian mixture model.
Step S405: extracting speech feature parameters to be recognized from the speech data to be recognized.
Step S406: computing the log-likelihood of the speech feature parameters to be recognized under the speaker's own Gaussian mixture model, obtaining a fourth log-likelihood.
Step S407: computing the log-likelihood of the speech feature parameters to be recognized under the universal Gaussian mixture model, obtaining a fifth log-likelihood.
Steps S408–S410: judging whether the difference between the fourth log-likelihood and the fifth log-likelihood is greater than a second preset value. If so, the speaker is determined to be the enrolled user himself or herself; otherwise, the speaker is determined to be someone else.
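Steps S406–S410 compare the speaker-model score against the universal-model score with a margin; the numbers below are invented for the sketch:

```python
# Hypothetical scores (values invented): the speaker-model score (S406) and
# the universal-model score (S407)
fourth_ll = -42.3   # utterance vs. the speaker's own GMM
fifth_ll = -45.0    # utterance vs. the universal GMM
second_preset_value = 2.0

# S408-S410: a big enough margin over the universal model means "same speaker"
if fourth_ll - fifth_ll > second_preset_value:
    identity = "the speaker himself or herself"
else:
    identity = "someone else"
```

Scoring against the universal model normalizes for utterance content and recording conditions, so the decision depends on the relative margin rather than on the raw score.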
The voice information recognition method of this embodiment trains the speaker's own Gaussian mixture model on the speaker's historical speech data and a universal Gaussian mixture model on all of the sample speech data, matches the speech feature parameters extracted from the speech data to be recognized against both models, and determines from the match results whether the speaker is the enrolled user or someone else. The method thus recognizes the speaker's identity from the speech data to be recognized.
In another embodiment of the invention, the voice information recognition method can further comprise: after the speech feature parameters to be recognized are extracted from the speech data to be recognized, determining the text information corresponding to the speech data to be recognized from those speech feature parameters, that is, converting the speech data into text.
In this embodiment, the recognition of personalized information and the recognition of text information share one set of speech feature parameters; from that single set of parameters, both the text information and personalized information such as the speaker's gender and age can be recognized from the speech to be recognized.
The voice information recognition method of this embodiment can therefore recognize both personalized information, such as the speaker's gender and age, and text information from the speech data to be recognized; the recognized personalized information can be combined with the text information, leaving considerable room for subsequent operations. Moreover, because the two recognitions share one set of speech feature parameters and the computation for recognizing personalized information is small compared with that for recognizing text, the impact on text recognition is small.
An embodiment of the invention also provides a voice information recognition system. Fig. 7 shows a schematic structural diagram of the system, which can comprise a feature extraction module 11 and a personalized analysis module 12, wherein:
the feature extraction module 11 is configured to extract, from sample speech data, sample speech feature parameters corresponding to personalized information, and to extract speech feature parameters to be recognized from the speech data to be recognized;
the personalized analysis module 12 is configured to train a Gaussian mixture model with the sample speech feature parameters to obtain a personalized model, to match the speech feature parameters to be recognized against the personalized model, and to determine the personalized information based on how well the speech feature parameters match the personalized model.
Further, characteristic extracting module 11 can comprise:
The First Characteristic extraction module is used for the sample voice extracting data speech characteristic parameter from the male sex, obtains the male sex's speech characteristic parameter, from women's sample voice extracting data speech characteristic parameter, obtains women's speech characteristic parameter.
The Second Characteristic extraction module is used for going out speech characteristic parameter from the sample voice extracting data corresponding with all age group, obtains the speech characteristic parameter corresponding with all age group.
A third feature extraction submodule is configured to extract speech feature parameters from the sample speech data corresponding to each province, obtaining the speech feature parameters of each province.
A fourth feature extraction submodule is configured to extract the speaker's speech feature parameters from the speaker's historical speech data.
A fifth feature extraction submodule is configured to extract speech feature parameters from all of the sample speech data.
A sixth feature extraction submodule is configured to extract the speech feature parameters to be identified from the speech data to be identified.
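All of the submodules above reduce raw audio to speech feature parameters. As a heavily simplified, hypothetical stand-in for a real acoustic front end (the description later mentions MFCC features), the NumPy sketch below frames a waveform and computes per-frame log energy; the frame length, frame shift and synthetic signal are illustrative assumptions only.

```python
import numpy as np

def frame_log_energy(signal, frame_len=400, frame_shift=160):
    """Split a 1-D waveform into overlapping frames and return the
    log energy of each frame -- a toy one-dimensional 'feature'."""
    n_frames = 1 + (len(signal) - frame_len) // frame_shift
    feats = np.empty(n_frames)
    for i in range(n_frames):
        frame = signal[i * frame_shift : i * frame_shift + frame_len]
        feats[i] = np.log(np.sum(frame ** 2) + 1e-10)  # avoid log(0)
    return feats

# One second of a synthetic 16 kHz sine tone standing in for speech.
t = np.arange(16000) / 16000.0
waveform = np.sin(2 * np.pi * 220.0 * t)
features = frame_log_energy(waveform)
```

A production front end would compute multi-dimensional features per frame (e.g. MFCCs), but the frame-by-frame structure of the output is the same.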
Further, as shown in Fig. 8, the personalization analysis module 12 may comprise a gender analysis module 121, an age analysis module 122, a language analysis module 123 and an identity analysis module 124. Wherein:
The gender analysis module 121 is configured to: train a Gaussian mixture model with the male speech feature parameters, obtaining a male Gaussian mixture model; train a Gaussian mixture model with the female speech feature parameters, obtaining a female Gaussian mixture model; calculate the log likelihood of the speech feature parameters to be identified against the parameters of the male Gaussian mixture model, obtaining a first log likelihood; calculate the log likelihood of the speech feature parameters to be identified against the parameters of the female Gaussian mixture model, obtaining a second log likelihood; determine that the speaker's gender is male when the first log likelihood is greater than the second log likelihood; and determine that the speaker's gender is female when the first log likelihood is less than the second log likelihood.
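Assuming the male and female Gaussian mixture models have already been trained, module 121's decision reduces to comparing two log likelihoods. The sketch below scores a feature matrix against two hand-set diagonal-covariance GMMs and picks the model with the larger log likelihood; the two-component, two-dimensional model parameters and the feature values are invented purely for illustration.

```python
import numpy as np

def gmm_log_likelihood(X, weights, means, variances):
    """Total log-likelihood of frames X (n_frames x dim) under a
    diagonal-covariance GMM with the given weights, means, variances."""
    log_probs = []  # per component: log( w_k ) + log N(x | mu_k, var_k)
    for w, mu, var in zip(weights, means, variances):
        ll = -0.5 * np.sum((X - mu) ** 2 / var + np.log(2 * np.pi * var), axis=1)
        log_probs.append(np.log(w) + ll)
    log_probs = np.stack(log_probs)          # (n_components, n_frames)
    m = log_probs.max(axis=0)
    per_frame = m + np.log(np.exp(log_probs - m).sum(axis=0))  # log-sum-exp
    return per_frame.sum()

# Invented parameters standing in for the trained male/female models.
male_model = (np.array([0.5, 0.5]),
              np.array([[0.0, 0.0], [1.0, 1.0]]),
              np.ones((2, 2)))
female_model = (np.array([0.5, 0.5]),
                np.array([[4.0, 4.0], [5.0, 5.0]]),
                np.ones((2, 2)))

# Speech feature parameters "to be identified" (invented).
features = np.array([[0.2, -0.1], [0.9, 1.1], [0.4, 0.5]])

ll_male = gmm_log_likelihood(features, *male_model)      # first log likelihood
ll_female = gmm_log_likelihood(features, *female_model)  # second log likelihood
speaker_gender = "male" if ll_male > ll_female else "female"
```

The log-sum-exp step keeps the per-frame mixture likelihood numerically stable; the final comparison is exactly the first-versus-second log likelihood test described above.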
The age analysis module 122 is configured to: train a Gaussian mixture model with the speech feature parameters corresponding to each age group, obtaining a Gaussian mixture model for each age group; calculate, for each age group, the log likelihood of the speech feature parameters to be identified against the parameters of that age group's Gaussian mixture model, obtaining a log likelihood corresponding to each age group; determine the maximum among the log likelihoods corresponding to the age groups; and determine the age group corresponding to the maximum log likelihood as the age group to which the speaker belongs.
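Assuming the log likelihoods against each age-group model have already been computed as described (the values below are invented), module 122's age decision is a simple argmax:

```python
# Hypothetical log-likelihoods of the features to be identified under
# each age-group Gaussian mixture model (illustrative values only).
ll_by_age_group = {"under 18": -61.2, "18-30": -54.7,
                   "31-50": -57.9, "over 50": -63.0}

# The age group whose model yields the maximum log-likelihood.
speaker_age_group = max(ll_by_age_group, key=ll_by_age_group.get)
```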
The language analysis module 123 is configured to: train a Gaussian mixture model with the speech feature parameters corresponding to each province, obtaining a Gaussian mixture model for each province; train a Gaussian mixture model with the speech feature parameters extracted from all of the sample speech data, obtaining a universal Gaussian mixture model; calculate, for each province, the log likelihood of the speech feature parameters to be identified against the parameters of that province's Gaussian mixture model, obtaining a log likelihood corresponding to each province; calculate the log likelihood of the speech feature parameters to be identified against the parameters of the universal Gaussian mixture model, obtaining a third log likelihood; determine the maximum among the log likelihoods corresponding to the provinces; and, when the difference between that maximum log likelihood and the third log likelihood is greater than a first preset value, determine that the language type is a dialect, namely the dialect of the province corresponding to the maximum log likelihood, and otherwise determine that the language type is Mandarin.
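Module 123's rule can be summarized with scalar scores. Assuming the per-province and universal-model log likelihoods have already been computed (all values and the first preset value below are invented for illustration), the decision is:

```python
# Hypothetical per-province log-likelihoods of the features to be identified.
ll_by_province = {"Guangdong": -52.0, "Sichuan": -48.5, "Shandong": -55.1}
ll_universal = -54.0       # the "third log likelihood" (universal model)
first_preset_value = 2.0   # illustrative threshold

best_province = max(ll_by_province, key=ll_by_province.get)
if ll_by_province[best_province] - ll_universal > first_preset_value:
    language_type = f"dialect of {best_province}"
else:
    language_type = "Mandarin"
```

The universal model acts as a baseline: only when some provincial model beats it by a sufficient margin is the speech classified as that province's dialect.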
The identity analysis module 124 is configured to: train a Gaussian mixture model with the speaker's speech feature parameters, obtaining the speaker's own Gaussian mixture model; train a Gaussian mixture model with the speech feature parameters extracted from all of the sample speech data, obtaining a universal Gaussian mixture model; calculate the log likelihood of the speech feature parameters to be identified against the parameters of the speaker's own Gaussian mixture model, obtaining a fourth log likelihood; calculate the log likelihood of the speech feature parameters to be identified against the parameters of the universal Gaussian mixture model, obtaining a fifth log likelihood; determine that the current speaker is the speaker himself or herself when the difference between the fourth log likelihood and the fifth log likelihood is greater than a second preset value, and otherwise determine that the current speaker is another person.
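Module 124 uses the same thresholding pattern, comparing a speaker-specific model against the universal model (a structure reminiscent of GMM-UBM speaker verification). With invented scores and an invented second preset value:

```python
ll_speaker = -49.3    # fourth log likelihood: features vs. speaker's own model
ll_universal = -53.8  # fifth log likelihood: features vs. universal model
second_preset_value = 3.0  # illustrative threshold

# The speech is attributed to the speaker only if the speaker's own model
# beats the universal model by more than the preset margin.
is_claimed_speaker = (ll_speaker - ll_universal) > second_preset_value
```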
The voice information recognition system provided by this embodiment of the invention can identify, from the speech data to be identified, personalized information such as the speaker's gender and age. The identified personalized information can be combined with the text information identified by a prior-art speech recognition method, leaving greater room for applications such as voice assistants and voice dialogue.
In other embodiments of the invention, as shown in Fig. 9, the voice information recognition system may further comprise, in addition to the feature extraction module 11 and the personalization analysis module 12, a text recognition module 13.
The text recognition module 13 is configured to determine, by means of the speech feature parameters to be identified, the text information corresponding to the speech data to be identified.
In this embodiment, the text recognition module 13 and the personalization analysis module 12 share a single set of speech feature parameters. Based on these speech feature parameters, both the text information and personalized information such as the speaker's gender and age can be identified from the speech to be identified.
In practical applications, the speed at which the text recognition module 13 converts the speech data to be identified into text information is of primary importance. The recognition speed of text information is generally expressed by the real-time factor (RTF, Real Time Factor).
In the process of text information recognition, the speech data to be identified is sent segment by segment, and whenever the text recognition module 13 receives a segment of speech data it immediately performs the computation. When the computation time is less than the actual duration of the speech, then, excluding the voice data transmission time, the user obtains the recognition result essentially as soon as the speech ends, so recognition is substantially real time. If the computation time is greater than the actual duration of the speech, the user has to wait; the longer the wait, the worse the user experience.
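The real-time factor makes this concrete: RTF is the processing time divided by the actual audio duration, so RTF < 1 means decoding keeps up with the speech. A small sketch, with invented durations:

```python
def real_time_factor(processing_seconds, audio_seconds):
    """RTF = time spent decoding / duration of the audio decoded."""
    return processing_seconds / audio_seconds

# A 10-second utterance decoded in 4 seconds gives RTF = 0.4,
# i.e. recognition is faster than real time.
rtf = real_time_factor(4.0, 10.0)
is_real_time = rtf < 1.0
```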
Where the time budget of the text recognition module 13 is very tight, the approach of sharing speech feature parameters is adopted: while the text recognition module 13 performs text recognition on the speech data, the personalization analysis module computes the degree of match between the MFCC features and the personalized models, i.e. the log likelihoods. Compared with text recognition, whose computational load is much larger, personalized-information recognition takes only about 1% of the time (depending on the number and size of the personalized models).
The voice information recognition system provided by this embodiment of the invention can identify, from the speech data to be identified, both personalized information such as the speaker's gender and age and the corresponding text information. The identified personalized information can be combined with the text information, leaving greater room for subsequent operations. In addition, the personalized-information recognition provided by this embodiment shares a single set of speech feature parameters with the text-information recognition, and its computational load is small relative to that of text recognition, so it has little impact on text recognition.
For convenience of description, the above apparatus has been described in terms of various units divided by function. Of course, when implementing the invention, the functions of the units may be implemented in one or more items of software and/or hardware.
The embodiments in this specification are described in a progressive manner; for identical or similar parts the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, since the apparatus embodiments are substantially similar to the method embodiments, they are described relatively simply, and for relevant details reference may be made to the description of the method embodiments. The system embodiments described above are merely illustrative: the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solutions of the embodiments. Those of ordinary skill in the art can understand and implement them without creative effort.
The invention can be used in numerous general-purpose or special-purpose computing system environments or configurations, for example: personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, network PCs, minicomputers, mainframe computers, and distributed computing environments comprising any of the above systems or devices.
It should be noted that, herein, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations.
The above description of the disclosed embodiments enables those skilled in the art to make or use the invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the invention. Therefore, the invention is not to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.