CN103310788A - Voice information identification method and system - Google Patents

Voice information identification method and system

Info

Publication number
CN103310788A
CN103310788A
Authority
CN
China
Prior art keywords
characteristic parameter
identified
hybrid models
log likelihood
gauss hybrid
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2013101955759A
Other languages
Chinese (zh)
Other versions
CN103310788B (en)
Inventor
李轶杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Unisound Intelligent Technology Co Ltd
Xiamen Yunzhixin Intelligent Technology Co Ltd
Original Assignee
Beijing Yunzhisheng Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Yunzhisheng Information Technology Co Ltd filed Critical Beijing Yunzhisheng Information Technology Co Ltd
Priority to CN201310195575.9A priority Critical patent/CN103310788B/en
Publication of CN103310788A publication Critical patent/CN103310788A/en
Application granted granted Critical
Publication of CN103310788B publication Critical patent/CN103310788B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Telephonic Communication Services (AREA)

Abstract

The invention provides a voice information recognition method and system. The method comprises: extracting sample speech feature parameters from sample speech data corresponding to personalized information; training a Gaussian mixture model with the sample speech feature parameters to obtain a personalized model; extracting speech feature parameters to be identified from the speech data to be identified; matching the speech feature parameters to be identified against the personalized model; and determining the personalized information based on how well they match. The method and system can identify personalized information such as the speaker's gender and age from the speech data to be identified, and the identified personalized information leaves a larger operating space for subsequent applications such as voice assistants and voice dialogue. In addition, the method and system can also recognize text information; personalized-information recognition and text recognition share one set of speech feature parameters, and since personalized-information recognition requires far less computation than text recognition, it has little effect on the speed of text recognition.

Description

Voice information recognition method and system
Technical field
The present invention relates to the field of information recognition technology, and in particular to a voice information recognition method and system.
Background technology
With the development of electronic technology, upgraded electronic devices offer more and more functions. Among these, voice control has attracted wide attention, and various voice-assistant applications have appeared accordingly; a voice assistant lets the user have the device read text messages aloud, recommend restaurants, check the weather, and so on.
The key to a voice-assistant application is its speech recognition system, whose job is to convert the user's voice information into text information. However, the inventor found, in the course of making the invention, that prior-art speech recognition systems can only perform this simple speech-to-text conversion; that is, the amount of information they extract from the user's speech is small.
Summary of the invention
In view of this, the present invention provides a voice information recognition method and system to solve the problem that prior-art speech recognition systems can only perform a simple speech-to-text conversion and thus extract little information from the user's speech. The technical scheme is as follows:
A voice information recognition method comprises:
extracting, from sample speech data, sample speech feature parameters corresponding to personalized information;
training a Gaussian mixture model with the sample speech feature parameters to obtain a personalized model;
extracting speech feature parameters to be identified from the speech data to be identified;
matching the speech feature parameters to be identified against the personalized model;
determining the personalized information based on how well the speech feature parameters match the personalized model.
Optionally, the above voice information recognition method further comprises:
determining, from the speech feature parameters, the text information corresponding to the speech data to be identified.
Wherein, the personalized information is the speaker's gender;
extracting the sample speech feature parameters corresponding to the personalized information comprises: extracting speech feature parameters from male sample speech data to obtain male speech feature parameters, and extracting speech feature parameters from female sample speech data to obtain female speech feature parameters;
training the Gaussian mixture model with the sample speech feature parameters to obtain the personalized model comprises: training a Gaussian mixture model with the male speech feature parameters to obtain a male Gaussian mixture model, and training a Gaussian mixture model with the female speech feature parameters to obtain a female Gaussian mixture model.
Preferably, matching the speech feature parameters to be identified against the personalized model comprises: computing the log-likelihood of the speech feature parameters to be identified under the male Gaussian mixture model to obtain a first log-likelihood, and computing the log-likelihood of the speech feature parameters to be identified under the female Gaussian mixture model to obtain a second log-likelihood;
determining the personalized information based on the match comprises: when the first log-likelihood is greater than the second log-likelihood, determining that the speaker's gender is male; when the first log-likelihood is less than the second log-likelihood, determining that the speaker's gender is female.
Wherein, the personalized information is the age bracket of the speaker;
extracting the sample speech feature parameters corresponding to the personalized information comprises: extracting, from the sample speech data corresponding to each age bracket, the speech feature parameters of that age bracket;
training the Gaussian mixture model with the sample speech feature parameters to obtain the personalized model comprises: training a Gaussian mixture model with the speech feature parameters of each age bracket to obtain a Gaussian mixture model for each age bracket.
Preferably, matching the speech feature parameters to be identified against the personalized model comprises: computing, for each age bracket, the log-likelihood of the speech feature parameters to be identified under that bracket's Gaussian mixture model, obtaining a log-likelihood for each age bracket;
determining the personalized information based on the match comprises: finding the maximum among the age brackets' log-likelihoods, and taking the age bracket corresponding to the maximum log-likelihood as the speaker's age bracket.
Wherein, the personalized information is the language type;
the method further comprises: extracting sample speech feature parameters from all of the sample speech data and training a Gaussian mixture model with them to obtain a universal Gaussian mixture model;
extracting the sample speech feature parameters corresponding to the personalized information comprises: extracting, from the sample speech data corresponding to each province, the speech feature parameters of that province;
training the Gaussian mixture model with the sample speech feature parameters to obtain the personalized model comprises: training a Gaussian mixture model with the speech feature parameters of each province to obtain a Gaussian mixture model for each province.
Preferably, matching the speech feature parameters to be identified against the personalized model comprises: computing, for each province, the log-likelihood of the speech feature parameters to be identified under that province's Gaussian mixture model, obtaining a log-likelihood for each province;
determining the personalized information based on the match comprises:
computing the log-likelihood of the speech feature parameters to be identified under the universal Gaussian mixture model to obtain a third log-likelihood;
finding the maximum among the provinces' log-likelihoods;
judging whether the difference between the maximum log-likelihood and the third log-likelihood is greater than a first preset value: if so, determining that the language type is a dialect, namely the dialect of the province corresponding to the maximum log-likelihood; otherwise, determining that the language type is Mandarin.
Wherein, the personalized information is the speaker's identity;
the method further comprises: extracting sample speech feature parameters from all of the sample speech data and training a Gaussian mixture model with them to obtain a universal Gaussian mixture model;
extracting the sample speech feature parameters corresponding to the personalized information comprises: extracting the speaker's own speech feature parameters from the speaker's historical speech data;
training the Gaussian mixture model with the sample speech feature parameters to obtain the personalized model comprises: training a Gaussian mixture model with the speaker's own speech feature parameters to obtain the speaker's own Gaussian mixture model.
Preferably, matching the speech feature parameters to be identified against the personalized model comprises: computing the log-likelihood of the speech feature parameters to be identified under the speaker's own Gaussian mixture model to obtain a fourth log-likelihood;
determining the personalized information based on the match comprises: computing the log-likelihood of the speech feature parameters to be identified under the universal Gaussian mixture model to obtain a fifth log-likelihood;
judging whether the difference between the fourth log-likelihood and the fifth log-likelihood is greater than a second preset value: if so, determining that the speaker is the person himself; otherwise, determining that the speaker is someone else.
A voice information recognition system comprises a feature extraction module and a personalized analysis module;
the feature extraction module is configured to extract, from the sample speech data, the sample speech feature parameters corresponding to the personalized information, and to extract the speech feature parameters to be identified from the speech data to be identified;
the personalized analysis module is configured to train a Gaussian mixture model with the sample speech feature parameters to obtain a personalized model, to match the speech feature parameters to be identified against the personalized model, and to determine the personalized information based on how well the speech feature parameters match the personalized model.
The above voice information recognition system may further comprise a text recognition module;
the text recognition module is configured to determine, from the speech feature parameters, the text information corresponding to the speech data.
The above technical scheme has the following beneficial effects:
The voice information recognition method and system provided by the invention extract, from sample speech data, sample speech feature parameters corresponding to personalized information; train a Gaussian mixture model with the sample speech feature parameters to obtain a personalized model; extract speech feature parameters to be identified from the speech data to be identified; match the speech feature parameters to be identified against the personalized model; and determine the personalized information based on the match. The method and system can thus identify personalized information such as the speaker's gender and age from the speech data to be identified, and the identified personalized information leaves a larger operating space for subsequent applications such as voice assistants and voice dialogue. In addition, the method and system can also recognize text information from the speech data to be identified; personalized-information recognition and text recognition share one set of speech feature parameters, and since personalized-information recognition requires far less computation than text recognition, it has little effect on the recognition speed of the text information.
Brief description of the drawings
To explain the embodiments of the invention or the prior-art technical schemes more clearly, the drawings needed in the description of the embodiments or of the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the invention, and those of ordinary skill in the art can obtain other drawings from them without creative work.
Fig. 1 is a schematic flow chart of the voice information recognition method provided by an embodiment of the invention;
Fig. 2 is a schematic flow chart of the voice information recognition method, provided by an embodiment of the invention, when the personalized information is the speaker's gender;
Fig. 3 is a schematic flow chart of the speech feature parameter extraction method provided by an embodiment of the invention;
Fig. 4 is a schematic flow chart of the voice information recognition method, provided by an embodiment of the invention, when the personalized information is the speaker's age bracket;
Fig. 5 is a schematic flow chart of the voice information recognition method, provided by an embodiment of the invention, when the personalized information is the language type;
Fig. 6 is a schematic flow chart of the voice information recognition method, provided by an embodiment of the invention, when the personalized information is the speaker's identity;
Fig. 7 is a schematic structural diagram of the voice information recognition system provided by an embodiment of the invention;
Fig. 8 is a schematic diagram of one concrete structure of the information recognition system provided by an embodiment of the invention;
Fig. 9 is another schematic structural diagram of the information recognition system provided by an embodiment of the invention.
Detailed description of the embodiments
To make the purpose, technical scheme and advantages of the embodiments of the invention clearer, the technical scheme in the embodiments is described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art from the embodiments of the invention without creative work fall within the scope of protection of the invention.
An embodiment of the invention provides a voice information recognition method; Fig. 1 shows its schematic flow chart. The method can comprise:
S11: extracting, from sample speech data, sample speech feature parameters corresponding to personalized information.
S12: training a Gaussian mixture model with the sample speech feature parameters to obtain a personalized model.
S13: extracting speech feature parameters to be identified from the speech data to be identified.
S14: matching the speech feature parameters to be identified against the personalized model.
S15: determining the personalized information based on how well the speech feature parameters match the personalized model.
Here, the personalized information can comprise the speaker's gender, the speaker's age bracket, the language type and/or the speaker's identity, where the speaker's identity means whether the speaker is the person himself or someone else.
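The five steps S11–S15 can be sketched end-to-end as follows. This is a minimal illustration, not the patented implementation: the `extract`, `train` and `match` callables are hypothetical stand-ins (here a "feature" is just a sample mean and "match" a negative distance), whereas in the patent they are MFCC extraction, Gaussian-mixture training and log-likelihood scoring.

```python
def identify_personalized_info(sample_sets, utterance, extract, train, match):
    # S11-S12: extract per-category sample features, train one personalized model each
    models = {label: train(extract(data)) for label, data in sample_sets.items()}
    # S13: extract the features of the utterance to be identified
    feats = extract(utterance)
    # S14-S15: match the utterance against every model and keep the best match
    scores = {label: match(feats, model) for label, model in models.items()}
    return max(scores, key=scores.get)

# toy stand-ins: the "feature" is the sample mean, the "model" is that mean,
# and the "match" score is the negative distance between the two means
def extract(xs):
    return sum(xs) / len(xs)

def train(feature):
    return feature

def match(feature, model):
    return -abs(feature - model)

label = identify_personalized_info({"male": [1.0, 1.2], "female": [2.0, 2.2]},
                                   [1.1, 0.9], extract, train, match)
print(label)  # -> male
```

The structure mirrors the claims: models are built once from sample data, and each new utterance is scored against all of them.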
The voice information recognition method provided by this embodiment can identify personalized information such as the speaker's gender and age from the speech data to be identified; the identified personalized information can be combined with the text information obtained by prior-art speech recognition, leaving a larger operating space for applications such as voice assistants and voice dialogue.
Below, the voice information recognition method provided by the embodiments is described in detail, taking in turn the speaker's gender, the speaker's age bracket, the language type and the speaker's identity as the personalized information. It should be noted that the method can identify the speaker's gender, age bracket, language type and identity from the speech data to be identified simultaneously; the identification of each kind of information is described separately here only for clarity.
Referring to Fig. 2, Fig. 2 is a schematic flow chart of the voice information recognition method, provided by an embodiment of the invention, when the personalized information is the speaker's gender. The method can comprise:
Step S101: extracting speech feature parameters from male sample speech data to obtain male speech feature parameters, and extracting speech feature parameters from female sample speech data to obtain female speech feature parameters.
In the present embodiment, a large amount of sample speech data collected in advance by a voice capture device can be stored in a sample library; when recognition is performed, the male sample speech data and the female sample speech data are obtained from the sample library.
The speech feature parameters in the present embodiment can be Mel-frequency cepstral coefficients (MFCC), which offer good recognition performance and model the perceptual characteristics of the human auditory system well.
Fig. 3 shows a schematic flow chart of extracting MFCCs from speech data. The process can comprise:
Step S1011: pre-processing the speech data. The pre-processing can comprise framing, windowing and pre-emphasis.
Because the speech signal is time-varying, it must be processed over short segments, so the signal is divided into frames and windowed. In addition, because the average power spectrum of the speech signal is shaped by the glottal excitation and by lip and nostril radiation, its high-frequency end (above about 800 Hz) falls off at roughly 6 dB per octave; pre-emphasis is therefore applied to boost the high-frequency part of the signal and flatten its spectrum.
Step S1012: applying a short-time Fourier transform to the pre-processed speech signal to obtain its spectrum. The Fourier transform of the speech signal is:
$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi k n / N}, \qquad 0 \le k \le N-1 \qquad (1)$$
where x(n) is the pre-processed speech signal and X(k) is its spectrum.
Step S1013: squaring the magnitude of the spectrum to obtain the energy spectrum, then smoothing the spectrum of the speech signal with the Mel filters H_q(k), which eliminates the harmonics and highlights the formants of the original speech. The Mel filter bank is a set of triangular band-pass filters; the filter H_q(k) can be expressed as:
$$H_q(k) = \begin{cases} 0, & k < f(q-1) \\[4pt] \dfrac{k - f(q-1)}{f(q) - f(q-1)}, & f(q-1) \le k < f(q) \\[4pt] \dfrac{f(q+1) - k}{f(q+1) - f(q)}, & f(q) \le k \le f(q+1) \\[4pt] 0, & k > f(q+1) \end{cases} \qquad (2)$$
where in f(q), q = 1, 2, …, Q, and Q is the number of triangular band-pass filters.
Step S1014: taking the logarithm of the filter-bank outputs. This compresses the dynamic range of the speech spectrum and converts the multiplicative noise components in the frequency domain into additive ones, yielding the log-Mel spectrum S(q):
$$S(q) = \ln\left\{ \sum_{k=0}^{N-1} |X(k)|^2\, H_q(k) \right\} \qquad (3)$$
Step S1015: applying a discrete cosine transform to bring the log-Mel spectrum S(q) back to the time domain, which yields the Mel-frequency cepstral coefficients; the n-th coefficient C(n) is computed as:
$$C(n) = \sqrt{\frac{2}{Q}}\, \sum_{q=0}^{Q-1} S(q) \cos\left\{ \frac{\pi n (q + 0.5)}{Q} \right\}, \qquad 0 \le n < L \qquad (4)$$
where L is the MFCC order and Q is the number of Mel filters.
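The extraction flow of steps S1011–S1015 can be sketched in NumPy as below. This is a minimal sketch under common assumptions not stated in the patent: 16 kHz audio, 25 ms frames with a 10 ms hop, a Hamming window, a 0.97 pre-emphasis coefficient, 26 Mel filters and 13 coefficients; all function names are this sketch's own.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(Q, n_fft, sr):
    # Q triangular band-pass filters H_q(k), evenly spaced on the Mel scale (eq. 2)
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), Q + 2))
    bins = np.floor((n_fft + 1) * edges / sr).astype(int)
    H = np.zeros((Q, n_fft // 2 + 1))
    for q in range(1, Q + 1):
        l, c, r = bins[q - 1], bins[q], bins[q + 1]
        for k in range(l, c):
            H[q - 1, k] = (k - l) / max(c - l, 1)   # rising edge
        for k in range(c, r):
            H[q - 1, k] = (r - k) / max(r - c, 1)   # falling edge
    return H

def mfcc(signal, sr, frame_len=400, hop=160, Q=26, L=13):
    # S1011: pre-emphasis lifts the high-frequency part, then framing + Hamming window
    x = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    frames = np.array([x[s:s + frame_len] * np.hamming(frame_len)
                       for s in range(0, len(x) - frame_len + 1, hop)])
    # S1012-S1013: short-time Fourier transform, energy spectrum, Mel filtering (eqs. 1-2)
    spec = np.abs(np.fft.rfft(frames, n=frame_len)) ** 2
    H = mel_filterbank(Q, frame_len, sr)
    # S1014: logarithm compresses the dynamic range -> log-Mel spectrum S(q) (eq. 3)
    S = np.log(spec @ H.T + 1e-10)
    # S1015: DCT back to the time domain gives the first L cepstral coefficients (eq. 4)
    n = np.arange(L)[:, None]
    q = np.arange(Q)[None, :]
    dct = np.sqrt(2.0 / Q) * np.cos(np.pi * n * (q + 0.5) / Q)
    return S @ dct.T

sig = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # 1 s synthetic tone
feats = mfcc(sig, 16000)
print(feats.shape)  # -> (98, 13): one 13-dimensional MFCC vector per frame
```

Each row of the result is one frame's feature vector; the training and matching steps below all operate on such frame sequences.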
Step S102: training a Gaussian mixture model with the male speech feature parameters to obtain a male Gaussian mixture model, and training a Gaussian mixture model with the female speech feature parameters to obtain a female Gaussian mixture model.
Here, the extracted speech feature parameters can be used to generate the corresponding Gaussian mixture model by the LBG algorithm.
The voice information recognition method of the present embodiment adopts Gaussian mixture models because they are the most successful models in the field of speaker recognition. They are used mainly for text-independent speaker recognition: a Gaussian mixture model directly fits the statistical distribution of the speaker's personal characteristics in the speech; it does not attend to the temporal evolution of the speech, describing only the static distribution of the speech feature parameters. Since the static distributions of different speakers' speech features differ, different speakers can be distinguished by comparing their Gaussian mixture models. Using a Gaussian mixture model as the text-independent speaker recognition model rests on two main reasons:
First, the acoustic feature parameters a speaker produces when uttering different sounds form a distribution of feature vectors in the feature space; for text-independent speaker recognition, each Gaussian component of the mixture can be regarded as modeling the acoustic features of a different, unknown phoneme of the same speaker, so each component describes a different phoneme distribution. Second, statistical theory shows that a linear combination of several Gaussian probability density functions can approximate an arbitrary distribution, so a Gaussian mixture model can describe any speech feature distribution accurately.
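To make the model concrete, the sketch below trains a small diagonal-covariance Gaussian mixture and scores data by log-likelihood, the operation the matching steps rely on. It is a from-scratch illustration under stated assumptions: the patent mentions generating the model by the LBG algorithm, whereas plain expectation-maximization is used here, and the function names and toy 2-D data are this sketch's own.

```python
import numpy as np

def _component_logliks(X, weights, means, variances):
    # log(w_j) + log N(x | mu_j, diag(var_j)) for every frame and every component
    out = np.empty((X.shape[0], len(weights)))
    for j in range(len(weights)):
        diff = X - means[j]
        out[:, j] = (np.log(weights[j])
                     - 0.5 * np.sum(np.log(2.0 * np.pi * variances[j]))
                     - 0.5 * np.sum(diff ** 2 / variances[j], axis=1))
    return out

def gmm_log_likelihood(X, weights, means, variances):
    # per-frame log p(x): log-sum-exp over the weighted Gaussian components
    comp = _component_logliks(X, weights, means, variances)
    mx = comp.max(axis=1, keepdims=True)
    return (mx + np.log(np.exp(comp - mx).sum(axis=1, keepdims=True))).ravel()

def train_gmm(X, K, iters=50, seed=0):
    # plain EM for a diagonal-covariance GMM (a stand-in for the LBG training)
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    means = X[rng.choice(n, K, replace=False)].copy()
    variances = np.tile(X.var(axis=0) + 1e-6, (K, 1))
    weights = np.full(K, 1.0 / K)
    for _ in range(iters):
        # E-step: responsibilities of each component for each frame
        comp = _component_logliks(X, weights, means, variances)
        r = np.exp(comp - gmm_log_likelihood(X, weights, means, variances)[:, None])
        # M-step: re-estimate weights, means and variances
        Nk = r.sum(axis=0) + 1e-10
        weights = Nk / n
        means = (r.T @ X) / Nk[:, None]
        variances = (r.T @ X ** 2) / Nk[:, None] - means ** 2 + 1e-6
    return weights, means, variances

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2.0, 1.0, (200, 2)), rng.normal(2.0, 1.0, (200, 2))])
w, m, v = train_gmm(X, K=2)
in_ll = gmm_log_likelihood(X, w, m, v).mean()                       # data like the samples
out_ll = gmm_log_likelihood(rng.normal(6.0, 1.0, (200, 2)), w, m, v).mean()  # unlike them
print(in_ll > out_ll)  # -> True: the model scores matching data higher
```

The key property the method exploits is visible at the end: data drawn from the same distribution as the training samples receives a higher log-likelihood than mismatched data.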
Step S103: extracting speech feature parameters to be identified from the speech data to be identified.
The process of extracting the speech feature parameters to be identified is the same as the feature extraction described above and is not repeated here.
Step S104: computing the log-likelihood of the speech feature parameters to be identified under the male Gaussian mixture model to obtain a first log-likelihood, and computing the log-likelihood of the speech feature parameters to be identified under the female Gaussian mixture model to obtain a second log-likelihood.
Step S105: judging whether the first log-likelihood is greater than the second log-likelihood. If the first log-likelihood is greater, the speech feature parameters to be identified are more similar to the male Gaussian mixture model, and the speaker's gender is determined to be male; if the first log-likelihood is smaller, the speaker's gender is determined to be female.
The voice information recognition method of this embodiment trains a Gaussian mixture model on male speech data to obtain a male model and on female speech data to obtain a female model, matches the speech feature parameters extracted from the speech data to be identified against both models, and determines from the match whether the speaker is male or female. It thus identifies the speaker's gender from the speech data to be identified.
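Steps S104–S105 reduce to comparing two log-likelihoods. The sketch below illustrates just that decision rule with a deliberately simplified stand-in: each gender "model" is a single Gaussian over a scalar pitch-like feature rather than a trained mixture over MFCC vectors, and the means (120 vs 210) are merely illustrative values near typical fundamental frequencies, not figures from the patent.

```python
import math

def gaussian_loglik(frames, mean, var):
    # total log-likelihood of the feature frames under a 1-D Gaussian N(mean, var)
    return sum(-0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
               for x in frames)

# stand-in "models": one Gaussian per gender over a pitch-like feature
male_model = (120.0, 400.0)     # (mean, variance), illustrative only
female_model = (210.0, 400.0)

frames = [118.0, 125.0, 131.0, 122.0]               # features of the utterance
ll_male = gaussian_loglik(frames, *male_model)      # S104: first log-likelihood
ll_female = gaussian_loglik(frames, *female_model)  # S104: second log-likelihood
print("male" if ll_male > ll_female else "female")  # S105 -> male
```

The same comparison applies unchanged when the scores come from full Gaussian mixture models over MFCC frames.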
Referring to Fig. 4, Fig. 4 is a schematic flow chart of the voice information recognition method, provided by an embodiment of the invention, when the personalized information is the speaker's age bracket. The method can comprise:
Step S201: extracting, from the sample speech data corresponding to each age bracket, the speech feature parameters of that age bracket.
The feature extraction process in the present embodiment can follow steps S1011–S1015 and is not repeated here.
In the present embodiment, five age brackets can be set: childhood (0–6), juvenile (7–17), youth (18–40), middle age (41–65) and old age (66 and above). That is, speech feature parameters are extracted from the speech data of users aged 0–6 to obtain the childhood speech feature parameters, from users aged 7–17 to obtain the juvenile speech feature parameters, from users aged 18–40 to obtain the youth speech feature parameters, from users aged 41–65 to obtain the middle-age speech feature parameters, and from users aged 66 and above to obtain the old-age speech feature parameters, so that the speech feature parameters of every age bracket are extracted.
Step S202: training a Gaussian mixture model with the speech feature parameters of each age bracket to obtain a Gaussian mixture model for each age bracket.
The present embodiment thus obtains a childhood, a juvenile, a youth, a middle-age and an old-age Gaussian mixture model.
Step S203: extracting speech feature parameters to be identified from the speech data to be identified.
Step S204: computing, for each age bracket, the log-likelihood of the speech feature parameters to be identified under that bracket's Gaussian mixture model, obtaining a log-likelihood for each age bracket.
Each age bracket's log-likelihood reflects how close the speaker's age is to that bracket: the larger the log-likelihood, the closer the speaker's age is to the corresponding bracket.
Step S205: finding the maximum among the age brackets' log-likelihoods and taking the age bracket corresponding to the maximum log-likelihood as the speaker's age bracket.
The voice information recognition method of this embodiment trains a Gaussian mixture model for each age bracket from that bracket's speech data, matches the speech feature parameters extracted from the speech data to be identified against each bracket's model, and determines the speaker's age bracket from the match. It thus identifies the speaker's age bracket from the speech data to be identified.
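Steps S204–S205 are an arg-max over per-bracket scores. A minimal sketch of that rule follows; as in the gender example, each bracket's "model" is a simplified single Gaussian over a scalar feature, and all means and variances are illustrative stand-ins, not trained values from the patent.

```python
import math

def gaussian_loglik(frames, mean, var):
    # total log-likelihood of the feature frames under a 1-D Gaussian N(mean, var)
    return sum(-0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
               for x in frames)

# one stand-in model per age bracket (illustrative parameters only)
age_models = {
    "childhood (0-6)": (300.0, 900.0),
    "juvenile (7-17)": (240.0, 900.0),
    "youth (18-40)": (180.0, 900.0),
    "middle age (41-65)": (160.0, 900.0),
    "old age (66+)": (140.0, 900.0),
}
frames = [182.0, 176.0, 185.0, 179.0]  # features of the utterance

# S204: one log-likelihood per age bracket; S205: take the bracket with the maximum
scores = {bracket: gaussian_loglik(frames, mean, var)
          for bracket, (mean, var) in age_models.items()}
best = max(scores, key=scores.get)
print(best)  # -> youth (18-40)
```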
Referring to Fig. 5, which is a schematic flowchart of the voice information recognition method provided by an embodiment of the invention when the personalized information is the language type, the method may comprise:
Step S301: extract speech feature parameters from the sample speech data corresponding to each province, obtaining per-province speech feature parameters.
The feature extraction process in this embodiment may refer to steps S1011-S1015 and is not repeated here.
Step S302: train a Gaussian mixture model with each province's speech feature parameters, obtaining one Gaussian mixture model per province.
Step S303: extract speech feature parameters from all of the sample speech data.
Step S304: train a Gaussian mixture model with the speech feature parameters extracted from all of the sample speech data, obtaining a universal Gaussian mixture model.
Step S305: extract to-be-identified speech feature parameters from the to-be-identified speech data.
Step S306: compute the log-likelihood of the to-be-identified speech feature parameters under each province's Gaussian mixture model, obtaining one log-likelihood per province.
Step S307: compute the log-likelihood of the to-be-identified speech feature parameters under the universal Gaussian mixture model, obtaining a third log-likelihood.
Step S308: determine the maximum among the per-province log-likelihoods.
Steps S309-S311: judge whether the difference between the maximum log-likelihood and the third log-likelihood is greater than a first preset value; if so, determine that the language type is a dialect, namely the dialect of the province corresponding to the maximum log-likelihood; otherwise, determine that the language type is Mandarin.
In the voice information recognition method provided by this embodiment of the invention, the speech data of each province is used to train a per-province Gaussian mixture model, and all of the sample speech data is used to train a universal Gaussian mixture model; the to-be-identified speech feature parameters extracted from the to-be-identified speech data are matched against each province's Gaussian mixture model and against the universal Gaussian mixture model, and the speaker's language type is determined from the matching results with each model. The method thus identifies the speaker's language type from the to-be-identified speech data.
Referring to Fig. 6, which is a schematic flowchart of the voice information recognition method provided by an embodiment of the invention when the personalized information is the speaker's identity, the method may comprise:
Step S401: extract speech feature parameters from the speaker's historical speech data, obtaining the speaker's own speech feature parameters.
In this embodiment, the sample speech data in the sample library includes the speaker's historical speech data.
In addition, the feature extraction process in this embodiment may refer to steps S1011-S1015 and is not repeated here.
Step S402: train a Gaussian mixture model with the speaker's own speech feature parameters, obtaining the speaker's own Gaussian mixture model.
Step S403: extract speech feature parameters from all of the sample speech data.
Step S404: train a Gaussian mixture model with the speech feature parameters extracted from all of the sample speech data, obtaining a universal Gaussian mixture model.
Step S405: extract to-be-identified speech feature parameters from the to-be-identified speech data.
Step S406: compute the log-likelihood of the to-be-identified speech feature parameters under the speaker's own Gaussian mixture model, obtaining a fourth log-likelihood.
Step S407: compute the log-likelihood of the to-be-identified speech feature parameters under the universal Gaussian mixture model, obtaining a fifth log-likelihood.
Steps S408-S410: judge whether the difference between the fourth log-likelihood and the fifth log-likelihood is greater than a second preset value; if so, determine that the speaker is the user himself or herself; otherwise, determine that the speaker is someone else.
In the voice information recognition method provided by this embodiment of the invention, the speaker's historical speech data is used to train the speaker's own Gaussian mixture model, and all of the sample speech data is used to train a universal Gaussian mixture model; the to-be-identified speech feature parameters extracted from the to-be-identified speech data are matched against the speaker's own Gaussian mixture model and against the universal Gaussian mixture model, and the speaker's identity is determined from the matching results with each model, i.e. whether the speaker is the user himself or herself or someone else. The method thus identifies the speaker's identity from the to-be-identified speech data.
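The identity decision of steps S406-S410 resembles the classic score-difference test between a speaker model and a background model. A minimal sketch under the same assumptions as above (single diagonal Gaussians instead of trained GMMs; all data and the threshold value invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def fit(frames):
    return frames.mean(axis=0), frames.var(axis=0) + 1e-6

def ll(frames, model):
    mean, var = model
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (frames - mean) ** 2 / var)

# The enrolled speaker's historical features vs. pooled background data.
speaker_frames = rng.normal(2.0, 0.8, (400, 10))
background = np.vstack([rng.normal(0, 1, (400, 10)),
                        rng.normal(-2, 1, (400, 10)),
                        speaker_frames])
speaker_model = fit(speaker_frames)    # the "speaker's own" model
universal_model = fit(background)      # the "universal" model

SECOND_PRESET = 0.0  # the "second preset value"; a tunable threshold

def is_enrolled_speaker(utterance):
    # Fourth log-likelihood minus fifth log-likelihood vs. the preset value.
    return ll(utterance, speaker_model) - ll(utterance, universal_model) > SECOND_PRESET

print(is_enrolled_speaker(rng.normal(2.0, 0.8, (60, 10))))   # same speaker
print(is_enrolled_speaker(rng.normal(-2.0, 1.0, (60, 10))))  # different speaker
```

Using the universal model as the denominator of a likelihood-ratio test normalizes away channel and content effects that both models share, which is why the difference, rather than the raw fourth log-likelihood, is thresholded.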
In another embodiment of the invention, the voice information recognition method may further comprise: after the to-be-identified speech feature parameters are extracted from the to-be-identified speech data, determining the text information corresponding to the to-be-identified speech data from those feature parameters, i.e. the process of transcribing the speech data into text.
In this embodiment, personalized information recognition and text recognition share a single set of speech feature parameters; from these parameters, both the text information and personalized information such as the speaker's gender and age can be recognized from the to-be-identified speech.
The voice information recognition method provided by this embodiment of the invention can identify both personalized information such as the speaker's gender and age and the text information from the to-be-identified speech data; the identified personalized information can be combined with the text information, leaving a larger operating space for subsequent operations. Moreover, because personalized information recognition and text recognition share one set of speech feature parameters, and personalized information recognition is computationally small relative to text recognition, the impact on text recognition speed is small.
An embodiment of the invention also provides a voice information recognition system. Fig. 7 shows the structural diagram of this system, which may comprise: a feature extraction module 11 and a personalized analysis module 12. Wherein:
The feature extraction module 11 is configured to extract, from sample speech data, sample speech feature parameters corresponding to the personalized information, and to extract to-be-identified speech feature parameters from to-be-identified speech data.
The personalized analysis module 12 is configured to train a Gaussian mixture model with the sample speech feature parameters to obtain a personalized model, to match the to-be-identified speech feature parameters against the personalized model, and to determine the personalized information based on the matching result of the speech feature parameters and the personalized model.
Further, the feature extraction module 11 may comprise:
A first feature extraction module, configured to extract speech feature parameters from male sample speech data, obtaining male speech feature parameters, and from female sample speech data, obtaining female speech feature parameters.
A second feature extraction module, configured to extract speech feature parameters from the sample speech data corresponding to each age bracket, obtaining per-age-bracket speech feature parameters.
A third feature extraction module, configured to extract speech feature parameters from the sample speech data corresponding to each province, obtaining per-province speech feature parameters.
A fourth feature extraction module, configured to extract the speaker's speech feature parameters from the speaker's historical speech data.
A fifth feature extraction module, configured to extract speech feature parameters from all of the sample speech data.
A sixth feature extraction module, configured to extract to-be-identified speech feature parameters from the to-be-identified speech data.
Further, as shown in Fig. 8, the personalized analysis module 12 may comprise: a gender analysis module 121, an age analysis module 122, a language analysis module 123, and an identity analysis module 124. Wherein:
The gender analysis module 121 is configured to train a Gaussian mixture model with the male speech feature parameters to obtain a male Gaussian mixture model, and with the female speech feature parameters to obtain a female Gaussian mixture model; to compute the log-likelihood of the to-be-identified speech feature parameters under the male Gaussian mixture model, obtaining a first log-likelihood, and under the female Gaussian mixture model, obtaining a second log-likelihood; and to determine that the speaker's gender is male when the first log-likelihood is greater than the second log-likelihood, and female when the first log-likelihood is less than the second log-likelihood.
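The gender decision is a two-model comparison of the first and second log-likelihoods. A minimal sketch, again substituting a single diagonal Gaussian per gender for the trained GMMs (toy data and feature dimension invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def fit(frames):
    return frames.mean(axis=0), frames.var(axis=0) + 1e-6

def ll(frames, model):
    mean, var = model
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (frames - mean) ** 2 / var)

# Toy male/female feature clusters standing in for real training data.
male_model = fit(rng.normal(-1.5, 1, (300, 12)))
female_model = fit(rng.normal(1.5, 1, (300, 12)))

def gender(utterance):
    # First log-likelihood vs. second log-likelihood, as in module 121.
    return "male" if ll(utterance, male_model) > ll(utterance, female_model) else "female"

print(gender(rng.normal(1.5, 1, (40, 12))))  # near the "female" cluster
```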
The age analysis module 122 is configured to train a Gaussian mixture model with each age bracket's speech feature parameters, obtaining one Gaussian mixture model per age bracket; to compute the log-likelihood of the to-be-identified speech feature parameters under each age bracket's Gaussian mixture model, obtaining one log-likelihood per age bracket; and to determine the maximum of these log-likelihoods and take the age bracket corresponding to the maximum log-likelihood as the speaker's age bracket.
The language analysis module 123 is configured to train a Gaussian mixture model with each province's speech feature parameters, obtaining one Gaussian mixture model per province, and with the speech feature parameters extracted from all of the sample speech data, obtaining a universal Gaussian mixture model; to compute the log-likelihood of the to-be-identified speech feature parameters under each province's Gaussian mixture model, obtaining one log-likelihood per province, and under the universal Gaussian mixture model, obtaining a third log-likelihood; to determine the maximum among the per-province log-likelihoods; and, when the difference between the maximum log-likelihood and the third log-likelihood is greater than the first preset value, to determine that the language type is a dialect, namely the dialect of the province corresponding to the maximum log-likelihood, and otherwise that the language type is Mandarin.
The identity analysis module 124 is configured to train a Gaussian mixture model with the speaker's speech feature parameters, obtaining the speaker's own Gaussian mixture model, and with the speech feature parameters extracted from all of the sample speech data, obtaining a universal Gaussian mixture model; to compute the log-likelihood of the to-be-identified speech feature parameters under the speaker's own Gaussian mixture model, obtaining a fourth log-likelihood, and under the universal Gaussian mixture model, obtaining a fifth log-likelihood; and, when the difference between the fourth log-likelihood and the fifth log-likelihood is greater than the second preset value, to determine that the speaker is the user himself or herself, and otherwise that the speaker is someone else.
The voice information recognition system provided by this embodiment of the invention can identify personalized information such as the speaker's gender and age from the to-be-identified speech data; the identified personalized information can be combined with text information identified by prior-art speech recognition, leaving a larger operating space for applications such as voice assistants and voice dialogue.
In other embodiments of the invention, as shown in Fig. 9, the voice information recognition system may further comprise a text recognition module 13 in addition to the feature extraction module 11 and the personalized analysis module 12.
The text recognition module 13 is configured to determine the text information corresponding to the to-be-identified speech data from the to-be-identified speech feature parameters.
In this embodiment, the text recognition module 13 and the personalized analysis module 12 share a single set of speech feature parameters; from these parameters, both the text information and personalized information such as the speaker's gender and age can be recognized from the to-be-identified speech.
In practical applications, the speed at which the text recognition module 13 converts the to-be-identified speech data into text information is critical; the recognition speed of text information is generally expressed by the real-time factor (RTF, Real Time Factor).
During text recognition, the to-be-identified speech data arrives segment by segment, and the text recognition module 13 processes each segment immediately upon receipt. When the processing time is less than the actual duration of the speech, then, excluding the audio transmission time, the user obtains the recognition result essentially as soon as the utterance ends, which is effectively real-time. If the processing time exceeds the actual duration of the speech, the user must wait, and the longer the wait, the worse the user experience.
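The real-time criterion above is exactly what the real-time factor expresses: RTF = processing time / audio duration, with RTF < 1 meaning the result is ready roughly when the utterance ends. A small worked example with hypothetical numbers (not from the patent):

```python
def real_time_factor(processing_seconds, audio_seconds):
    # RTF = time spent processing the audio / actual duration of the audio.
    return processing_seconds / audio_seconds

# E.g. 1.2 s of computation for a 3.0 s utterance gives RTF = 0.4,
# so the transcript is ready well before a same-length wait would end.
rtf = real_time_factor(processing_seconds=1.2, audio_seconds=3.0)
print(f"RTF = {rtf:.2f} -> {'real time' if rtf < 1.0 else 'user must wait'}")
```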
Because the time budget of the text recognition module 13 is tight, the shared-feature approach is used: while the text recognition module 13 performs text recognition on the speech data, the personalized analysis module computes the degree of match between the MFCC features and the personalized models, i.e. the log-likelihoods. Compared with the computationally heavier text recognition, personalized information recognition takes only about 1% of the time (depending on the number and size of the personalized models).
The voice information recognition system provided by this embodiment of the invention can identify both personalized information such as the speaker's gender and age and the text information from the to-be-identified speech data; the identified personalized information can be combined with the text information, leaving a larger operating space for subsequent operations. Moreover, because personalized information recognition and text recognition share one set of speech feature parameters, and personalized information recognition is computationally small relative to text recognition, the impact on text recognition speed is small.
For convenience of description, the above apparatus is described in terms of separate functional units. Of course, when implementing the invention, the functions of the units may be realized in one or more pieces of software and/or hardware.
The embodiments in this specification are described progressively; identical or similar parts of the embodiments may be referred to each other, and each embodiment focuses on its differences from the others. In particular, since the apparatus embodiments are substantially similar to the method embodiments, they are described more briefly, and the relevant parts may refer to the description of the method embodiments. The system embodiments described above are merely illustrative: the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units, i.e. they may be located in one place or distributed over multiple network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the embodiment. Those of ordinary skill in the art can understand and implement this without creative effort.
The invention can be used in numerous general-purpose or special-purpose computing system environments or configurations, for example: personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, network PCs, minicomputers, mainframe computers, and distributed computing environments including any of the above systems or devices.
It should be noted that, herein, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations.
The above description of the disclosed embodiments enables those skilled in the art to make or use the invention. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the invention. Therefore, the invention is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (12)

1. A voice information recognition method, characterized by comprising:
extracting, from sample speech data, sample speech feature parameters corresponding to personalized information;
training a Gaussian mixture model with said sample speech feature parameters to obtain a personalized model;
extracting to-be-identified speech feature parameters from to-be-identified speech data;
matching said to-be-identified speech feature parameters against said personalized model;
determining the personalized information based on the matching result of said speech feature parameters and said personalized model.
2. The method according to claim 1, characterized by further comprising:
determining text information corresponding to said to-be-identified speech data from said speech feature parameters.
3. The method according to claim 1, characterized in that said personalized information is the speaker's gender;
the process of extracting, from sample speech data, sample speech feature parameters corresponding to the personalized information comprises: extracting speech feature parameters from male sample speech data to obtain male speech feature parameters, and extracting speech feature parameters from female sample speech data to obtain female speech feature parameters;
the process of training a Gaussian mixture model with said sample speech feature parameters to obtain a personalized model comprises: training a Gaussian mixture model with said male speech feature parameters to obtain a male Gaussian mixture model, and training a Gaussian mixture model with said female speech feature parameters to obtain a female Gaussian mixture model.
4. The method according to claim 3, characterized in that the process of matching said to-be-identified speech feature parameters against said personalized model comprises: computing the log-likelihood of said to-be-identified speech feature parameters under said male Gaussian mixture model to obtain a first log-likelihood; and computing the log-likelihood of said to-be-identified speech feature parameters under said female Gaussian mixture model to obtain a second log-likelihood;
the process of determining the personalized information based on the matching result of said speech feature parameters and said personalized model comprises: determining that the speaker's gender is male when said first log-likelihood is greater than said second log-likelihood, and female when said first log-likelihood is less than said second log-likelihood.
5. The method according to claim 1, characterized in that said personalized information is the speaker's age bracket;
the process of extracting, from sample speech data, sample speech feature parameters corresponding to the personalized information comprises: extracting speech feature parameters from the sample speech data corresponding to each age bracket, obtaining per-age-bracket speech feature parameters;
the process of training a Gaussian mixture model with said sample speech feature parameters to obtain a personalized model comprises: training a Gaussian mixture model with each age bracket's speech feature parameters, obtaining one Gaussian mixture model per age bracket.
6. The method according to claim 5, characterized in that the process of matching said to-be-identified speech feature parameters against said personalized model comprises: computing the log-likelihood of said to-be-identified speech feature parameters under each age bracket's Gaussian mixture model, obtaining one log-likelihood per age bracket;
the process of determining the personalized information based on the matching result of said speech feature parameters and said personalized model comprises: determining the maximum among the per-age-bracket log-likelihoods, and taking the age bracket corresponding to the maximum log-likelihood as the speaker's age bracket.
7. The method according to claim 1, characterized in that said personalized information is the language type;
said method further comprises: extracting sample speech feature parameters from all of the sample speech data, and training a Gaussian mixture model with these sample speech feature parameters to obtain a universal Gaussian mixture model;
the process of extracting, from sample speech data, sample speech feature parameters corresponding to the personalized information comprises: extracting speech feature parameters from the sample speech data corresponding to each province, obtaining per-province speech feature parameters;
the process of training a Gaussian mixture model with said sample speech feature parameters to obtain a personalized model comprises: training a Gaussian mixture model with each province's speech feature parameters, obtaining one Gaussian mixture model per province.
8. The method according to claim 7, characterized in that the process of matching said to-be-identified speech feature parameters against said personalized model comprises: computing the log-likelihood of said to-be-identified speech feature parameters under each province's Gaussian mixture model, obtaining one log-likelihood per province;
the process of determining the personalized information based on the matching result of said speech feature parameters and said personalized model comprises:
computing the log-likelihood of said to-be-identified speech feature parameters under the universal Gaussian mixture model to obtain a third log-likelihood;
determining the maximum among the per-province log-likelihoods;
judging whether the difference between said maximum log-likelihood and the third log-likelihood is greater than a first preset value; if so, determining that said language type is a dialect, namely the dialect of the province corresponding to said maximum log-likelihood; otherwise, determining that said language type is Mandarin.
9. The method according to claim 1, characterized in that said personalized information is the speaker's identity;
said method further comprises: extracting sample speech feature parameters from all of the sample speech data, and training a Gaussian mixture model with these sample speech feature parameters to obtain a universal Gaussian mixture model;
the process of extracting, from sample speech data, sample speech feature parameters corresponding to the personalized information comprises: extracting the speaker's own speech feature parameters from the speaker's historical speech data;
the process of training a Gaussian mixture model with said sample speech feature parameters to obtain a personalized model comprises: training a Gaussian mixture model with said speaker's own speech feature parameters to obtain the speaker's own Gaussian mixture model.
10. The method according to claim 9, characterized in that the process of matching said to-be-identified speech feature parameters against said personalized model comprises: computing the log-likelihood of said to-be-identified speech feature parameters under said speaker's own Gaussian mixture model to obtain a fourth log-likelihood;
the process of determining the personalized information based on the matching result of said speech feature parameters and said personalized model comprises: computing the log-likelihood of said to-be-identified speech feature parameters under the universal Gaussian mixture model to obtain a fifth log-likelihood;
judging whether the difference between said fourth log-likelihood and said fifth log-likelihood is greater than a second preset value; if so, determining that the speaker is the user himself or herself; otherwise, determining that the speaker is someone else.
11. A voice information recognition system, characterized by comprising: a feature extraction module and a personalized analysis module;
said feature extraction module is configured to extract, from sample speech data, sample speech feature parameters corresponding to personalized information, and to extract to-be-identified speech feature parameters from to-be-identified speech data;
said personalized analysis module is configured to train a Gaussian mixture model with said sample speech feature parameters to obtain a personalized model, to match said to-be-identified speech feature parameters against said personalized model, and to determine the personalized information based on the matching result of said speech feature parameters and said personalized model.
12. The system according to claim 11, characterized by further comprising: a text recognition module;
said text recognition module is configured to determine text information corresponding to said speech data from said speech feature parameters.
CN201310195575.9A 2013-05-23 2013-05-23 A kind of voice information identification method and system Active CN103310788B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310195575.9A CN103310788B (en) 2013-05-23 2013-05-23 A kind of voice information identification method and system

Publications (2)

Publication Number Publication Date
CN103310788A true CN103310788A (en) 2013-09-18
CN103310788B CN103310788B (en) 2016-03-16

Family

ID=49135931


Country Status (1)

Country Link
CN (1) CN103310788B (en)

Cited By (39)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103714812A (en) * 2013-12-23 2014-04-09 百度在线网络技术(北京)有限公司 Voice identification method and voice identification device
CN104239459A (en) * 2014-09-02 2014-12-24 百度在线网络技术(北京)有限公司 Voice search method, voice search device and voice search system
CN104391673A (en) * 2014-11-20 2015-03-04 百度在线网络技术(北京)有限公司 Voice interaction method and voice interaction device
CN104485100A (en) * 2014-12-18 2015-04-01 天津讯飞信息科技有限公司 Text-to-speech pronunciation person self-adaptive method and system
CN104681023A (en) * 2015-02-15 2015-06-03 联想(北京)有限公司 Information processing method and electronic equipment
CN104700843A (en) * 2015-02-05 2015-06-10 海信集团有限公司 Method and device for identifying ages
CN105489221A (en) * 2015-12-02 2016-04-13 北京云知声信息技术有限公司 Voice recognition method and device
CN105723448A (en) * 2014-01-21 2016-06-29 三星电子株式会社 Electronic device and voice recognition method thereof
CN105895080A (en) * 2016-03-30 2016-08-24 乐视控股(北京)有限公司 Voice recognition model training method, speaker type recognition method and device
CN106033670A (en) * 2015-03-19 2016-10-19 科大讯飞股份有限公司 Voiceprint password authentication method and system
CN106251859A (en) * 2016-07-22 2016-12-21 百度在线网络技术(北京)有限公司 Voice recognition processing method and apparatus
CN106952648A (en) * 2017-02-17 2017-07-14 北京光年无限科技有限公司 A kind of output intent and robot for robot
CN107015781A (en) * 2017-03-28 2017-08-04 联想(北京)有限公司 Audio recognition method and system
CN107170456A (en) * 2017-06-28 2017-09-15 北京云知声信息技术有限公司 Method of speech processing and device
CN107274900A (en) * 2017-08-10 2017-10-20 北京灵隆科技有限公司 Information processing method and its system for control terminal
CN107357875A (en) * 2017-07-04 2017-11-17 北京奇艺世纪科技有限公司 A kind of voice search method, device and electronic equipment
CN107578771A (en) * 2017-07-25 2018-01-12 科大讯飞股份有限公司 Audio recognition method and device, storage medium, electronic equipment
CN107680599A (en) * 2017-09-28 2018-02-09 百度在线网络技术(北京)有限公司 User property recognition methods, device and electronic equipment
CN107704549A (en) * 2017-09-26 2018-02-16 百度在线网络技术(北京)有限公司 Voice search method, device and computer equipment
CN107895579A (en) * 2018-01-02 2018-04-10 联想(北京)有限公司 A kind of audio recognition method and system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1076329A2 (en) * 1999-08-10 2001-02-14 International Business Machines Corporation Personality data mining method using a speech based dialog
CN1547191A (en) * 2003-12-12 2004-11-17 北京大学 Semantic and sound groove information combined speaking person identity system
CN102479511A (en) * 2010-11-23 2012-05-30 盛乐信息技术(上海)有限公司 Large-scale voiceprint authentication method and system
CN102543084A (en) * 2010-12-29 2012-07-04 盛乐信息技术(上海)有限公司 Online voiceprint recognition system and implementation method thereof


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Zhang Chaoqiong et al., "Gender recognition of speech based on Gaussian mixture models", Journal of Computer Applications (《计算机应用》) *
Wang Qixue, "Research on Chinese dialect identification methods based on statistical characteristics", China Master's Theses Full-text Database, Information Science and Technology (《中国优秀硕士学位论文全文数据库 信息科技辑》) *

Cited By (63)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103714812A (en) * 2013-12-23 2014-04-09 百度在线网络技术(北京)有限公司 Voice identification method and voice identification device
US10304443B2 (en) 2014-01-21 2019-05-28 Samsung Electronics Co., Ltd. Device and method for performing voice recognition using trigger voice
CN105723448A (en) * 2014-01-21 2016-06-29 三星电子株式会社 Electronic device and voice recognition method thereof
CN105723448B (en) * 2014-01-21 2021-01-12 三星电子株式会社 Electronic equipment and voice recognition method thereof
US11984119B2 (en) 2014-01-21 2024-05-14 Samsung Electronics Co., Ltd. Electronic device and voice recognition method thereof
US11011172B2 (en) 2014-01-21 2021-05-18 Samsung Electronics Co., Ltd. Electronic device and voice recognition method thereof
CN104239459B (en) * 2014-09-02 2018-03-09 百度在线网络技术(北京)有限公司 voice search method, device and system
CN104239459A (en) * 2014-09-02 2014-12-24 百度在线网络技术(北京)有限公司 Voice search method, voice search device and voice search system
CN104391673A (en) * 2014-11-20 2015-03-04 百度在线网络技术(北京)有限公司 Voice interaction method and voice interaction device
CN104485100A (en) * 2014-12-18 2015-04-01 天津讯飞信息科技有限公司 Speaker-adaptive text-to-speech method and system
CN104485100B (en) * 2014-12-18 2018-06-15 天津讯飞信息科技有限公司 Speaker-adaptive text-to-speech method and system
CN104700843A (en) * 2015-02-05 2015-06-10 海信集团有限公司 Method and device for identifying ages
CN104681023A (en) * 2015-02-15 2015-06-03 联想(北京)有限公司 Information processing method and electronic equipment
CN106033670B (en) * 2015-03-19 2019-11-15 科大讯飞股份有限公司 Voiceprint password authentication method and system
CN106033670A (en) * 2015-03-19 2016-10-19 科大讯飞股份有限公司 Voiceprint password authentication method and system
CN105489221B (en) * 2015-12-02 2019-06-14 北京云知声信息技术有限公司 Voice recognition method and device
CN105489221A (en) * 2015-12-02 2016-04-13 北京云知声信息技术有限公司 Voice recognition method and device
WO2017166651A1 (en) * 2016-03-30 2017-10-05 乐视控股(北京)有限公司 Voice recognition model training method, speaker type recognition method and device
CN105895080A (en) * 2016-03-30 2016-08-24 乐视控股(北京)有限公司 Voice recognition model training method, speaker type recognition method and device
WO2018014469A1 (en) * 2016-07-22 2018-01-25 百度在线网络技术(北京)有限公司 Voice recognition processing method and apparatus
US11138967B2 (en) 2016-07-22 2021-10-05 Baidu Online Network Technology (Beijing) Co., Ltd. Voice recognition processing method, device and computer storage medium
CN106251859A (en) * 2016-07-22 2016-12-21 百度在线网络技术(北京)有限公司 Voice recognition processing method and apparatus
CN106251859B (en) * 2016-07-22 2019-05-31 百度在线网络技术(北京)有限公司 Voice recognition processing method and apparatus
CN108172218A (en) * 2016-12-05 2018-06-15 中国移动通信有限公司研究院 Pronunciation modeling method and device
CN106952648A (en) * 2017-02-17 2017-07-14 北京光年无限科技有限公司 Output method for a robot, and robot
CN107015781A (en) * 2017-03-28 2017-08-04 联想(北京)有限公司 Audio recognition method and system
CN107170456A (en) * 2017-06-28 2017-09-15 北京云知声信息技术有限公司 Speech processing method and device
CN107357875A (en) * 2017-07-04 2017-11-17 北京奇艺世纪科技有限公司 Voice search method, device and electronic equipment
CN107357875B (en) * 2017-07-04 2021-09-10 北京奇艺世纪科技有限公司 Voice search method and device and electronic equipment
CN107578771A (en) * 2017-07-25 2018-01-12 科大讯飞股份有限公司 Voice recognition method and device, storage medium and electronic equipment
CN107578771B (en) * 2017-07-25 2021-02-02 科大讯飞股份有限公司 Voice recognition method and device, storage medium and electronic equipment
CN107274900B (en) * 2017-08-10 2020-09-18 北京京东尚科信息技术有限公司 Information processing method for control terminal and system thereof
CN107274900A (en) * 2017-08-10 2017-10-20 北京灵隆科技有限公司 Information processing method and system for control terminal
CN107704549A (en) * 2017-09-26 2018-02-16 百度在线网络技术(北京)有限公司 Voice search method, device and computer equipment
CN107680599A (en) * 2017-09-28 2018-02-09 百度在线网络技术(北京)有限公司 User property recognition methods, device and electronic equipment
CN108281138B (en) * 2017-12-18 2020-03-31 百度在线网络技术(北京)有限公司 Age discrimination model training and intelligent voice interaction method, equipment and storage medium
CN108281138A (en) * 2017-12-18 2018-07-13 百度在线网络技术(北京)有限公司 Age discrimination model training and intelligent voice interaction method, device and storage medium
CN107895579A (en) * 2018-01-02 2018-04-10 联想(北京)有限公司 Voice recognition method and system
CN110164445A (en) * 2018-02-13 2019-08-23 阿里巴巴集团控股有限公司 Speech recognition method, device, equipment and computer storage medium
CN110164445B (en) * 2018-02-13 2023-06-16 阿里巴巴集团控股有限公司 Speech recognition method, device, equipment and computer storage medium
CN109308901A (en) * 2018-09-29 2019-02-05 百度在线网络技术(北京)有限公司 Singer recognition method and device
CN109431507A (en) * 2018-10-26 2019-03-08 平安科技(深圳)有限公司 Cough disease identification method and device based on deep learning
CN111210805A (en) * 2018-11-05 2020-05-29 北京嘀嘀无限科技发展有限公司 Language identification model training method and device and language identification method and device
CN109324561A (en) * 2018-11-29 2019-02-12 奥克斯空调股份有限公司 Monitoring method and monitoring system for kitchen appliances, and kitchen system
CN109714608A (en) * 2018-12-18 2019-05-03 深圳壹账通智能科技有限公司 Video data processing method and device, computer equipment and storage medium
CN109714608B (en) * 2018-12-18 2023-03-10 深圳壹账通智能科技有限公司 Video data processing method, video data processing device, computer equipment and storage medium
CN109961794A (en) * 2019-01-14 2019-07-02 湘潭大学 Hierarchical speaker recognition method based on model clustering
CN110265040A (en) * 2019-06-20 2019-09-20 Oppo广东移动通信有限公司 Voiceprint model training method and device, storage medium and electronic equipment
CN110265040B (en) * 2019-06-20 2022-05-17 Oppo广东移动通信有限公司 Voiceprint model training method and device, storage medium and electronic equipment
CN110246507A (en) * 2019-08-05 2019-09-17 上海优扬新媒信息技术有限公司 Voice recognition method and device
WO2021047319A1 (en) * 2019-09-11 2021-03-18 深圳壹账通智能科技有限公司 Voice-based personal credit assessment method and apparatus, terminal and storage medium
CN110738998A (en) * 2019-09-11 2020-01-31 深圳壹账通智能科技有限公司 Voice-based personal credit evaluation method, device, terminal and storage medium
CN111916056B (en) * 2019-10-28 2023-05-02 宁波大学 Intelligent voice recognition method
CN111916056A (en) * 2019-10-28 2020-11-10 宁波大学 Intelligent voice recognition method
US10997980B2 (en) 2019-10-31 2021-05-04 Alipay (Hangzhou) Information Technology Co., Ltd. System and method for determining voice characteristics
US11244689B2 (en) 2019-10-31 2022-02-08 Alipay (Hangzhou) Information Technology Co., Ltd. System and method for determining voice characteristics
US11031018B2 (en) 2019-10-31 2021-06-08 Alipay (Hangzhou) Information Technology Co., Ltd. System and method for personalized speaker verification
WO2020035085A3 (en) * 2019-10-31 2020-08-20 Alipay (Hangzhou) Information Technology Co., Ltd. System and method for determining voice characteristics
WO2021159902A1 (en) * 2020-02-12 2021-08-19 深圳壹账通智能科技有限公司 Age recognition method, apparatus and device, and computer-readable storage medium
CN112382295A (en) * 2020-11-13 2021-02-19 安徽听见科技有限公司 Voice recognition method, device, equipment and readable storage medium
CN112382295B (en) * 2020-11-13 2024-04-30 安徽听见科技有限公司 Speech recognition method, device, equipment and readable storage medium
CN113555010A (en) * 2021-07-16 2021-10-26 广州三星通信技术研究有限公司 Voice processing method and voice processing device
CN117272061A (en) * 2023-09-17 2023-12-22 武汉科鉴文化科技有限公司 Method and system for detecting ancient ceramic elements, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN103310788B (en) 2016-03-16

Similar Documents

Publication Publication Date Title
CN103310788B (en) 2016-03-16 Voice information identification method and system
CN106504754B (en) Real-time caption generation method based on audio output
CN102509547B (en) Method and system for voiceprint recognition based on vector quantization
Singh et al. Multimedia utilization of non-computerized disguised voice and acoustic similarity measurement
Tiwari MFCC and its applications in speaker recognition
CN109215665A (en) Voiceprint recognition method based on 3D convolutional neural networks
Das et al. Exploring different attributes of source information for speaker verification with limited test data
CN101923855A (en) Text-independent voiceprint recognition system
CN103943104A (en) Voice information recognition method and terminal equipment
CN108597505A (en) Audio recognition method, device and terminal device
CN102237083A (en) Portable interpretation system based on WinCE platform and language recognition method thereof
Zhang et al. Using computer speech recognition technology to evaluate spoken English.
CN113823293B (en) Speaker recognition method and system based on voice enhancement
Abdallah et al. Text-independent speaker identification using hidden Markov model
CN109887510A (en) Voiceprint recognition method and device based on empirical mode decomposition and MFCC
Zhang et al. Voice biometric identity authentication system based on android smart phone
CN106934870A (en) Voice attendance system
CN113782032A (en) Voiceprint recognition method and related device
Goh et al. Robust computer voice recognition using improved MFCC algorithm
Zouhir et al. A bio-inspired feature extraction for robust speech recognition
Kumar et al. Hybrid of wavelet and MFCC features for speaker verification
Kumar et al. Text dependent speaker identification in noisy environment
Patil et al. Significance of magnitude and phase information via VTEO for humming based biometrics
Nagaraja et al. Mono and cross lingual speaker identification with the constraint of limited data
Bansod et al. Speaker Recognition using Marathi (Varhadi) Language

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 100083 Beijing City, Haidian District Zhongguancun Road No. 18 smartfortune International Building, block C room 1501

Patentee after: Yunzhisheng Intelligent Technology Co., Ltd.

Address before: 100083 Beijing City, Haidian District Zhongguancun Road No. 18 smartfortune International Building, block C room 1501

Patentee before: Beijing Yunzhisheng Information Technology Co., Ltd.

CP01 Change in the name or title of a patent holder
TR01 Transfer of patent right

Effective date of registration: 20200401

Address after: No. 101, 1st Floor, 1st Building, Xisanqi Building Materials City, Haidian District, Beijing, 100000

Co-patentee after: Xiamen Yunzhixin Intelligent Technology Co., Ltd.

Patentee after: Yunzhisheng Intelligent Technology Co., Ltd.

Address before: 100083 Beijing City, Haidian District Zhongguancun Road No. 18 smartfortune International Building, block C room 1501

Patentee before: Yunzhisheng Intelligent Technology Co., Ltd.

TR01 Transfer of patent right