CN107301862A - Speech recognition method, recognition model establishment method, apparatus, and electronic device - Google Patents
- Publication number
- CN107301862A (application CN201610203791.7A)
- Authority
- CN
- China
- Prior art keywords
- user
- speech feature
- group
- speech
- semantic
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Artificial Intelligence (AREA)
- Machine Translation (AREA)
Abstract
The present invention relates to the field of artificial intelligence and discloses a speech recognition method, a recognition model establishment method, an apparatus, and an electronic device, to solve the technical problem in the prior art that speech data of insufficient clarity cannot be recognized. The method includes: obtaining speech data produced by a user, the user being a user whose speech clarity is below a predetermined clarity; extracting a first speech feature of the speech data; and determining, based on a speech recognition model of a preset user group, the first semantic meaning characterized by the first speech feature, the preset user group being the group to which users whose speech clarity is below the predetermined clarity belong. This achieves the technical effect of effectively recognizing speech data produced by users whose speech clarity is below the predetermined clarity.
Description
Technical field
The present invention relates to the field of artificial intelligence, and in particular to a speech recognition method, a recognition model establishment method, an apparatus, and an electronic device.
Background
With the continuous development of science and technology, electronic technology has also developed rapidly, and the variety of electronic products keeps growing. Through various types of electronic devices, people now enjoy the conveniences and the comfortable life that technological development brings. For example, electronic devices such as smartphones and tablet computers have become an important part of people's lives.
Many electronic devices have a speech recognition function that converts the speech data produced by a user into text data, saving the user the time of entering text. However, when an electronic device recognizes speech data, the speaker must enunciate relatively clearly; otherwise the speech cannot be recognized. For example, baby talk cannot be recognized. Baby talk refers to the babyish speech of young children, which, although lovely and pleasant to hear, is hard to make out. That is, the prior art has the technical problem that speech data of insufficient clarity cannot be recognized.
Summary of the invention
The present invention provides a speech recognition method, a recognition model establishment method, an apparatus, and an electronic device, to solve the technical problem in the prior art that speech data of insufficient clarity cannot be recognized.
In a first aspect, an embodiment of the present invention provides a speech recognition method, including:
Obtaining speech data produced by a user, the user being a user whose speech clarity is below a predetermined clarity;
Extracting a first speech feature of the speech data;
Determining, based on a speech recognition model of a preset user group, the first semantic meaning characterized by the first speech feature, the preset user group being the group to which users whose speech clarity is below the predetermined clarity belong.
Optionally, the users whose speech clarity is below the predetermined clarity include at least one of: users with slurred enunciation, users whose speech leaks air, users with a lisp, and users who cannot form words and can only make sounds.
Optionally, the speech recognition model of the preset user group is established as follows:
For each preset user group, determining at least one sample user;
Collecting speech samples of the at least one sample user, wherein each speech sample is labeled with its semantic meaning;
Determining, based on the speech samples belonging to each semantic meaning, the speech feature corresponding to that semantic meaning, and associating the corresponding speech features with the semantic meanings to obtain the speech recognition model.
Optionally, the preset user group includes M user groups, M being a positive integer, and the speech recognition model of the preset user group includes recognition models for the M user groups, the recognition model of each user group including the correspondence between speech features and semantic meanings for that user group; or
The preset user group includes M user groups, and the speech recognition model of the preset user group includes the correspondence between each semantic meaning and the speech features of each of the M user groups.
Optionally, after the first semantic meaning characterized by the first speech feature is determined based on the speech recognition model of the preset user group, the method further includes:
Providing the first semantic meaning of the first speech feature to a preset user, so that the preset user judges whether the first semantic meaning of the first speech feature is accurate;
When the preset user finds the first semantic meaning inaccurate, obtaining a second semantic meaning provided by the preset user for the first speech feature;
In the speech recognition model, changing the semantic meaning of the first speech feature from the first semantic meaning to the second semantic meaning.
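The correction step described above can be sketched roughly as follows. This is only an illustration under assumptions: the model is represented as a plain dictionary from a speech feature to its semantic meaning, and `reviewer` is a hypothetical callback standing in for the preset user, returning `None` when the first semantic meaning is accurate and a corrected meaning otherwise.

```python
from typing import Callable, Dict, Optional, Tuple

Feature = Tuple[float, float]  # hypothetical speech feature: (low_hz, high_hz)

def apply_correction(
    model: Dict[Feature, str],
    feature: Feature,
    reviewer: Callable[[Feature, str], Optional[str]],
) -> str:
    """Show the first meaning to the reviewer and store a correction if one is given."""
    first_meaning = model[feature]
    second_meaning = reviewer(feature, first_meaning)
    if second_meaning is not None:
        # The reviewer judged the first meaning inaccurate: replace it in the model.
        model[feature] = second_meaning
    return model[feature]
```

After a correction, later lookups of the same speech feature return the second semantic meaning.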
Optionally, determining the first semantic meaning characterized by the first speech feature based on the speech recognition model of the preset user group includes:
Dividing the speech feature into at least one speech feature segment;
Matching each of the at least one speech feature segment against the speech features in the speech recognition model, thereby identifying the semantic meaning of each speech feature segment;
Combining the semantic meanings of the at least one speech feature segment to obtain the first semantic meaning characterized by the speech feature.
Optionally, the speech feature includes: a frequency range and/or a frequency shape.
Optionally, after the first semantic meaning characterized by the first speech feature is identified based on the speech recognition model of the preset user group, the method further includes:
Sending the first semantic meaning to the electronic device of a preset user; or
Judging whether the first semantic meaning contains a predetermined keyword, and, when it contains the predetermined keyword, sending the first semantic meaning to the electronic device of the preset user.
In a second aspect, an embodiment of the present invention provides a speech recognition model establishment method, including:
For a preset user group, determining at least one sample user, the preset user group being the group to which users whose speech clarity is below a predetermined clarity belong;
Collecting speech samples of the at least one sample user, wherein each speech sample is labeled with its semantic meaning;
Determining, based on the speech samples belonging to each semantic meaning, the speech feature corresponding to that semantic meaning, and associating the corresponding speech features with the semantic meanings to obtain the speech recognition model.
Optionally, the preset user group includes M user groups, M being a positive integer, and the speech recognition model of the preset user group includes recognition models for the M user groups, the recognition model of each user group including the correspondence between speech features and semantic meanings for that user group; or
The preset user group includes M user groups, and the speech recognition model of the preset user group includes the correspondence between each semantic meaning and the speech features of each of the M user groups.
Optionally, after associating the corresponding speech features with the semantic meanings to obtain the speech recognition model, the method further includes:
After speech data produced by a user is obtained, recognizing the speech data with the speech recognition model to obtain the first semantic meaning characterized by the speech data, the user being a user whose speech clarity is below the predetermined clarity;
Providing the first semantic meaning of the first speech feature to a preset user, so that the preset user judges whether the first semantic meaning of the first speech feature is accurate;
When the preset user finds the first semantic meaning inaccurate, obtaining a second semantic meaning provided by the preset user for the first speech feature;
Changing the semantic meaning of the first speech feature from the first semantic meaning to the second semantic meaning.
Optionally, the speech feature includes: a frequency range and/or a frequency shape.
In a third aspect, an embodiment of the present invention provides a speech recognition apparatus, including:
A first obtaining module, configured to obtain speech data produced by a user, the user being a user whose speech clarity is below a predetermined clarity;
An extraction module, configured to extract a first speech feature of the speech data;
A first determining module, configured to determine, based on a speech recognition model of a preset user group, the first semantic meaning characterized by the first speech feature, the preset user group being the group to which users whose speech clarity is below the predetermined clarity belong.
In a fourth aspect, an embodiment of the present invention provides a speech recognition model establishment apparatus, including:
A third determining module, configured to determine, for a preset user group, at least one sample user, the preset user group being the group to which users whose speech clarity is below a predetermined clarity belong;
A second obtaining module, configured to collect speech samples of the at least one sample user, wherein each speech sample is labeled with its semantic meaning;
A third obtaining module, configured to determine, based on the speech samples belonging to each semantic meaning, the speech feature corresponding to that semantic meaning, and to associate the corresponding speech features with the semantic meanings to obtain the speech recognition model.
In a fifth aspect, an embodiment of the present invention provides an electronic device including a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, and the one or more programs contain instructions for performing the following operations:
Obtaining speech data produced by a user, the user being a user whose speech clarity is below a predetermined clarity;
Extracting a first speech feature of the speech data;
Determining, based on a speech recognition model of a preset user group, the first semantic meaning characterized by the first speech feature, the preset user group being the group to which users whose speech clarity is below the predetermined clarity belong.
In a sixth aspect, an embodiment of the present invention provides an electronic device including a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, and the one or more programs contain instructions for performing the following operations:
For a preset user group, determining at least one sample user, the preset user group being the group to which users whose speech clarity is below a predetermined clarity belong;
Collecting speech samples of the at least one sample user, wherein each speech sample is labeled with its semantic meaning;
Determining, based on the speech samples belonging to each semantic meaning, the speech feature corresponding to that semantic meaning, and associating the corresponding speech features with the semantic meanings to obtain the speech recognition model.
The beneficial effects of the present invention are as follows:
In the embodiments of the present invention, after speech data produced by a user whose speech clarity is below the predetermined clarity is obtained, the first speech feature of the speech data can be extracted, and the first semantic meaning characterized by the first speech feature is then determined based on the speech recognition model of the preset user group, the preset user group being the group to which users whose speech clarity is below the predetermined clarity belong. The speech data produced by such a user can thus be recognized through the speech recognition model of the preset user group, so that speech data produced by users whose speech clarity is below the predetermined clarity is effectively recognized.
Brief description of the drawings
Fig. 1 is a flow chart of the speech recognition method in an embodiment of the present invention;
Fig. 2 is a flow chart of establishing the speech recognition model in the speech recognition method of an embodiment of the present invention;
Fig. 3 is a flow chart of determining the first semantic meaning in the speech recognition method of an embodiment of the present invention;
Fig. 4 is a flow chart of the speech recognition model establishment method in an embodiment of the present invention;
Fig. 5 is a structural diagram of the speech recognition apparatus in an embodiment of the present invention;
Fig. 6 is a structural diagram of the speech recognition model establishment apparatus in an embodiment of the present invention;
Fig. 7 is a block diagram, according to an exemplary embodiment, of an electronic device for implementing a speech recognition method or a speech recognition model establishment method;
Fig. 8 is a schematic structural diagram of a server for implementing a speech recognition method or a speech recognition model establishment method in an embodiment of the present invention.
Detailed description
The present invention provides a speech recognition method, a recognition model establishment method, an apparatus, and an electronic device, to solve the technical problem in the prior art that the speech data of users with insufficient speech clarity cannot be recognized.
To solve the above technical problem, the general idea of the technical solutions in the embodiments of the present application is as follows:
After speech data produced by a user whose speech clarity is below the predetermined clarity is obtained, the first speech feature of the speech data can be extracted, and the first semantic meaning characterized by the first speech feature is then determined based on the speech recognition model of the preset user group, the preset user group being the group to which users whose speech clarity is below the predetermined clarity belong. The speech data produced by such a user can thus be recognized through the speech recognition model of the preset user group, so that speech data produced by users whose speech clarity is below the predetermined clarity is effectively recognized.
For a better understanding of the above technical solutions, the technical solutions of the present invention are described in detail below with reference to the accompanying drawings and specific embodiments. It should be understood that the embodiments of the present invention and the specific features in the embodiments are detailed explanations of the technical solutions of the present invention rather than limitations on them, and that, where no conflict arises, the embodiments of the present invention and the technical features in the embodiments can be combined with one another.
In a first aspect, an embodiment of the present invention provides a speech recognition method which, referring to Fig. 1, includes:
Step S101: obtaining speech data produced by a user, the user being a user whose speech clarity is below a predetermined clarity;
Step S102: extracting a first speech feature of the speech data;
Step S103: determining, based on a speech recognition model of a preset user group, the first semantic meaning characterized by the first speech feature, the preset user group being the group to which users whose speech clarity is below the predetermined clarity belong.
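Steps S101 to S103 can be sketched roughly as follows. This is an illustration only, not the patent's implementation: it assumes the speech data has already been reduced to a list of observed frequencies, that the speech feature is a frequency range, and that the model maps each semantic meaning to a frequency range. The names `extract_feature` and `recognize` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Range = Tuple[float, float]  # hypothetical speech feature: (low_hz, high_hz)

def extract_feature(frequencies_hz: List[float]) -> Range:
    """S102 stand-in: use the lowest and highest observed frequency as the feature."""
    return (min(frequencies_hz), max(frequencies_hz))

def recognize(frequencies_hz: List[float], model: Dict[str, Range]) -> Optional[str]:
    """S103 stand-in: return the first meaning whose range contains the feature."""
    low, high = extract_feature(frequencies_hz)
    for meaning, (model_low, model_high) in model.items():
        if model_low <= low and high <= model_high:
            return meaning
    return None  # no entry of the group model matches
```

For instance, with a model containing the range 450 Hz to 900 Hz for "eat", speech data spanning 500 Hz to 800 Hz would be recognized as "eat" under these assumptions.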
For example, the solution is applied to an electronic device with a speech recognition function. The electronic device may be a client device, such as a mobile phone, tablet computer, notebook computer, wristband, or children's smart watch; the electronic device may also be a server. The embodiments of the present invention impose no limitation in this respect.
In step S101, if the solution is applied to a client device, the client device may collect the speech data produced by the user through a built-in or external voice collection device, or may receive speech data sent by another client device; if the solution is applied to a server, the server may receive speech data sent by a client device connected to it.
For example, user A wears a wearable device (e.g., a children's watch), and the mobile phone of user B (e.g., user A's parent or relative) has a data connection with the wearable device. The wearable device may collect user A's speech data, identify the first semantic meaning corresponding to the speech data through the speech recognition model on the wearable device, and then send the first semantic meaning to the mobile phone. In this case, the solution is applied to a client device (the wearable device), which obtains the speech data directly through its audio collection device. Alternatively, after collecting user A's speech data, the wearable device sends it directly to a server; the server identifies the first semantic meaning corresponding to the speech data through the speech recognition model and then sends it to user B. In this case, the solution is applied to the server, which receives the speech data sent by the wearable device. Alternatively, after collecting user A's speech data, the wearable device sends it to the mobile phone (by short-range wireless transmission or forwarded through a server), and the mobile phone identifies the first semantic meaning corresponding to the speech data through the speech recognition model. In this case, the solution is applied to a client device (the mobile phone), which obtains the speech data by receiving it from another client device (the wearable device).
The predetermined clarity is, for example, the clarity of a user who enunciates clearly when speaking and can be understood by the people around. Speech clarity (speech transmission index; speech intelligibility) is a physical quantity that measures how intelligible a speaker's speech is. (STI, RASTI, and STIPA are speech clarity results obtained with different measurement methods.) According to the relevant standards, linguistic units (sentences, words, or syllables) uttered by a speaker pass through a speech transmission system, and the proportion that listeners recognize correctly is measured; the result is the speech clarity.
The users whose speech clarity is below the predetermined clarity may be of various kinds, for example: users with slurred enunciation, users whose speech leaks air, users with a lisp, and users who cannot form words and can only make sounds. Babies speak in a babyish voice and often enunciate unclearly, or cannot form words and can only make sounds, so babies generally belong to the users whose speech clarity is below the predetermined clarity. Such users usually pronounce inaccurately when producing speech data, which lowers its clarity, so that it cannot be recognized by electronic devices and may not even be understood by other people.
In step S102, the speech feature of the speech data may include at least one of a frequency range feature and a frequency shape feature.
In step S103, referring to Fig. 2, the speech recognition model of the preset user group can be obtained as follows:
Step S201: for each preset user group, determining at least one sample user;
Step S202: collecting speech samples of the at least one sample user, wherein each speech sample is labeled with its semantic meaning;
Step S203: determining, based on the speech samples belonging to each semantic meaning, the speech feature corresponding to that semantic meaning, and associating the corresponding speech features with the semantic meanings to obtain the speech recognition model.
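Steps S201 to S203 can be sketched roughly as follows. This is a minimal illustration under assumptions: each labeled sample is a hypothetical tuple of (user group, semantic meaning, frequency range), and the ranges of samples sharing a group and a meaning are combined by taking their overall span, which is only one possible way of combining them.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Range = Tuple[float, float]             # hypothetical feature: (low_hz, high_hz)
LabeledSample = Tuple[str, str, Range]  # (user group, semantic meaning, feature)

def build_model(samples: List[LabeledSample]) -> Dict[str, Dict[str, Range]]:
    """Group labeled samples by (group, meaning) and combine their features."""
    grouped = defaultdict(list)
    for group, meaning, feature in samples:
        grouped[(group, meaning)].append(feature)

    model: Dict[str, Dict[str, Range]] = {}
    for (group, meaning), features in grouped.items():
        low = min(f[0] for f in features)
        high = max(f[1] for f in features)
        # Associate the combined speech feature with its semantic meaning.
        model.setdefault(group, {})[meaning] = (low, high)
    return model
```

The result is a nested mapping from user group to semantic meaning to speech feature, which can then be queried during recognition.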
In step S201, the preset user group may comprise several different user groups, for example: the group of users with slurred enunciation, the group of users whose speech leaks air, the group of users with a lisp, and the group of users who cannot form words and can only make sounds (e.g., babies).
For the same semantic meaning, the speech features produced by different user groups differ, so when the speech recognition model is established, the speech feature corresponding to a given semantic meaning can be obtained separately for each user group. Specifically, in step S201, at least one sample user can be obtained for each preset user group. For example, about 100 sample users can be obtained for the group with slurred enunciation, about 100 for the group whose speech leaks air, about 100 for the group with a lisp, about 100 for the group who cannot form words and can only make sounds, and so on.
In step S202, for each sample user, after a speech sample of the sample user is obtained, the sample collector can determine the semantic meaning of the speech sample from the environment in which the speech data was collected and from auxiliary information provided by other users. The other users are usually acquaintances of the sample user (e.g., parents, relatives, or doctors or researchers who work with the group, such as babies) or the sample user himself or herself. The speech sample can then be labeled with the semantic meaning obtained.
Basic speech samples (e.g., speech samples obtained in the initial stage) can be labeled directly in the manual manner described above. Once a certain base has been built up, the semantic meaning of speech samples can also be labeled automatically by machine; if the machine cannot label the semantic meaning of some speech sample, that sample can be handed back for manual labeling.
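The labeling flow with a manual fallback can be sketched as follows. The callbacks `auto_label` and `manual_label` are hypothetical placeholders: the first stands for the machine labeler and is assumed to return `None` when it cannot label a sample, and the second stands for a human annotator.

```python
from typing import Callable, List, Optional, Tuple

def label_samples(
    samples: List[str],
    auto_label: Callable[[str], Optional[str]],
    manual_label: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Label each sample by machine first, falling back to manual labeling."""
    labeled = []
    for sample in samples:
        meaning = auto_label(sample)
        if meaning is None:
            # The machine could not label this sample: hand it to a person.
            meaning = manual_label(sample)
        labeled.append((sample, meaning))
    return labeled
```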
In step S203, the speech recognition model is obtained by recognizing the speech samples produced by users of the preset user groups and matching them with their corresponding semantic meanings.
In step S203, for each preset user group, at least one speech sample corresponding to each semantic meaning can be obtained. For example, for the group of users with slurred enunciation, the semantic meaning "eat" has speech sample A, speech sample B, and speech sample C; the speech features of samples A, B, and C can be extracted separately and then combined to obtain the speech feature of the semantic meaning "eat".
For example, if the speech feature is a frequency range, assume that the frequency range of speech sample A is 500 Hz to 800 Hz, that of speech sample B is 550 Hz to 900 Hz, and that of speech sample C is 450 Hz to 700 Hz. The three frequency ranges can then be superimposed to obtain the frequency range of the semantic meaning "eat", e.g., 450 Hz to 900 Hz. The speech feature of a semantic meaning can of course also be determined in other ways, for example by taking the intersection or the average; the embodiments of the present invention do not enumerate these in detail and impose no limitation.
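The three ways of combining frequency ranges mentioned above (superposition, intersection, and average) can be sketched as follows, using the sample values from the example. The function names are hypothetical, and representing a range as a (low, high) tuple in Hz is an assumption made for illustration.

```python
from typing import List, Optional, Tuple

Range = Tuple[float, float]  # (low_hz, high_hz)

def union_range(ranges: List[Range]) -> Range:
    """Superimpose the ranges: lowest lower bound to highest upper bound."""
    return (min(r[0] for r in ranges), max(r[1] for r in ranges))

def intersection_range(ranges: List[Range]) -> Optional[Range]:
    """Keep only the part shared by all ranges, or None if they do not overlap."""
    low = max(r[0] for r in ranges)
    high = min(r[1] for r in ranges)
    return (low, high) if low <= high else None

def mean_range(ranges: List[Range]) -> Range:
    """Average the lower bounds and the upper bounds separately."""
    n = len(ranges)
    return (sum(r[0] for r in ranges) / n, sum(r[1] for r in ranges) / n)

samples = [(500, 800), (550, 900), (450, 700)]  # speech samples A, B, C
print(union_range(samples))         # (450, 900)
print(intersection_range(samples))  # (550, 700)
print(mean_range(samples))          # (500.0, 800.0)
```

Superposition gives the widest range and so tolerates more variation in pronunciation, while intersection gives the narrowest; which choice suits a given user group is left open by the text.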
In a specific implementation, several forms of speech recognition model can be determined in step S203. Two of them are introduced below; a specific implementation is of course not limited to these two cases.
In the first form, the preset user group includes M user groups, M being a positive integer, and the speech recognition model of the preset user group includes recognition models for the M user groups, the recognition model of each user group including the correspondence between speech features and semantic meanings for that user group.
As an example, assume the preset user group includes four user groups: the group with slurred enunciation, the group with a lisp, the group whose speech leaks air, and the group who cannot form words and can only make sounds. When the speech recognition model is established, a speech recognition model can be established separately for each of these four user groups. For example, assume that the semantic meaning "eat" has four speech features: speech feature A (for the group with slurred enunciation), speech feature B (for the group whose speech leaks air), speech feature C (for the group with a lisp), and speech feature D (for the group who cannot form words and can only make sounds). When the speech recognition model is established, the categories can then be divided in the form of Table 1:
Table 1
Thus, after speech data is obtained, the user group to which the user belongs is determined first, and the speech is then recognized using the speech recognition model of that user group. The user group to which the user belongs may be set by the user of the electronic device, or may be identified by performing feature recognition on the speech data; the embodiments of the present invention impose no limitation.
With this approach, the first speech feature only needs to be matched against the speech recognition model of one user group, which improves recognition efficiency.
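A minimal sketch of this first form, assuming each user group has its own mapping from semantic meaning to a frequency range and that a feature matches when it lies within the stored range. The group names, the matching rule, and the function name are illustrative assumptions.

```python
from typing import Dict, Optional, Tuple

Range = Tuple[float, float]  # hypothetical feature: (low_hz, high_hz)

def recognize_in_group(
    feature: Range,
    group: str,
    group_models: Dict[str, Dict[str, Range]],
) -> Optional[str]:
    """Match the feature only against the model of the user's own group."""
    model = group_models[group]
    for meaning, (low, high) in model.items():
        if low <= feature[0] and feature[1] <= high:
            return meaning
    return None
```

Because only one group's entries are scanned, the number of comparisons does not grow with the number of user groups.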
In the second form, the preset user group includes M user groups, and the speech recognition model of the preset user group includes the correspondence between each semantic meaning and the speech features of each of the M user groups.
For example, for the semantic meaning "eat", the corresponding speech features may be associated for the group with slurred enunciation, the group with a lisp, the group whose speech leaks air, and the group who cannot form words and can only make sounds. When the first semantic meaning is matched from the first speech feature, the first speech feature is then matched against the speech features of all user groups. For example, the correspondence shown in Table 2 can be established:
Table 2
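A minimal sketch of this second form, assuming a single mapping from each semantic meaning to the speech features of every user group. The values shown are made up for illustration and are not the contents of Table 2; the matching rule (the feature lies within a stored range) is likewise an assumption.

```python
from typing import Dict, Optional, Tuple

Range = Tuple[float, float]  # hypothetical feature: (low_hz, high_hz)

def recognize_any_group(
    feature: Range,
    combined_model: Dict[str, Dict[str, Range]],
) -> Optional[str]:
    """Match the feature against the features of all user groups for every meaning."""
    for meaning, per_group in combined_model.items():
        for low, high in per_group.values():
            if low <= feature[0] and feature[1] <= high:
                return meaning
    return None

# Illustrative data only: semantic meaning -> {user group: feature}
combined = {
    "eat": {"slurred": (450, 900), "lisp": (300, 400)},
    "sleep": {"slurred": (100, 200), "lisp": (150, 250)},
}
```

Unlike the first form, the user group does not need to be known in advance, at the cost of comparing against more entries.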
In step S103, determining the first semantic meaning characterized by the first speech feature based on the speech recognition model of the preset user group includes, referring to Fig. 3:
Step S301: dividing the speech feature into at least one speech feature segment;
Step S302: matching each of the at least one speech feature segment against the speech features in the speech recognition model, thereby identifying the semantic meaning of each speech feature segment;
Step S303: combining the semantic meanings of the at least one speech feature segment to obtain the first semantic meaning characterized by the speech feature.
In step S301, the speech feature can be divided based on the tone and duration of each word in the speech feature, thereby obtaining at least one speech feature segment.
In step S302, each speech feature segment can be matched against the speech features in the speech recognition model. If the speech feature segment matches the speech feature corresponding to some semantic meaning in the speech recognition model, the semantic meaning of that speech feature is taken as the semantic meaning of the speech feature segment. For example, if a speech feature segment is determined to match speech feature E in the speech recognition model, the semantic meaning corresponding to the speech feature segment can be determined to be "sleep".
In step S303, after the semantic meaning of each feature segment is obtained, the semantic meanings can be arranged in the chronological order in which the user produced the speech feature segments, thereby obtaining the first semantic meaning.
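Steps S302 and S303 can be sketched as follows, assuming segmentation (S301) has already produced a list of (start time, feature) pairs. The range-containment matching rule, the space-joined output, and the function name are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

Range = Tuple[float, float]     # hypothetical feature: (low_hz, high_hz)
Segment = Tuple[float, Range]   # (start time in seconds, feature)

def match_segment(feature: Range, model: Dict[str, Range]) -> Optional[str]:
    """S302: find the meaning whose stored range contains the segment feature."""
    for meaning, (low, high) in model.items():
        if low <= feature[0] and feature[1] <= high:
            return meaning
    return None

def recognize_segments(segments: List[Segment], model: Dict[str, Range]) -> str:
    """S303: arrange the matched meanings in the order the segments were spoken."""
    meanings = []
    for _, feature in sorted(segments, key=lambda s: s[0]):
        meaning = match_segment(feature, model)
        if meaning is not None:
            meanings.append(meaning)
    return " ".join(meanings)
```

Segments that match nothing are skipped in this sketch; a real system might instead flag them for review.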
In a specific implementation, after the first semantic meaning characterized by the first speech feature is obtained in step S103, the first semantic meaning can also be output. It can be output in several ways; two of them are introduced below, and a specific implementation is of course not limited to these two cases.
In the first way, the first semantic meaning is output through the output device of the current electronic device.
As an example, assume a baby wears a smart watch (the current electronic device). If the baby's parents are nearby, then, to make sure the parents understand what the baby is saying, the first semantic meaning can be output through the output device of the smart watch, for example by displaying it on the display unit of the smart watch or by outputting it through the sound output device of the smart watch.
In another example, a baby wears a smart watch, and the baby's parents control the smart watch through a mobile phone (the current electronic device). After collecting the speech data produced by the baby, the smart watch sends it to the mobile phone. After identifying the first semantic meaning characterized by the speech data, the mobile phone can output it directly, in the same manner as the smart watch, which is not repeated here.
Second, the first semantic meaning is sent to the electronic device of a preset user and then output by that electronic device.
The preset user is, for example, a user who has a close relationship with the current user (such as the current user's parents or relatives), or a doctor, a researcher, and so on. For example, the baby wears a smart watch and the baby's parents control the smart watch through a mobile phone (current electronic device). After collecting the baby's speech data, the smart watch can recognize the first semantic meaning corresponding to the speech data and then send it to another electronic device such as a mobile phone or a PC, so that the other electronic device outputs the first semantic meaning, for example through the display unit or the sound output device of the mobile phone. As an optional embodiment, the speech data produced by the user can be played through the sound output device of the other electronic device while the first semantic meaning is shown on its display unit, so that the preset user can learn from the speech data together with the first semantic meaning.
When the current electronic device sends the first semantic meaning to the preset user's electronic device, it can send it directly, or it can first judge whether the first semantic meaning contains a predetermined keyword and send the first semantic meaning to the preset user's electronic device only when the predetermined keyword is present.
The predetermined keyword is, for example, a keyword that signals a need the preset user should attend to. If the current user is a baby, the predetermined keywords are, for example, "eat", "thirsty", "hungry", and "need the toilet". If the current user is an adult with insufficient speech intelligibility, the predetermined keywords are, for example, "help", "seeking help", and "need the toilet". The embodiments of the present invention impose no restriction here.
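The keyword check before forwarding can be sketched as below. The substring test, the case folding, and the names `should_forward` and `forward_if_needed` are illustrative assumptions; `send` stands in for whatever transport delivers the first semantic meaning to the preset user's device.

```python
from typing import Callable, Iterable


def should_forward(first_semantic: str, keywords: Iterable[str]) -> bool:
    """True when the recognized meaning contains any predetermined keyword."""
    text = first_semantic.lower()
    return any(keyword.lower() in text for keyword in keywords)


def forward_if_needed(first_semantic: str, keywords: Iterable[str],
                      send: Callable[[str], None]) -> bool:
    """Call `send` only when a predetermined keyword is present."""
    if should_forward(first_semantic, keywords):
        send(first_semantic)
        return True
    return False


outbox = []  # stands in for the preset user's device
baby_keywords = ["eat", "thirsty", "hungry"]
forward_if_needed("I am hungry", baby_keywords, outbox.append)
forward_if_needed("play", baby_keywords, outbox.append)
print(outbox)
```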
As an optional embodiment, after the first semantic meaning of the first speech feature has been recognized based on the speech recognition model, the semantic meaning of the first speech feature can also be corrected based on feedback from the preset user. The correction process includes the following steps: providing the first semantic meaning of the first speech feature to the preset user, so that the preset user can judge whether the first semantic meaning of the first speech feature is accurate; obtaining the second semantic meaning that the preset user provides for the first speech feature upon determining that the first semantic meaning is inaccurate; and, in the speech recognition model, replacing the first semantic meaning of the first speech feature with the second semantic meaning.
As an example, suppose that recognizing the speech feature produced by the user yields the first semantic meaning: eat. The first semantic meaning can then be provided to the preset user by the current electronic device, or sent to the preset user's electronic device and provided there. After hearing the speech data produced by the user, the preset user can produce feedback confirming the first semantic meaning if it seems correct. If the preset user considers the first semantic meaning wrong, the preset user can produce feedback that corrects the semantic meaning of the speech data, for example by giving the current electronic device the correct semantic meaning for the speech feature (the second semantic meaning), so that the first semantic meaning is replaced with the second semantic meaning in the speech recognition model. In this way the speech recognition model is corrected.
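A minimal sketch of this correction step, assuming the model is a plain mapping from a feature identifier to its semantic meaning. The identifier `"E"`, the function name `apply_feedback`, and the convention that `None` means "confirmed" are all hypothetical.

```python
from typing import Dict, Optional


def apply_feedback(model: Dict[str, str], feature_id: str,
                   second_semantic: Optional[str]) -> str:
    """Replace the stored meaning when the preset user supplies a correction.

    `second_semantic` is None when the preset user confirms the first meaning.
    """
    if second_semantic is not None:
        model[feature_id] = second_semantic
    return model[feature_id]


model = {"E": "eat"}
apply_feedback(model, "E", None)      # confirmed, model unchanged
apply_feedback(model, "E", "sleep")   # corrected by the preset user
print(model)
```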
As an optional embodiment, the same utterance may have different semantic meanings in different scenarios. When the speech recognition model is built, there can therefore be several semantic meanings for the same speech feature, each related to feature data describing the circumstances in which the user produced the speech data, for example the surrounding environment, the user's action, or the user's tone. In step S103, the word content expressed by the first speech feature can then be determined first from the first speech feature, after which the user's feature data is obtained and used to determine the semantic meaning corresponding to that word content.
For example, in the speech recognition model, the word content "I will not" has the correspondences shown in Table 3:
Table 3
Then, if the first speech feature indicates that the word content characterized by the speech data is "I will not", the user's action can be captured by a camera. If the action is determined to be "throwing things", the user is considered likely to be angry, and some soothing action needs to be taken toward the user.
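The two-stage lookup (word content first, then feature data) can be sketched as a table keyed on both values. Only the single entry mentioned in the text is included; the dictionary layout, the fallback to the literal word content, and the names `CONTEXT_SEMANTICS` and `resolve_semantic` are assumptions, not the contents of Table 3.

```python
from typing import Dict, Tuple

# Key: (word content, observed user action). Only the example from the text
# is listed; any further rows would have to come from the actual table.
CONTEXT_SEMANTICS: Dict[Tuple[str, str], str] = {
    ("I will not", "throwing things"): "user may be angry; soothe the user",
}


def resolve_semantic(word_content: str, user_action: str,
                     table: Dict[Tuple[str, str], str] = CONTEXT_SEMANTICS) -> str:
    """Pick the meaning for this context, falling back to the literal words."""
    return table.get((word_content, user_action), word_content)


print(resolve_semantic("I will not", "throwing things"))
```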
The above scheme achieves the technical effect that the meaning of the speech data produced by the user can be recognized.
In a second aspect, based on the same inventive concept, an embodiment of the present invention provides a method for building a speech recognition model. Referring to Fig. 4, the method includes:
Step S401: for a preset user group, determine at least one sample user, the preset user group being the group to which users whose speech intelligibility is below a predetermined level belong;
Step S402: collect speech samples from the at least one sample user, and label each speech sample with its semantic meaning;
Step S403: determine, from the speech samples belonging to each semantic meaning, the speech feature corresponding to that semantic meaning, and associate each speech feature with its semantic meaning to obtain the speech recognition model.
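Steps S402 and S403 can be sketched as grouping labeled samples by semantic meaning and deriving one representative feature per meaning. The text does not say how the representative feature is determined; averaging numeric feature vectors is an assumption made here purely for illustration, as is the name `build_model`.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Feature = Tuple[float, ...]  # hypothetical numeric speech feature


def build_model(samples: Iterable[Tuple[Feature, str]]) -> Dict[str, Feature]:
    """Associate each labeled semantic meaning with one representative feature."""
    grouped: Dict[str, List[Feature]] = defaultdict(list)
    for feature, semantic in samples:          # S402: samples arrive labeled
        grouped[semantic].append(feature)
    model: Dict[str, Feature] = {}
    for semantic, features in grouped.items():  # S403: one feature per meaning
        count = len(features)
        # Component-wise mean (assumed aggregation).
        model[semantic] = tuple(sum(column) / count for column in zip(*features))
    return model


samples = [((1.0, 2.0), "sleep"), ((3.0, 4.0), "sleep"), ((5.0, 5.0), "eat")]
print(build_model(samples))
```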
Optionally, the preset user group includes M user groups, M being a positive integer, and the speech recognition model of the preset user group includes a recognition model for each of the M user groups, the recognition model of each user group containing the correspondence between speech features and semantic meanings for that user group; or
the preset user group includes M user groups, and the speech recognition model of the preset user group contains the correspondence between each semantic meaning and the speech features of each of the M user groups.
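The two alternatives differ only in how the same correspondences are organized: one sub-model per user group, or a single model keyed by semantic meaning that lists each group's feature. A sketch of converting the first layout into the second follows; the nested-dictionary layout, the group names, and the feature labels are hypothetical.

```python
from typing import Dict


def to_combined(per_group: Dict[str, Dict[str, str]]) -> Dict[str, Dict[str, str]]:
    """Turn {group: {semantic: feature}} into {semantic: {group: feature}}."""
    combined: Dict[str, Dict[str, str]] = {}
    for group, group_model in per_group.items():
        for semantic, feature in group_model.items():
            combined.setdefault(semantic, {})[group] = feature
    return combined


per_group = {
    "infants": {"eat": "A1"},
    "adults": {"eat": "B1", "help": "B2"},
}
print(to_combined(per_group))
```

With the per-group layout, recognition first has to decide which group the speaker belongs to; with the combined layout, a feature can be compared against every group's entry for each semantic meaning.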
Optionally, after associating each speech feature with its semantic meaning to obtain the speech recognition model, the method further includes:
after speech data produced by a user is obtained, recognizing the speech data with the speech recognition model to obtain the first semantic meaning characterized by the speech data, the user being a user whose speech intelligibility is below a predetermined level;
providing the first semantic meaning of the first speech feature to a preset user, so that the preset user can judge whether the first semantic meaning of the first speech feature is accurate;
obtaining the second semantic meaning that the preset user provides for the first speech feature upon determining that the first semantic meaning is inaccurate;
replacing the first semantic meaning of the first speech feature with the second semantic meaning.
Optionally, the speech feature includes a frequency range and/or a frequency shape.
Because how the speech recognition model is built has already been described in the first aspect of the embodiments of the present invention, it is not repeated here. Every way of building the speech recognition model used in the first aspect applies equally to the second aspect.
In a third aspect, based on the same inventive concept, an embodiment of the present invention provides a speech recognition apparatus. Referring to Fig. 5, the apparatus includes:
a first obtaining module 50, configured to obtain speech data produced by a user, the user being a user whose speech intelligibility is below a predetermined level;
an extraction module 51, configured to extract a first speech feature of the speech data;
a first determining module 52, configured to determine, based on the speech recognition model of a preset user group, the first semantic meaning characterized by the first speech feature, the preset user group being the group to which users whose speech intelligibility is below a predetermined level belong.
Optionally, the users whose speech intelligibility is below a predetermined level include at least one of: users who articulate indistinctly, users whose speech leaks air, users with a lisp, and users who cannot form words and can only vocalize.
Optionally, the apparatus further includes:
a second determining module, configured to determine at least one sample user for each preset user group;
a first collection module, configured to collect speech samples from the at least one sample user, each speech sample being labeled with its semantic meaning;
a second obtaining module, configured to determine, from the speech samples belonging to each semantic meaning, the speech feature corresponding to that semantic meaning, and to associate each speech feature with its semantic meaning to obtain the speech recognition model.
Optionally, the preset user group includes M user groups, M being a positive integer, and the speech recognition model of the preset user group includes a recognition model for each of the M user groups, the recognition model of each user group containing the correspondence between speech features and semantic meanings for that user group; or
the preset user group includes M user groups, and the speech recognition model of the preset user group contains the correspondence between each semantic meaning and the speech features of each of the M user groups.
Optionally, the apparatus further includes:
a first providing module, configured to provide the first semantic meaning of the first speech feature to a preset user, so that the preset user can judge whether the first semantic meaning of the first speech feature is accurate;
a first acquiring module, configured to obtain the second semantic meaning that the preset user provides for the first speech feature upon determining that the first semantic meaning is inaccurate;
a first replacement module, configured to replace, in the speech recognition model, the first semantic meaning of the first speech feature with the second semantic meaning.
Optionally, the first determining module 52 includes:
a division unit, configured to divide the speech feature into at least one speech feature fragment;
a matching unit, configured to match each of the at least one speech feature fragment against the speech features in the speech recognition model, and thereby identify the semantic meaning of each speech feature fragment;
a synthesis unit, configured to combine the semantic meanings of the at least one speech feature fragment to obtain the first semantic meaning characterized by the speech feature.
Optionally, the speech feature includes a frequency range and/or a frequency shape.
Optionally, the apparatus further includes:
a sending module, configured to send the first semantic meaning to the electronic device of a preset user; or
to judge whether the first semantic meaning contains a predetermined keyword and, when it does, send the first semantic meaning to the electronic device of the preset user.
The speech recognition apparatus described in the third aspect of the embodiments of the present invention is the apparatus used to implement the speech recognition method of the first aspect. Based on the speech recognition method described in the first aspect, those skilled in the art can understand the specific structure and variations of the apparatus, so they are not repeated here. Any apparatus used to implement the speech recognition method described in the first aspect falls within the scope that the embodiments of the present invention intend to protect.
In a fourth aspect, based on the same inventive concept, an embodiment of the present invention provides an apparatus for building a speech recognition model. Referring to Fig. 6, the apparatus includes:
a third determining module 60, configured to determine at least one sample user for a preset user group, the preset user group being the group to which users whose speech intelligibility is below a predetermined level belong;
a second collection module 61, configured to collect speech samples from the at least one sample user, each speech sample being labeled with its semantic meaning;
a third obtaining module 62, configured to determine, from the speech samples belonging to each semantic meaning, the speech feature corresponding to that semantic meaning, and to associate each speech feature with its semantic meaning to obtain the speech recognition model.
Optionally, the preset user group includes M user groups, M being a positive integer, and the speech recognition model of the preset user group includes a recognition model for each of the M user groups, the recognition model of each user group containing the correspondence between speech features and semantic meanings for that user group; or
the preset user group includes M user groups, and the speech recognition model of the preset user group contains the correspondence between each semantic meaning and the speech features of each of the M user groups.
Optionally, the apparatus further includes:
a recognition module, configured to recognize, after speech data produced by a user is obtained, the speech data with the speech recognition model to obtain the first semantic meaning characterized by the speech data, the user being a user whose speech intelligibility is below a predetermined level;
a second providing module, configured to provide the first semantic meaning of the first speech feature to a preset user, so that the preset user can judge whether the first semantic meaning of the first speech feature is accurate;
a second acquiring module, configured to obtain the second semantic meaning that the preset user provides for the first speech feature upon determining that the first semantic meaning is inaccurate;
a second replacement module, configured to replace the first semantic meaning of the first speech feature with the second semantic meaning.
Optionally, the speech feature includes a frequency range and/or a frequency shape.
The apparatus for building a speech recognition model described in the fourth aspect of the embodiments of the present invention is the apparatus used to implement the model building method of the second aspect. Based on the model building method described in the second aspect, those skilled in the art can understand the specific structure and variations of the apparatus, so they are not repeated here. Any apparatus used to implement the model building method described in the second aspect falls within the scope that the embodiments of the present invention intend to protect.
In a fifth aspect, based on the same inventive concept, an embodiment of the present invention provides an electronic device including a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, and the one or more programs contain instructions for performing the following operations:
obtaining speech data produced by a user, the user being a user whose speech intelligibility is below a predetermined level;
extracting a first speech feature of the speech data;
determining, based on the speech recognition model of a preset user group, the first semantic meaning characterized by the first speech feature, the preset user group being the group to which users whose speech intelligibility is below a predetermined level belong.
The electronic device described in the fifth aspect of the embodiments of the present invention is the electronic device used to implement the speech recognition method of the first aspect. Based on the speech recognition method described in the first aspect, those skilled in the art can understand the specific structure and variations of the electronic device, so they are not repeated here. Any electronic device used to implement the speech recognition method described in the first aspect falls within the scope that the embodiments of the present invention intend to protect.
In a sixth aspect, based on the same inventive concept, an embodiment of the present invention provides an electronic device including a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, and the one or more programs contain instructions for performing the following operations:
for a preset user group, determining at least one sample user, the preset user group being the group to which users whose speech intelligibility is below a predetermined level belong;
collecting speech samples from the at least one sample user, each speech sample being labeled with its semantic meaning;
determining, from the speech samples belonging to each semantic meaning, the speech feature corresponding to that semantic meaning, and associating each speech feature with its semantic meaning to obtain the speech recognition model.
The electronic device described in the sixth aspect of the embodiments of the present invention is the electronic device used to implement the model building method of the second aspect. Based on the model building method described in the second aspect, those skilled in the art can understand the specific structure and variations of the electronic device, so they are not repeated here. Any electronic device used to implement the model building method described in the second aspect falls within the scope that the embodiments of the present invention intend to protect.
Fig. 7 is a block diagram of an electronic device 800 for a speech recognition method (or a method for building a speech recognition model) according to an exemplary embodiment. For example, the electronic device 800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, fitness equipment, a personal digital assistant, a wristband, a children's watch, and so on.
Referring to Fig. 7, the electronic device 800 may include one or more of the following components: a processing component 802, a memory 804, a power component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.
The processing component 802 generally controls the overall operation of the electronic device 800, such as operations associated with display, telephone calls, data communication, camera operation, and recording. The processing component 802 may include one or more processors 820 to execute instructions so as to complete all or part of the steps of the methods described above. In addition, the processing component 802 may include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 may include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operation of the electronic device 800. Examples of such data include instructions for any application or method operated on the electronic device 800, contact data, phone book data, messages, pictures, videos, and so on. The memory 804 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc.
The power component 806 supplies power to the various components of the electronic device 800. The power component 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 800.
The multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors can sense not only the boundary of a touch or swipe action but also the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 808 includes a front camera and/or a rear camera. When the electronic device 800 is in an operating mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or have focusing and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a microphone (MIC) that is configured to receive external audio signals when the electronic device 800 is in an operating mode, such as a call mode, a recording mode, or a speech recognition mode. The received audio signal may be further stored in the memory 804 or sent via the communication component 816. In some embodiments, the audio component 810 also includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, and so on. The buttons may include, but are not limited to, a home button, volume buttons, a start button, and a lock button.
The sensor component 814 includes one or more sensors for providing status assessments of various aspects of the electronic device 800. For example, the sensor component 814 can detect the open/closed state of the electronic device 800 and the relative positioning of components, such as the display and keypad of the electronic device 800. The sensor component 814 can also detect a change in position of the electronic device 800 or of one of its components, the presence or absence of user contact with the electronic device 800, the orientation or acceleration/deceleration of the electronic device 800, and a change in its temperature. The sensor component 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 814 may also include an optical sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices. The electronic device 800 can access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In one exemplary embodiment, the communication component 816 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 816 also includes a near-field communication (NFC) module to facilitate short-range communication. For example, the NFC module can be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 800 may be implemented by one or more application-specific integrated circuits (ASIC), digital signal processors (DSP), digital signal processing devices (DSPD), programmable logic devices (PLD), field-programmable gate arrays (FPGA), controllers, microcontrollers, microprocessors, or other electronic components, for performing the methods described above.
In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, for example the memory 804 including instructions, which can be executed by the processor 820 of the electronic device 800 to complete the methods described above. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and so on.
A non-transitory computer-readable storage medium, wherein, when the instructions in the storage medium are executed by a processor of an electronic device, the electronic device is enabled to perform a speech recognition method, the method including:
obtaining speech data produced by a user, the user being a user whose speech intelligibility is below a predetermined level;
extracting a first speech feature of the speech data;
determining, based on the speech recognition model of a preset user group, the first semantic meaning characterized by the first speech feature, the preset user group being the group to which users whose speech intelligibility is below a predetermined level belong.
A non-transitory computer-readable storage medium, wherein, when the instructions in the storage medium are executed by a processor of an electronic device, the electronic device is enabled to perform a method for building a speech recognition model, the method including:
for a preset user group, determining at least one sample user, the preset user group being the group to which users whose speech intelligibility is below a predetermined level belong;
collecting speech samples from the at least one sample user, each speech sample being labeled with its semantic meaning;
determining, from the speech samples belonging to each semantic meaning, the speech feature corresponding to that semantic meaning, and associating each speech feature with its semantic meaning to obtain the speech recognition model.
Fig. 8 is a schematic structural diagram of a server in an embodiment of the present invention. The server 1900 can vary considerably depending on configuration or performance, and may include one or more central processing units (CPU) 1922 (for example, one or more processors), a memory 1932, and one or more storage media 1930 (for example, one or more mass storage devices) that store application programs 1942 or data 1944. The memory 1932 and the storage medium 1930 may be transient or persistent storage. The program stored in the storage medium 1930 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations on the server. Further, the central processing unit 1922 may be configured to communicate with the storage medium 1930 and to execute, on the server 1900, the series of instruction operations in the storage medium 1930.
The server 1900 may also include one or more power supplies 1926, one or more wired or wireless network interfaces 1950, one or more input/output interfaces 1958, one or more keyboards 1956, and/or one or more operating systems 1941, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and so on.
A non-transitory computer-readable storage medium, wherein, when the instructions in the storage medium are executed by the central processing unit 1922 of the server 1900, the server is enabled to perform a speech recognition method, the method including:
obtaining speech data produced by a user, the user being a user whose speech intelligibility is below a predetermined level;
extracting a first speech feature of the speech data;
determining, based on the speech recognition model of a preset user group, the first semantic meaning characterized by the first speech feature, the preset user group being the group to which users whose speech intelligibility is below a predetermined level belong.
A non-transitory computer-readable storage medium, wherein, when the instructions in the storage medium are executed by the central processing unit 1922 of the server 1900, the server is enabled to perform a method for building a speech recognition model, the method including:
for a preset user group, determining at least one sample user, the preset user group being the group to which users whose speech intelligibility is below a predetermined level belong;
collecting speech samples from the at least one sample user, each speech sample being labeled with its semantic meaning;
determining, from the speech samples belonging to each semantic meaning, the speech feature corresponding to that semantic meaning, and associating each speech feature with its semantic meaning to obtain the speech recognition model.
One or more embodiments of the present invention have at least the following beneficial effects:
In the embodiments of the present invention, after the speech data produced by a user whose speech intelligibility is below a predetermined level is obtained, the first speech feature of the speech data can be extracted, and the first semantic meaning characterized by the first speech feature is then determined based on the speech recognition model of the preset user group, the preset user group being the group to which users whose speech intelligibility is below a predetermined level belong. The speech data produced by the user can thus be recognized through the speech recognition model of the preset user group, so that speech data produced by users whose speech intelligibility is below a predetermined level is recognized effectively.
Those skilled in the art should understand that embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical storage, and so on) that contain computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to work in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, the instruction means implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operation steps is performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present invention have been described, those skilled in the art can make additional changes and modifications to these embodiments once they learn of the basic inventive concept. Therefore, the appended claims are intended to be construed as including the preferred embodiments and all changes and modifications that fall within the scope of the present invention.
Obviously, those skilled in the art can make various changes and variations to the embodiments of the present invention without departing from the spirit and scope of the embodiments. Thus, if these modifications and variations of the embodiments fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to include them.
Claims (16)
1. A speech recognition method, characterized by comprising:
obtaining speech data produced by a user, the user being a user whose speech clarity is lower than a predetermined clarity;
extracting a first voice feature of the speech data;
determining, based on a speech recognition model of a preset user group, first semantics represented by the first voice feature, the preset user group being the group to which users whose speech clarity is lower than the predetermined clarity belong.
2. The method of claim 1, characterized in that the user whose speech clarity is lower than the predetermined clarity comprises at least one of: a user with slurred articulation, a user whose speech leaks air, a user with a lisp, and a user who cannot articulate words and can only produce sounds.
3. The method of claim 1, characterized in that the speech recognition model of the preset user group is built as follows:
for each preset user group, determining at least one sample user;
collecting voice samples of the at least one sample user, wherein each voice sample is labeled with its semantics;
determining, based on the voice samples included under each semantics, the voice feature corresponding to that semantics, and associating each voice feature with its semantics to obtain the speech recognition model.
4. The method of claim 3, characterized in that the preset user group comprises M user groups, M being a positive integer, and the speech recognition model of the preset user group comprises recognition models of the M user groups, the recognition model of each user group comprising the correspondence between voice features and semantics for that user group; or
the preset user group comprises M user groups, and the speech recognition model of the preset user group comprises the correspondence between semantics and the voice features of each of the M user groups.
5. The method of claim 3, characterized in that, after the first semantics represented by the first voice feature is determined based on the speech recognition model of the preset user group, the method further comprises:
providing the first semantics of the first voice feature to a preset user, so that the preset user judges whether the first semantics of the first voice feature is accurate;
obtaining second semantics provided by the preset user for the first voice feature when the preset user identifies the first semantics as inaccurate;
in the speech recognition model, replacing the semantics of the first voice feature, changing it from the first semantics to the second semantics.
6. The method of any one of claims 1-5, characterized in that determining, based on the speech recognition model of the preset user group, the first semantics represented by the first voice feature comprises:
dividing the voice feature into at least one voice feature segment;
matching each of the at least one voice feature segment against the voice features in the speech recognition model to identify the semantics of each voice feature segment;
combining the semantics of each of the at least one voice feature segment to obtain the first semantics represented by the voice feature.
7. The method of any one of claims 1-5, characterized in that the voice feature comprises: a frequency range and/or a frequency shape.
8. The method of any one of claims 1-5, characterized in that, after the first semantics represented by the first voice feature is identified based on the speech recognition model of the preset user group, the method further comprises:
sending the first semantics to an electronic device of a preset user; or
judging whether the first semantics contains a predetermined keyword, and, when it contains the predetermined keyword, sending the first semantics to the electronic device of the preset user.
9. A speech recognition model building method, characterized by comprising:
for a preset user group, determining at least one sample user, the preset user group being the group to which users whose speech clarity is lower than a predetermined clarity belong;
collecting voice samples of the at least one sample user, wherein each voice sample is labeled with its semantics;
determining, based on the voice samples included under each semantics, the voice feature corresponding to that semantics, and associating each voice feature with its semantics to obtain the speech recognition model.
10. The method of claim 9, characterized in that the preset user group comprises M user groups, M being a positive integer, and the speech recognition model of the preset user group comprises recognition models of the M user groups, the recognition model of each user group comprising the correspondence between voice features and semantics for that user group; or
the preset user group comprises M user groups, and the speech recognition model of the preset user group comprises the correspondence between semantics and the voice features of each of the M user groups.
11. The method of claim 9, characterized in that, after the voice features are associated with their semantics to obtain the speech recognition model, the method further comprises:
after speech data produced by a user is obtained, recognizing the speech data by means of the speech recognition model to obtain first semantics represented by the speech data, the user being a user whose speech clarity is lower than a predetermined clarity;
providing the first semantics of the first voice feature to a preset user, so that the preset user judges whether the first semantics of the first voice feature is accurate;
obtaining second semantics provided by the preset user for the first voice feature when the preset user identifies the first semantics as inaccurate;
replacing the semantics of the first voice feature, changing it from the first semantics to the second semantics.
12. The method of any one of claims 9-11, characterized in that the voice feature comprises: a frequency range and/or a frequency shape.
13. A speech recognition apparatus, characterized by comprising:
a first obtaining module, configured to obtain speech data produced by a user, the user being a user whose speech clarity is lower than a predetermined clarity;
an extraction module, configured to extract a first voice feature of the speech data;
a first determining module, configured to determine, based on a speech recognition model of a preset user group, first semantics represented by the first voice feature, the preset user group being the group to which users whose speech clarity is lower than the predetermined clarity belong.
14. A speech recognition model building apparatus, characterized by comprising:
a third determining module, configured to determine, for a preset user group, at least one sample user, the preset user group being the group to which users whose speech clarity is lower than a predetermined clarity belong;
a second obtaining module, configured to collect voice samples of the at least one sample user, wherein each voice sample is labeled with its semantics;
a third obtaining module, configured to determine, based on the voice samples included under each semantics, the voice feature corresponding to that semantics, and to associate each voice feature with its semantics to obtain the speech recognition model.
15. An electronic device, characterized by comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs comprising instructions for performing the following operations:
obtaining speech data produced by a user, the user being a user whose speech clarity is lower than a predetermined clarity;
extracting a first voice feature of the speech data;
determining, based on a speech recognition model of a preset user group, first semantics represented by the first voice feature, the preset user group being the group to which users whose speech clarity is lower than the predetermined clarity belong.
16. An electronic device, characterized by comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs comprising instructions for performing the following operations:
for a preset user group, determining at least one sample user, the preset user group being the group to which users whose speech clarity is lower than a predetermined clarity belong;
collecting voice samples of the at least one sample user, wherein each voice sample is labeled with its semantics;
determining, based on the voice samples included under each semantics, the voice feature corresponding to that semantics, and associating each voice feature with its semantics to obtain the speech recognition model.
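The segment matching of claim 6 and the keyword check of claim 8 can be sketched together as follows. This is a hedged illustration, not the claimed implementation: segmentation is assumed to have been done already, the Euclidean nearest-feature match and the space-joined combination are assumptions, and the names `nearest_label`, `recognize_segments`, and `should_forward` are hypothetical.

```python
import math


def nearest_label(segment, model):
    """Match one voice feature segment to the closest stored feature."""
    return min(model, key=lambda label: math.dist(segment, model[label]))


def recognize_segments(segments, model):
    """Identify the semantics of each segment, then combine them.

    Assumption: combining means joining the per-segment results in order.
    """
    return " ".join(nearest_label(segment, model) for segment in segments)


def should_forward(semantics, keywords):
    """Return True when the semantics contains a predetermined keyword."""
    return any(keyword in semantics for keyword in keywords)


# Hypothetical model and pre-divided voice feature segments.
model = {"want": [210.0, 1000.0], "water": [500.0, 2000.0]}
segments = [[205.0, 990.0], [510.0, 1950.0]]
first_semantics = recognize_segments(segments, model)
forward = should_forward(first_semantics, ["water", "help"])
```

When `forward` is true, the first semantics would be sent to the preset user's electronic device; the transport itself is outside the scope of this sketch.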
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610203791.7A CN107301862A (en) | 2016-04-01 | 2016-04-01 | A kind of audio recognition method, identification model method for building up, device and electronic equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610203791.7A CN107301862A (en) | 2016-04-01 | 2016-04-01 | A kind of audio recognition method, identification model method for building up, device and electronic equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107301862A true CN107301862A (en) | 2017-10-27 |
Family
ID=60136556
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610203791.7A Pending CN107301862A (en) | 2016-04-01 | 2016-04-01 | A kind of audio recognition method, identification model method for building up, device and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107301862A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107909995A (en) * | 2017-11-16 | 2018-04-13 | 北京小米移动软件有限公司 | Voice interactive method and device |
CN108847222A (en) * | 2018-06-19 | 2018-11-20 | Oppo广东移动通信有限公司 | Speech recognition modeling generation method, device, storage medium and electronic equipment |
CN109767775A (en) * | 2019-02-26 | 2019-05-17 | 珠海格力电器股份有限公司 | Sound control method, device and air-conditioning |
CN111007902A (en) * | 2019-11-12 | 2020-04-14 | 珠海格力电器股份有限公司 | Mother and infant motion monitoring system and method based on camera and smart home |
CN111629164A (en) * | 2020-05-29 | 2020-09-04 | 联想(北京)有限公司 | Video recording generation method and electronic equipment |
CN111883112A (en) * | 2020-07-27 | 2020-11-03 | 中国平安人寿保险股份有限公司 | Semantic recognition method and device based on multi-mode identification and computer equipment |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0241163A1 (en) * | 1986-03-25 | 1987-10-14 | AT&T Corp. | Speaker-trained speech recognizer |
EP0862161A2 (en) * | 1997-02-28 | 1998-09-02 | Philips Patentverwaltung GmbH | Speech recognition method with model adaptation |
CN101599270A (en) * | 2008-06-02 | 2009-12-09 | 海尔集团公司 | Voice server and voice control method |
CN103247291A (en) * | 2013-05-07 | 2013-08-14 | 华为终端有限公司 | Updating method, device, and system of voice recognition device |
CN103559892A (en) * | 2013-11-08 | 2014-02-05 | 安徽科大讯飞信息科技股份有限公司 | Method and system for evaluating spoken language |
CN103714812A (en) * | 2013-12-23 | 2014-04-09 | 百度在线网络技术(北京)有限公司 | Voice identification method and voice identification device |
CN104112445A (en) * | 2014-07-30 | 2014-10-22 | 宇龙计算机通信科技(深圳)有限公司 | Terminal and voice identification method |
-
2016
- 2016-04-01 CN CN201610203791.7A patent/CN107301862A/en active Pending
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107909995A (en) * | 2017-11-16 | 2018-04-13 | 北京小米移动软件有限公司 | Voice interactive method and device |
CN108847222A (en) * | 2018-06-19 | 2018-11-20 | Oppo广东移动通信有限公司 | Speech recognition modeling generation method, device, storage medium and electronic equipment |
CN109767775A (en) * | 2019-02-26 | 2019-05-17 | 珠海格力电器股份有限公司 | Sound control method, device and air-conditioning |
CN111007902A (en) * | 2019-11-12 | 2020-04-14 | 珠海格力电器股份有限公司 | Mother and infant motion monitoring system and method based on camera and smart home |
CN111629164A (en) * | 2020-05-29 | 2020-09-04 | 联想(北京)有限公司 | Video recording generation method and electronic equipment |
CN111883112A (en) * | 2020-07-27 | 2020-11-03 | 中国平安人寿保险股份有限公司 | Semantic recognition method and device based on multi-mode identification and computer equipment |
CN111883112B (en) * | 2020-07-27 | 2022-03-18 | 中国平安人寿保险股份有限公司 | Semantic recognition method and device based on multi-mode identification and computer equipment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107301862A (en) | A kind of audio recognition method, identification model method for building up, device and electronic equipment | |
CN109427333A (en) | Activate the method for speech-recognition services and the electronic device for realizing the method | |
CN104408402B (en) | Face identification method and device | |
CN105389304B (en) | Event Distillation method and device | |
CN108121490A (en) | For handling electronic device, method and the server of multi-mode input | |
CN105224601B (en) | A kind of method and apparatus of extracting time information | |
US20170052947A1 (en) | Methods and devices for training a classifier and recognizing a type of information | |
CN108735204A (en) | Equipment for executing task corresponding with user spoken utterances | |
CN106663245A (en) | Social reminders | |
CN104243814B (en) | Analysis method, image taking reminding method and the device of objects in images layout | |
CN107608532A (en) | A kind of association-feeding method, device and electronic equipment | |
CN110147467A (en) | A kind of generation method, device, mobile terminal and the storage medium of text description | |
CN104063865B (en) | Disaggregated model creation method, image partition method and relevant apparatus | |
CN109558512A (en) | A kind of personalized recommendation method based on audio, device and mobile terminal | |
CN108121736A (en) | A kind of descriptor determines the method for building up, device and electronic equipment of model | |
CN104850222A (en) | Instruction recognition method and electronic terminal | |
CN108345581A (en) | A kind of information identifying method, device and terminal device | |
CN109961787A (en) | Determine the method and device of acquisition end time | |
CN106991309A (en) | The operating method and device of terminal pattern | |
WO2021046958A1 (en) | Speech information processing method and apparatus, and storage medium | |
CN107529699A (en) | Control method of electronic device and device | |
CN107291772A (en) | One kind search access method, device and electronic equipment | |
CN110135349A (en) | Recognition methods, device, equipment and storage medium | |
WO2021212388A1 (en) | Interactive communication implementation method and device, and storage medium | |
CN109145876A (en) | Image classification method, device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20171027 |