CN109545203A - Speech recognition method, apparatus, device and storage medium
- Publication number
- CN109545203A (application CN201811534858.0A)
- Authority
- CN
- China
- Prior art keywords
- user
- homonym
- semantic
- session
- semanteme
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/1815—Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/1822—Parsing for meaning understanding
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
Abstract
Embodiments of the present invention relate to a speech recognition method, apparatus, computer device, and storage medium. The method includes: after detecting that a user has input speech, establishing a session with the user; if a homonym corresponding to multiple meanings is detected during speech recognition of the voice information, initiating a clarification dialogue for the homonym so that the user can identify its correct meaning; and finally determining the correct meaning of the homonym according to the user's reply and the context of the session. Compared with the existing single-interaction recognition mode, the embodiments use a two-way human-machine interaction mode. By establishing a one-to-one session with the user, they provide scenario support for speech recognition, so that the speech model can better understand the meaning expressed by the voice information through the context of the session. In addition, the method can initiate a clarification session for a homonym so that the user confirms its meaning, which improves the accuracy of speech recognition.
Description
Technical field
Embodiments of the present invention relate to the technical field of data processing, and in particular to a speech recognition method, apparatus, computer device, and storage medium.
Background art
Speech recognition is currently a popular technical field. Speech recognition technology can be applied in many products, such as mobile phones, wearable devices, and smart home appliances, where the user controls the device by voice. Speech recognition on current platforms is a traditional single-interaction mode: the machine answers only the current question and is limited to a single-round dialogue.
For example:
User: What is good to eat in "Zhongshan"?
Machine: OK, I found the following restaurants: (nearby restaurants may be recommended to the user by default)
User: I don't feel like eating.
Machine: OK.
However, in this recognition mode the machine answers only the current question and is confined to a single-round dialogue, without the support of conversational context. As a result, this single-interaction mode has a low accuracy when distinguishing homonyms and polysemous words.
Summary of the invention
In view of this, embodiments of the present invention provide a speech recognition method, apparatus, device, and storage medium for improving the recognition accuracy of homonyms during speech recognition.
In a first aspect, an embodiment of the present invention provides a speech recognition method, which may include:
After detecting that a user has input voice information, generating a session with the user through a session manager according to first information, where the first information is detected characteristic information that characterizes the user, or a preset time period;
In the session, during speech recognition of the voice information input by the user, if it is determined that the voice information contains a homonym corresponding to multiple semantic results, initiating a clarification dialogue for the homonym, the clarification dialogue being used to confirm the correct meaning of the homonym with the user;
After detecting the user's reply to the clarification dialogue, determining the correct meaning of the homonym according to the reply and the context of the session.
Optionally, determining that the voice information contains a homonym corresponding to multiple semantic results includes:
Recognizing the voice information to obtain the corresponding syllables;
Performing a word segmentation operation on the obtained syllables to obtain a segmentation result;
Performing semantic understanding on the segmentation result, and if a first segment corresponds to multiple meanings, determining that the first segment is a homonym.
Optionally, the clarification dialogue includes the multiple meanings and an identifier corresponding to each meaning, and the conventional meanings of the identifier and their corresponding weights are stored in advance in the model used for speech recognition;
The user's reply to the clarification dialogue is the identifier corresponding to the correct meaning;
Determining the correct meaning of the homonym according to the reply includes:
Storing in the model the correct meaning corresponding to the identifier in the user's reply, and setting the weight of the correct meaning to be greater than the weights of the conventional meanings of the identifier;
Sorting all meanings corresponding to the identifier by weight in descending order, and determining the top-ranked meaning as the correct meaning of the homonym.
Optionally, the method further includes:
When it is detected that a session termination condition is met, terminating the session;
Deleting from the model the correct meaning corresponding to the identifier.
Optionally, the method further includes:
Completing the correct meaning of the homonym into the recognition result of the voice information, and performing semantic understanding on the recognition result.
Optionally, the method further includes:
Searching for corresponding reply content according to the result of the semantic understanding, and presenting the reply content.
Optionally, the characteristic information of the user includes account information of the user or voiceprint information of the user.
In a second aspect, an embodiment of the present invention provides a speech recognition apparatus, including:
A session control unit, configured to generate a session with a user through a session manager according to first information after detecting that the user has input voice information, where the first information is detected characteristic information that characterizes the user, or a preset time period;
A semantic understanding unit, configured to, in the session and during speech recognition of the voice information, initiate a clarification dialogue for a homonym if it is determined that the voice information contains a homonym corresponding to multiple semantic results, the clarification dialogue being used to confirm the correct meaning of the homonym with the user;
The semantic understanding unit is further configured to determine the correct meaning of the homonym according to the reply and the context of the session after detecting the user's reply to the clarification dialogue.
In some embodiments, the semantic understanding unit determining that the voice information contains a homonym corresponding to multiple semantic results includes:
Recognizing the voice information to obtain the corresponding syllables;
Performing a word segmentation operation on the obtained syllables to obtain a segmentation result;
Performing semantic understanding on the segmentation result, and if a first segment corresponds to multiple meanings, determining that the first segment is a homonym.
In some embodiments, the clarification dialogue includes the multiple meanings and an identifier corresponding to each meaning, and the conventional meanings of the identifier and their corresponding weights are stored in advance in the model used for speech recognition;
The user's reply to the clarification dialogue is the identifier corresponding to the correct meaning;
The semantic understanding unit determining the correct meaning of the homonym according to the reply includes:
Storing in the model the correct meaning corresponding to the identifier in the user's reply, and setting the weight of the correct meaning to be greater than the weights of the conventional meanings of the identifier;
Sorting all meanings corresponding to the identifier by weight in descending order, and determining the top-ranked meaning as the correct meaning of the homonym.
In some embodiments, the session control unit is further configured to:
Terminate the session when it is detected that a session termination condition is met;
Delete from the model the correct meaning corresponding to the identifier.
In some embodiments, the semantic understanding unit is further configured to:
Complete the correct meaning of the homonym into the recognition result of the voice information, and perform semantic understanding on the recognition result.
In some embodiments, the semantic understanding unit is further configured to:
Search for corresponding reply content according to the result of the semantic understanding, and present the reply content.
In some embodiments, the characteristic information of the user includes account information of the user or voiceprint information of the user.
In a third aspect, an embodiment of the present invention provides a computer device including a memory and a processor, the memory storing computer-readable instructions which, when executed by the processor, cause the processor to perform the steps of the speech recognition method according to the first aspect.
In a fourth aspect, an embodiment of the present invention provides a storage medium storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the speech recognition method according to the first aspect.
Embodiments of the present invention provide a speech recognition method, apparatus, computer device, and storage medium. The method includes: after detecting that a user has input speech, establishing a session with the user; if a homonym corresponding to multiple meanings is detected during speech recognition of the voice information, initiating a clarification dialogue for the homonym so that the user can identify its correct meaning; and finally determining the correct meaning of the homonym according to the user's reply and the context of the session. Compared with the existing single-interaction recognition mode, the embodiments use a two-way human-machine interaction mode. By establishing a one-to-one session with the user, they provide scenario support for speech recognition, so that the speech model can better understand the meaning expressed by the voice information through the context of the session. In addition, the method can initiate a clarification session for a homonym so that the user confirms its meaning, which improves the accuracy of speech recognition.
Brief description of the drawings
Fig. 1 is a block diagram of the internal structure of a computer device in one embodiment;
Fig. 2 is a flowchart of a speech recognition method in one embodiment;
Fig. 3 is a structural block diagram of a speech recognition apparatus in one embodiment.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here merely illustrate the present invention and are not intended to limit it.
In a first aspect, Fig. 1 is a schematic diagram of the internal structure of a computer device in one embodiment. As shown in Fig. 1, the computer device includes a processor, a non-volatile storage medium, a memory, and a network interface connected by a system bus. The non-volatile storage medium stores an operating system, a database, and computer-readable instructions. The database may store control information sequences, and the computer-readable instructions, when executed by the processor, cause the processor to implement a speech recognition method. The processor provides computing and control capabilities and supports the operation of the entire computer device. The memory may also store computer-readable instructions which, when executed by the processor, cause the processor to perform a speech recognition method. The network interface is used to connect to and communicate with external devices. Those skilled in the art will understand that the structure shown in Fig. 1 is only a block diagram of the part of the structure relevant to the solution of this application and does not limit the computer device to which the solution is applied. A specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
In a second aspect, as shown in Fig. 2, an embodiment of the present invention provides a speech recognition method including the following steps:
S201: after detecting that a user has input voice information, generating a session with the user through a session manager according to first information;
Here, detecting that the user has input voice information may mean detecting that the user has triggered a voice input key, or detecting that the user has directly input voice information.
The first information may be detected characteristic information that characterizes the user, for example the user's ID, the user's account name, the user's voiceprint information, or other information that can identify the user. Besides characteristic information of the user, the first information may also be a preset time period, which can be set according to the actual situation; that is, a session of a preset duration can be established, for example a 10-minute session with the user. The first information may also be a combination of the user's characteristic information and a preset time period, for example establishing a one-to-one session with the user according to the user's characteristic information and setting the duration of the session to 10 minutes.
The session manager is a container that manages all sessions of a given web application. Here, the session manager can generate a one-to-one session with the user according to the user's ID, the user's account name, the user's voiceprint information, or a preset time period. After generating the session, the session manager also maintains it until a session end condition is detected. The session end condition may be that the user's account or ID is detected to have logged out, or that the preset time period has elapsed. Once the session end condition is detected, the session ends automatically.
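The session handling in S201 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the class names `Session` and `SessionManager`, the `user_key` field, and the 600-second default are assumptions chosen to mirror the 10-minute example above.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Session:
    user_key: str                     # user ID, account name, or voiceprint ID
    started_at: float                 # creation time in seconds
    max_duration_s: Optional[float]   # preset period; None means no time limit
    context: List[str] = field(default_factory=list)  # turns exchanged so far

    def expired(self, now: float) -> bool:
        """True once the preset period has elapsed."""
        return (self.max_duration_s is not None
                and now - self.started_at >= self.max_duration_s)


class SessionManager:
    """Hypothetical container that keeps one session per user."""

    def __init__(self) -> None:
        self._sessions: Dict[str, Session] = {}

    def get_or_create(self, user_key: str, max_duration_s: Optional[float] = 600.0,
                      now: Optional[float] = None) -> Session:
        now = time.time() if now is None else now
        session = self._sessions.get(user_key)
        if session is None or session.expired(now):
            # No live session for this user: start a fresh one-to-one session.
            session = Session(user_key, now, max_duration_s)
            self._sessions[user_key] = session
        return session

    def end(self, user_key: str) -> None:
        """End the session explicitly, e.g. when the user logs out."""
        self._sessions.pop(user_key, None)
```

Passing `now` explicitly is only a convenience for testing; a caller would normally omit it and let the manager read the clock.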
S202: in the session, during speech recognition of the voice information, if it is determined that the voice information contains a homonym corresponding to multiple semantic results, initiating a clarification dialogue for the homonym, the clarification dialogue being used to confirm the correct meaning of the homonym with the user;
Specifically, since the session manager generated a one-to-one session with the user in S201, recognition of this user's voice information is also carried out in that session.
A homonym detected during speech recognition is a word whose syllables can correspond to multiple meanings. For example, the user says "zhong shan de tian qi zen me yang" (how is the weather in Zhongshan). The syllables zhong shan may correspond to 中山 as in Zhongshan City, or to 钟山 as in Zhongshan County. These meanings are learned by the speech recognition model from a large amount of speech data, and the model determines the probability of each meaning of the syllables from how often that meaning occurs in the training data: the more frequently a meaning occurs, the higher its probability. When the prior art detects such a homonym, it directly takes the most probable meaning as the correct meaning of the syllables. In some cases, however, the meaning selected this way is not what the user intends. For example, the user means 钟山, but because 中山 has the highest probability among the meanings of zhong shan, the recognition result is always 中山, which gives the user a poor experience. Therefore, in the method provided by the embodiments of the present invention, a clarification dialogue is initiated with the user when such a homonym is detected. The clarification dialogue may take the form of a question back to the user that presents all meanings corresponding to the syllables and asks which one the user actually intends, instead of guessing the user's need by default from big data as in the prior art. In the example above, the clarification dialogue may ask the user to confirm whether zhong shan is 中山 of Zhongshan City or 钟山 of Zhongshan County. The clarification dialogue can take many forms, which the embodiments of the present invention do not specifically limit.
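The frequency-based default that the prior art relies on can be illustrated with a small sketch. The observation counts below are invented purely for illustration, and `semantic_probabilities` is a hypothetical helper, not part of any real recognizer.

```python
from collections import Counter
from typing import Dict, List


def semantic_probabilities(observations: List[str]) -> Dict[str, float]:
    """Estimate the probability of each meaning from how often it was observed."""
    counts = Counter(observations)
    total = sum(counts.values())
    return {meaning: n / total for meaning, n in counts.items()}


# Hypothetical training observations for the syllables "zhong shan".
observed = ["Zhongshan City"] * 8 + ["Zhongshan County"] * 2
probs = semantic_probabilities(observed)

# Prior-art behaviour: always pick the most probable meaning.
default_pick = max(probs, key=probs.get)
```

With these counts the default pick is always "Zhongshan City", so a user who means Zhongshan County can never be understood without a clarification step.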
S203: after detecting the user's reply to the clarification dialogue, determining the correct meaning of the homonym according to the reply and the context of the session.
The user's reply to the clarification dialogue indicates the meaning the user actually intends for the syllables. The correct meaning of the syllables can therefore be determined from this reply together with the context of the session held with the user.
In the method provided by the embodiments of the present invention, a session with the user is established after the user is detected to have input speech. If a homonym corresponding to multiple meanings is detected during speech recognition of the voice information, a clarification dialogue for the homonym is initiated so that the user can identify its correct meaning, and the correct meaning is finally determined according to the user's reply and the context of the session. Compared with the existing single-interaction recognition mode, the embodiments use a two-way human-machine interaction mode. By establishing a one-to-one session with the user, they provide scenario support for speech recognition, so that the speech model can better understand the meaning expressed by the voice information through the context of the session. In addition, the method can initiate a clarification session for a homonym so that the user confirms its meaning, which improves the accuracy of speech recognition.
In some embodiments, there are many ways to determine in step S202 whether the voice information contains a homonym. One optional implementation is:
S2021: recognizing the voice information to obtain the corresponding syllables;
For example, the user says: "zhong shan de tian qi zen me yang?" (how is the weather in zhong shan?). After the voice data input by the user is detected, natural language understanding (NLU) technology can be used to recognize the speech and convert it into individual syllables, giving: zhong shan de tian qi zen me yang.
S2022: performing a word segmentation operation on the obtained syllables to obtain a segmentation result;
Here, NLU technology can be used for segmentation. Word segmentation algorithms are mature: a model is trained on a large amount of data, and when it receives new input it uses what it has learned to judge whether adjacent syllables are likely to form a word; if the probability is high, the syllables are combined into one word. For example, the result of segmenting zhong shan de tian qi zen me yang is: zhong shan, de, tian qi, zen me yang.
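As a rough illustration of S2022, the sketch below segments a syllable sequence with forward maximum matching against a small lexicon. This is a deliberately simple stand-in for the trained statistical model described above, not the patent's algorithm, and the `LEXICON` contents are assumptions made for the example.

```python
from typing import List, Set

# Hypothetical lexicon of known multi-syllable words (syllables joined by spaces).
LEXICON: Set[str] = {"zhong shan", "tian qi", "zen me yang"}


def segment(syllables: List[str], lexicon: Set[str], max_len: int = 3) -> List[str]:
    """Greedy forward maximum matching: prefer the longest known word at each position."""
    words: List[str] = []
    i = 0
    while i < len(syllables):
        for n in range(min(max_len, len(syllables) - i), 0, -1):
            candidate = " ".join(syllables[i:i + n])
            # A single syllable is always accepted as a fallback.
            if n == 1 or candidate in lexicon:
                words.append(candidate)
                i += n
                break
    return words


result = segment("zhong shan de tian qi zen me yang".split(), LEXICON)
```

Here `result` holds the four segments zhong shan, de, tian qi, and zen me yang, matching the segmentation given in the text.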
S2023: performing semantic understanding on the segmentation result, and if a first segment corresponds to multiple meanings, determining that the first segment is a homonym.
After segmentation, semantic understanding is performed on each segment, that is, the syllables of each segment are translated into text. For example, zhong shan, de, tian qi, zen me yang is translated as "how is the weather in zhong shan". Here zhong shan is not translated into text because it may correspond to 中山 of Zhongshan City or to 钟山 of Zhongshan County. In other words, the segment corresponds to multiple meanings and is therefore a homonym.
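Step S2023 can be sketched as a lookup that translates unambiguous segments and flags ambiguous ones. The `SEMANTICS` table and the English glosses in it are assumptions for illustration; a real system would obtain candidate meanings from its trained model.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping from a segment (syllables) to its candidate meanings.
SEMANTICS: Dict[str, List[str]] = {
    "zhong shan": ["Zhongshan City", "Zhongshan County"],
    "de": ["of"],
    "tian qi": ["weather"],
    "zen me yang": ["how is"],
}


def understand(segments: List[str]) -> Tuple[List[str], List[str]]:
    """Translate each segment; leave segments with several meanings untranslated.

    Returns (partially translated result, list of detected homonyms).
    """
    translated: List[str] = []
    homonyms: List[str] = []
    for seg in segments:
        meanings = SEMANTICS.get(seg, [])
        if len(meanings) > 1:
            homonyms.append(seg)      # ambiguous: needs clarification
            translated.append(seg)    # keep the raw syllables for now
        elif meanings:
            translated.append(meanings[0])
        else:
            translated.append(seg)    # unknown segment: pass through unchanged
    return translated, homonyms


text, found = understand(["zhong shan", "de", "tian qi", "zen me yang"])
```

Only zhong shan ends up in `found`, which is the condition that triggers the clarification dialogue.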
In some embodiments, the clarification dialogue initiated for the homonym in step S202 can take many forms. One optional implementation includes the multiple meanings and an identifier corresponding to each meaning. The identifier may be a number, a letter, and so on. For example, the homonym zhong shan may correspond to the meanings 中山 and 钟山. Understandably, because the clarification dialogue is also played to the user by voice, a dialogue containing only 中山 and 钟山 would sound identical to the user, who would have no way to tell the two apart. The meanings in the clarification dialogue must therefore be expressed so that the user can distinguish them by sound. That is, the clarification dialogue may contain "Zhongshan City" (中山市) and "Zhongshan County" (钟山县), which together with the identifiers gives 1: "Zhongshan City", 2: "Zhongshan County".
As described above, the purpose of the clarification dialogue is to let the user confirm the correct meaning of the homonym, so in addition to the meanings and identifiers it also needs to explain its purpose to the user. For example: User (voice input): how is the weather in zhong shan? Machine: Which one do you mean? 1: "Zhongshan City", 2: "Zhongshan County". The user thus learns that zhong shan is a homonym and that the correct meaning needs to be confirmed. This is only one optional implementation of the clarification dialogue. Other forms that let the user confirm the correct meaning may also be used, which the embodiments of the present invention do not specifically limit.
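A clarification prompt of the numbered form described above might be assembled as in the following sketch. The function name `build_clarification` and the exact wording of the question are assumptions; the patent leaves the form of the dialogue open.

```python
from typing import Dict, List, Tuple


def build_clarification(syllables: str, meanings: List[str]) -> Tuple[str, Dict[str, str]]:
    """Number each audibly distinguishable meaning and build the question text.

    Returns the prompt and the identifier-to-meaning mapping needed to read the reply.
    """
    options = {str(i): meaning for i, meaning in enumerate(meanings, start=1)}
    listing = ", ".join(f'{ident}: "{meaning}"' for ident, meaning in options.items())
    prompt = f'Which "{syllables}" do you mean? {listing}'
    return prompt, options


prompt, options = build_clarification(
    "zhong shan", ["Zhongshan City", "Zhongshan County"]
)
```

Keeping the `options` mapping alongside the prompt is what later allows a bare reply such as "1" to be interpreted in the context of this question.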
In some embodiments, step S203 of the above method embodiment, determining the correct meaning of the homonym according to the user's reply and the context of the session, can be implemented in many ways. One optional implementation is described below.
After the clarification dialogue has been initiated, the user replies in the format the dialogue calls for. For the clarification dialogue described above, the user can reply with the identifier corresponding to the correct meaning, namely:
Machine: Which one do you mean? 1: "Zhongshan City", 2: "Zhongshan County";
User: 1.
From the context of the session it is easy to see that, although the user replied 1, the intended meaning is Zhongshan City, which 1 represented in the machine's previous question. However, 1 also has other meanings of its own, such as the number 1, so the meaning of 1 has to be understood in context to obtain the correct meaning. One specific implementation, provided by the embodiments of the present invention, of determining the correct meaning of the homonym according to the reply and the context may include:
S2031: storing in the model the correct meaning corresponding to the identifier in the user's reply, and setting the weight of the correct meaning to be greater than the weights of the conventional meanings of the identifier;
Specifically, the format of the clarification dialogue and the identifier of each meaning within that format are set in advance. Understandably, an identifier may have some conventional meanings of its own; for example, 1 also means the number 1. The conventional meanings of the identifier and their corresponding weights can be stored in advance in the model used for speech recognition. Then, after the user's reply to the clarification dialogue is detected, the correct meaning given by the reply is also stored in the model, and its weight is set greater than the weights of the conventional meanings.
For example, the meanings of identifier 1 may include: the number 1 itself (weight w1), meaning 1 (weight w2), and meaning 2 (weight w3). These are stored in the model in advance. After the user answers 1, information completion is used to mark 1 as Zhongshan City, and the weight of "1 represents Zhongshan City" is set greater than the weight of "1 is the number 1". After completion, the meanings of 1 are therefore: the number 1 itself (weight w1), meaning 1 (weight w2), meaning 2 (weight w3), and Zhongshan City (weight w4), where w4 is greater than w1, w2, and w3.
S2032: sorting all meanings corresponding to the identifier by weight in descending order, and determining the top-ranked meaning as the correct meaning of the homonym.
Continuing the example above, sorting the weights of the meanings of 1 shows that w4 is greater than w1, w2, and w3, so the meaning corresponding to w4 (Zhongshan City) is taken as the correct meaning of the homonym zhong shan.
Understandably, for identifier 1 the newly added meaning "Zhongshan City" is useful only in this session with this user. At other times, or for other users, this meaning is useless and could even interfere with semantic understanding. Therefore, when the end of this session is detected, the newly added meaning is deleted so that it does not affect how the identifier is understood at other times.
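Steps S2031 and S2032, together with the cleanup at session end, can be sketched as follows. The class `IdentifierModel`, the numeric weights, and the rule "highest existing weight plus one" are illustrative assumptions; the text only requires that the new weight exceed the conventional ones.

```python
from typing import Dict, List, Tuple


class IdentifierModel:
    """Hypothetical store of weighted meanings for identifiers such as "1"."""

    def __init__(self, conventional: Dict[str, Dict[str, float]]) -> None:
        # identifier -> {meaning: weight}; copied so the caller's data is untouched
        self._weights = {ident: dict(m) for ident, m in conventional.items()}
        self._session_added: List[Tuple[str, str]] = []

    def store_reply(self, identifier: str, meaning: str) -> None:
        """S2031: add the session meaning with a weight above all existing ones.

        Assumes the meaning is not already a conventional meaning of the identifier.
        """
        meanings = self._weights.setdefault(identifier, {})
        meanings[meaning] = max(meanings.values(), default=0.0) + 1.0
        self._session_added.append((identifier, meaning))

    def correct_meaning(self, identifier: str) -> str:
        """S2032: sort by weight in descending order and take the first entry."""
        meanings = self._weights[identifier]
        return sorted(meanings, key=meanings.get, reverse=True)[0]

    def end_session(self) -> None:
        """Delete meanings added during the session so later sessions are unaffected."""
        for identifier, meaning in self._session_added:
            self._weights[identifier].pop(meaning, None)
        self._session_added.clear()


model = IdentifierModel({"1": {"the number 1": 0.6, "meaning 1": 0.3, "meaning 2": 0.1}})
before = model.correct_meaning("1")
model.store_reply("1", "Zhongshan City")
during = model.correct_meaning("1")
model.end_session()
after = model.correct_meaning("1")
```

In this run the identifier 1 resolves to "the number 1" before and after the session, and to "Zhongshan City" only while the session meaning is stored.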
After the correct meaning of the homonym has been determined, the method provided by the embodiments of the present invention may further include:
S204: completing the correct meaning of the homonym into the recognition result of the voice information, and performing semantic understanding on the recognition result.
Continuing the example above, after it is determined that the correct meaning expressed by zhong shan is 中山, zhong shan can be replaced with 中山, and the recognition result of "zhong shan de tian qi zen me yang" becomes "how is the weather in Zhongshan (中山)".
After the recognition result has been determined, the method provided by the embodiments of the present invention further includes:
S205: searching for corresponding reply content according to the result of the semantic understanding, and presenting the reply content.
For example, if it is determined that the user wants weather information for Zhongshan (中山), the weather information for Zhongshan is searched for. After the search result is obtained, it is presented to the user, for example as displayed text or as a voice broadcast.
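Steps S204 and S205 can be sketched together as below. The `WEATHER_BY_PLACE` table is canned, made-up data standing in for a real search backend, and both function names are hypothetical.

```python
from typing import Dict, List

# Made-up content standing in for a real search service.
WEATHER_BY_PLACE: Dict[str, str] = {
    "Zhongshan City": "sunny, 26 C",
    "Zhongshan County": "light rain, 22 C",
}


def complete_result(segments: List[str], resolved: Dict[str, str]) -> List[str]:
    """S204: substitute each resolved homonym into the recognition result."""
    return [resolved.get(seg, seg) for seg in segments]


def search_reply(completed: List[str]) -> str:
    """S205: look up reply content for the completed recognition result."""
    if "weather" in completed:
        for word in completed:
            if word in WEATHER_BY_PLACE:
                return f"Weather in {word}: {WEATHER_BY_PLACE[word]}"
    return "Sorry, no result found."


completed = complete_result(
    ["zhong shan", "of", "weather", "how is"], {"zhong shan": "Zhongshan City"}
)
reply = search_reply(completed)
```

The returned string could then be shown as text or passed to a speech synthesizer, matching the two presentation options mentioned above.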
In a third aspect, as shown in Fig. 3, an embodiment of the present invention provides a speech recognition apparatus. The speech recognition apparatus can be integrated in the above computer device 110 and may include a session control unit 301 and a semantic understanding unit 302.
The session control unit 301 is configured to generate a session with a user through a session manager according to first information after detecting that the user has input voice information, where the first information is detected characteristic information that characterizes the user, or a preset time period;
The semantic understanding unit 302 is configured to, in the session and during speech recognition of the voice information, initiate a clarification dialogue for a homonym if it is determined that the voice information contains a homonym corresponding to multiple semantic results, the clarification dialogue being used to confirm the correct meaning of the homonym with the user;
The semantic understanding unit 302 is further configured to determine the correct meaning of the homonym according to the reply and the context of the session after detecting the user's reply to the clarification dialogue.
In some embodiments, the semantic understanding unit 302 determining that the voice information contains a homonym corresponding to multiple semantic results includes:
Recognizing the voice information to obtain the corresponding syllables;
Performing a word segmentation operation on the obtained syllables to obtain a segmentation result;
Performing semantic understanding on the segmentation result, and if a first segment corresponds to multiple meanings, determining that the first segment is a homonym.
In some embodiments, the clarification dialogue includes the multiple semantics and an identifier corresponding to each semantic; the conventional semantics of the identifier and the weight corresponding to the conventional semantics are stored in advance in the model used for speech recognition;
the user's reply to the clarification dialogue is the identifier corresponding to the correct semantics;
the semantic understanding unit 302 determining the correct semantics of the homonym according to the reply comprises:
storing in the model the correct semantics corresponding to the identifier replied by the user, and setting the weight of the correct semantics to be greater than the weight of the conventional semantics of the identifier;
sorting all semantics corresponding to the identifier in descending order of weight, and determining the top-ranked semantics as the correct semantics of the homonym.
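The weight update and ranking described above might look like the following Python sketch. The model layout (a dictionary from identifier to a semantics-to-weight mapping), the function name `record_reply`, and the choice of "highest existing weight plus one" are assumptions; the specification only requires that the confirmed semantics outweigh the conventional ones.

```python
from typing import Dict


def record_reply(model: Dict[str, Dict[str, float]],
                 identifier: str,
                 correct_semantic: str) -> str:
    """Store the user-confirmed semantics and return the top-ranked one."""
    entries = model.setdefault(identifier, {})
    # Give the confirmed semantics a weight above every stored weight,
    # including that of the conventional semantics.
    highest = max(entries.values(), default=0.0)
    entries[correct_semantic] = highest + 1.0
    # Sort all semantics for this identifier by descending weight.
    ranked = sorted(entries.items(), key=lambda item: item[1], reverse=True)
    return ranked[0][0]
```

For example, with a hypothetical model `{"pingguo": {"fruit": 0.9}}`, calling `record_reply(model, "pingguo", "phone brand")` stores the new entry with a weight above 0.9 and returns it as the top-ranked semantics.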
In some embodiments, the session control unit 301 is further configured to:
terminate the session when it detects that a session termination condition is met;
delete from the model the correct semantics corresponding to the identifier.
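One way to make the confirmed semantics session-scoped, so that they can be deleted when the session ends, is sketched below in Python. The class `SessionScopedModel` and its methods are hypothetical. The sketch also restores the original weight when the user happened to confirm a semantics that was already stored as conventional, which is a design choice not stated in the specification.

```python
from typing import Dict, Optional, Tuple


class SessionScopedModel:
    """Hypothetical wrapper that undoes session-time overrides on session end."""

    def __init__(self, conventional: Dict[str, Dict[str, float]]) -> None:
        self.weights = {ident: dict(sems) for ident, sems in conventional.items()}
        # (identifier, semantic) -> weight before the override, None if absent.
        self._previous: Dict[Tuple[str, str], Optional[float]] = {}

    def store_correct(self, identifier: str, semantic: str) -> None:
        entries = self.weights.setdefault(identifier, {})
        # Remember the pre-session state only the first time it is overridden.
        self._previous.setdefault((identifier, semantic), entries.get(semantic))
        entries[semantic] = max(entries.values(), default=0.0) + 1.0

    def end_session(self) -> None:
        for (identifier, semantic), old in self._previous.items():
            if old is None:
                # The semantics was added during the session: delete it.
                self.weights[identifier].pop(semantic, None)
            else:
                # It was a conventional semantics: restore its original weight.
                self.weights[identifier][semantic] = old
        self._previous.clear()
```

Deleting the overrides at session end keeps one user's clarification from affecting how the same homonym is resolved in later, unrelated sessions.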
In some embodiments, the semantic understanding unit 302 is further configured to:
fill the correct semantics of the homonym into the recognition result of the voice information, and perform semantic understanding on the recognition result.
In some embodiments, the semantic understanding unit 302 is further configured to:
search for the corresponding reply content according to the semantic understanding result, and display the reply content.
In some embodiments, the characteristic information of the user includes account information of the user or voiceprint information of the user.
In a fourth aspect, an embodiment of the present invention provides a computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps described in the method embodiment of the first aspect.
Those of ordinary skill in the art will appreciate that all or part of the processes in the methods of the above embodiments can be completed by a computer program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. The aforementioned storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disc, or a read-only memory (Read-Only Memory, ROM), or may be a random access memory (Random Access Memory, RAM), etc.
The technical features of the embodiments described above may be combined arbitrarily. For brevity of description, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features involves no contradiction, it should be considered within the scope of this specification.
The embodiments described above express only several implementations of the present invention, and their description is relatively specific and detailed, but they should not therefore be construed as limiting the scope of the patent. It should be noted that those of ordinary skill in the art may make various modifications and improvements without departing from the inventive concept, all of which fall within the protection scope of the present invention. Therefore, the protection scope of this patent shall be subject to the appended claims.
Claims (10)
1. A speech recognition method, characterized by comprising:
after detecting voice information input by a user, generating a session with the user through a session manager according to first information, wherein the first information is detected characteristic information characterizing the user and/or a preset time period;
in the session, while performing speech recognition on the voice information input by the user, if it is determined that the voice information contains a homonym corresponding to multiple semantic results, initiating a clarification dialogue for the homonym, the clarification dialogue being used to confirm with the user the correct semantics corresponding to the homonym;
after detecting the user's reply to the clarification dialogue, determining the correct semantics of the homonym according to the reply and the context in the session.
2. The method according to claim 1, characterized in that determining that the voice information contains a homonym corresponding to multiple semantic results comprises:
recognizing the voice information to obtain the corresponding multiple syllables;
performing a word segmentation operation on the obtained syllables to obtain a word segmentation result;
performing semantic understanding on the word segmentation result, and if a first segmented word corresponds to multiple semantics, determining that the first segmented word is a homonym.
3. The method according to claim 1, characterized in that the clarification dialogue includes the multiple semantics and an identifier corresponding to each semantic, and the conventional semantics of the identifier and the weight corresponding to the conventional semantics are stored in advance in the model used for speech recognition;
the user's reply to the clarification dialogue is the identifier corresponding to the correct semantics;
determining the correct semantics of the homonym according to the reply comprises:
storing in the model the correct semantics corresponding to the identifier replied by the user, and setting the weight of the correct semantics to be greater than the weight of the conventional semantics of the identifier;
sorting all semantics corresponding to the identifier in descending order of weight, and determining the top-ranked semantics as the correct semantics of the homonym.
4. The method according to claim 3, characterized in that the method further comprises:
terminating the session when it is detected that a session termination condition is met;
deleting from the model the correct semantics corresponding to the identifier.
5. The method according to claim 1, characterized in that the method further comprises:
filling the correct semantics of the homonym into the recognition result of the voice information, and performing semantic understanding on the recognition result.
6. The method according to claim 5, characterized in that the method further comprises:
searching for the corresponding reply content according to the semantic understanding result, and displaying the reply content.
7. The method according to any one of claims 1 to 6, characterized in that the characteristic information of the user includes account information of the user or voiceprint information of the user.
8. A speech recognition apparatus, characterized by comprising:
a session control unit, configured to, after detecting voice information input by a user, generate a session with the user through a session manager according to first information, wherein the first information is detected characteristic information characterizing the user, or a preset time period;
a semantic understanding unit, configured to, in the session and while performing speech recognition on the voice information, initiate a clarification dialogue for a homonym if it is determined that the voice information contains a homonym corresponding to multiple semantic results, the clarification dialogue being used to confirm with the user the correct semantics of the homonym;
the semantic understanding unit being further configured to, after detecting the user's reply to the clarification dialogue, determine the correct semantics of the homonym according to the reply and the context in the session.
9. A computer device, comprising a memory and a processor, the memory storing computer-readable instructions which, when executed by the processor, cause the processor to perform the steps of the speech recognition method according to any one of claims 1 to 7.
10. A storage medium storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the speech recognition method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811534858.0A CN109545203A (en) | 2018-12-14 | 2018-12-14 | Audio recognition method, device, equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811534858.0A CN109545203A (en) | 2018-12-14 | 2018-12-14 | Audio recognition method, device, equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109545203A true CN109545203A (en) | 2019-03-29 |
Family
ID=65856408
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811534858.0A Pending CN109545203A (en) | 2018-12-14 | 2018-12-14 | Audio recognition method, device, equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109545203A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110334201A (en) * | 2019-07-18 | 2019-10-15 | 中国工商银行股份有限公司 | A kind of intension recognizing method, apparatus and system |
CN111046154A (en) * | 2019-11-20 | 2020-04-21 | 泰康保险集团股份有限公司 | Information retrieval method, information retrieval device, information retrieval medium and electronic equipment |
CN113421561A (en) * | 2021-06-03 | 2021-09-21 | 广州小鹏汽车科技有限公司 | Voice control method, voice control device, server and storage medium |
CN113593566A (en) * | 2021-06-08 | 2021-11-02 | 深圳双猴科技有限公司 | Voice recognition processing method and system |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090055163A1 (en) * | 2007-08-20 | 2009-02-26 | Sandeep Jindal | Dynamic Mixed-Initiative Dialog Generation in Speech Recognition |
CN103645876A (en) * | 2013-12-06 | 2014-03-19 | 百度在线网络技术(北京)有限公司 | Voice inputting method and device |
CN105810200A (en) * | 2016-02-04 | 2016-07-27 | 深圳前海勇艺达机器人有限公司 | Man-machine dialogue apparatus and method based on voiceprint identification |
CN106782547A (en) * | 2015-11-23 | 2017-05-31 | 芋头科技(杭州)有限公司 | A kind of robot semantics recognition system based on speech recognition |
CN107221331A (en) * | 2017-06-05 | 2017-09-29 | 深圳市讯联智付网络有限公司 | A kind of personal identification method and equipment based on vocal print |
CN108021554A (en) * | 2017-11-14 | 2018-05-11 | 无锡小天鹅股份有限公司 | Audio recognition method, device and washing machine |
CN108510355A (en) * | 2018-03-12 | 2018-09-07 | 拉扎斯网络科技(上海)有限公司 | The implementation method and relevant apparatus that interactive voice is made a reservation |
CN108920497A (en) * | 2018-05-23 | 2018-11-30 | 北京奇艺世纪科技有限公司 | A kind of man-machine interaction method and device |
CN108986825A (en) * | 2018-07-02 | 2018-12-11 | 北京百度网讯科技有限公司 | Context acquisition methods and equipment based on interactive voice |
-
2018
- 2018-12-14 CN CN201811534858.0A patent/CN109545203A/en active Pending
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090055163A1 (en) * | 2007-08-20 | 2009-02-26 | Sandeep Jindal | Dynamic Mixed-Initiative Dialog Generation in Speech Recognition |
CN103645876A (en) * | 2013-12-06 | 2014-03-19 | 百度在线网络技术(北京)有限公司 | Voice inputting method and device |
CN106782547A (en) * | 2015-11-23 | 2017-05-31 | 芋头科技(杭州)有限公司 | A kind of robot semantics recognition system based on speech recognition |
CN105810200A (en) * | 2016-02-04 | 2016-07-27 | 深圳前海勇艺达机器人有限公司 | Man-machine dialogue apparatus and method based on voiceprint identification |
CN107221331A (en) * | 2017-06-05 | 2017-09-29 | 深圳市讯联智付网络有限公司 | A kind of personal identification method and equipment based on vocal print |
CN108021554A (en) * | 2017-11-14 | 2018-05-11 | 无锡小天鹅股份有限公司 | Audio recognition method, device and washing machine |
CN108510355A (en) * | 2018-03-12 | 2018-09-07 | 拉扎斯网络科技(上海)有限公司 | The implementation method and relevant apparatus that interactive voice is made a reservation |
CN108920497A (en) * | 2018-05-23 | 2018-11-30 | 北京奇艺世纪科技有限公司 | A kind of man-machine interaction method and device |
CN108986825A (en) * | 2018-07-02 | 2018-12-11 | 北京百度网讯科技有限公司 | Context acquisition methods and equipment based on interactive voice |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110334201A (en) * | 2019-07-18 | 2019-10-15 | 中国工商银行股份有限公司 | A kind of intension recognizing method, apparatus and system |
CN111046154A (en) * | 2019-11-20 | 2020-04-21 | 泰康保险集团股份有限公司 | Information retrieval method, information retrieval device, information retrieval medium and electronic equipment |
CN113421561A (en) * | 2021-06-03 | 2021-09-21 | 广州小鹏汽车科技有限公司 | Voice control method, voice control device, server and storage medium |
CN113421561B (en) * | 2021-06-03 | 2024-01-09 | 广州小鹏汽车科技有限公司 | Voice control method, voice control device, server, and storage medium |
CN113593566A (en) * | 2021-06-08 | 2021-11-02 | 深圳双猴科技有限公司 | Voice recognition processing method and system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109545203A (en) | Audio recognition method, device, equipment and storage medium | |
CN103853703B (en) | A kind of information processing method and electronic equipment | |
CN107623614A (en) | Method and apparatus for pushed information | |
CN110047481B (en) | Method and apparatus for speech recognition | |
CN108447471A (en) | Audio recognition method and speech recognition equipment | |
CN105096941A (en) | Voice recognition method and device | |
CN104185868A (en) | Voice authentication and speech recognition system and method | |
CN105744090A (en) | Voice information processing method and device | |
CN105988581A (en) | Voice input method and apparatus | |
CN111261151A (en) | Voice processing method and device, electronic equipment and storage medium | |
CN109360565A (en) | A method of precision of identifying speech is improved by establishing resources bank | |
CN109979474A (en) | Speech ciphering equipment and its user speed modification method, device and storage medium | |
CN109120789A (en) | Message prompt method, device, terminal and storage medium | |
CN111583931A (en) | Service data processing method and device | |
CN110493123A (en) | Instant communication method, device, equipment and storage medium | |
CN114155853A (en) | Rejection method, device, equipment and storage medium | |
CN105206273B (en) | Voice transfer control method and system | |
CN105227557A (en) | A kind of account number processing method and device | |
CN111178081A (en) | Semantic recognition method, server, electronic device and computer storage medium | |
CN107886940B (en) | Voice translation processing method and device | |
KR20090076318A (en) | Realtime conversational service system and method thereof | |
CN109147792A (en) | A kind of voice resume system | |
CN108597499A (en) | Method of speech processing and voice processing apparatus | |
CN109725798B (en) | Intelligent role switching method and related device | |
CN108717851A (en) | A kind of audio recognition method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||