CN109920430A - Speech recognition semantic processing system and its method - Google Patents
- Publication number: CN109920430A
- Application number: CN201910023125.9A
- Authority: CN (China)
- Prior art keywords: speech recognition, semantic, words, bag, word
- Prior art date: 2019-01-10
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Landscapes
- Machine Translation (AREA)
Abstract
The present invention provides a speech recognition semantic processing system and a semantic processing method. The speech recognition semantic processing system is suitable for a phone robot and includes a bag-of-words module, a semantic vector conversion module, a semantic category library and a semantic determining module. The semantic vector conversion module vectorizes a speech recognition result according to the words required by the speech recognition semantic processing system, forming a speech recognition result vectorization value. The semantic determining module determines, according to the speech recognition result vectorization value, the semantic category of the speech recognition result in the semantic category library and forms semantic category information, so as to determine the semantics of the speech recognition result and match a response voice.
Description
Technical field
The present invention relates to the field of phone robots, and more particularly to a speech recognition semantic processing system and its method, which handles homophones and other confusable words in the speech recognition result of a phone robot, avoids wrong semantic understanding and the playing of wrong voice responses, and improves the intelligence of the phone robot.
Background art
Artificial intelligence is the core driver of the current new round of industrial transformation and has a profound influence on the world economy, social progress and human life. Artificial intelligence is ubiquitous in daily life, for example in fingerprint recognition, face recognition, intelligent search engines and speech recognition.
Phone robots are also a part of artificial intelligence and have received increasing attention from enterprises in recent years, especially enterprises engaged in telemarketing. Staff working in telemarketing and telephone customer service are under great pressure: they cannot maintain enthusiasm for long periods, often encounter hostile conversations, are prone to mood swings, and may eventually lose motivation, falling into a vicious circle of low efficiency and rising cost. For the enterprise, recruiting employees for telemarketing and telephone customer service is difficult, turnover remains high, market competition is fierce, there are not enough agents, and prospective clients are hard to find. Screening prospective clients manually is time-inefficient, input costs are high, and working efficiency declines due to numerous objective factors, affecting marketing results. Replacing manual telemarketing and telephone customer service with phone robots can therefore significantly relieve the pressure on both the enterprise and its employees, provide 24-hour online service, and spare employees the impact of hostile conversations.
At present, phone robots on the market all use keyword matching technology to understand the semantics of the customer's voice. After the customer's voice is converted into text by speech recognition, the text is matched against keywords in a voice library, and the matched recording is played to realize an intelligent voice reply. However, the Chinese language is extensive and profound: there are near-synonyms, the same meaning can be expressed in different ways, and there are homophones, where the same pronunciation represents different meanings. Keyword matching is a rather simplistic form of recognition; it easily misunderstands the semantics and then matches an inappropriate voice, so the voice played by the phone robot is not a suitable answer to the customer's voice, the customer experience is poor, and the robot does not appear intelligent.
For example, the words "interested" and "having interest" both indicate that the customer has an intention; using these two words as keywords matches a recording that further introduces the product (referred to as recording A). But if the customer says "not feeling interested" or "not interested", the meaning is that the customer does not want to learn more about the product. If the phone robot uses keyword matching, it is very likely to mismatch: the expressions "not feeling interested" or "not interested" are matched with recording A, recording A is played, and the semantics have been misunderstood.
Furthermore, because of homophones, near-synonyms and other confusable words, the result produced by speech recognition technology has a high probability of containing errors, which also affects subsequent semantic understanding. For example, "zhao jing li" in the customer's voice may be recognized as "find the manager" or as "Manager Zhao", and keyword matching will retrieve different recordings for "find the manager" and "Manager Zhao", i.e. the semantic understanding will differ. As another example, "company" and "shop" are near-synonyms, and "address" and "place" are near-synonyms. If the customer says "the place of the shop", the intended meaning is the same as "company address"; but if keyword matching only sets "company address" as the keyword, the utterance "the place of the shop" cannot be matched to the recording that explains the specific location (referred to as recording B). That is, although "the place of the shop" and "company address" express the same semantics in actual Chinese and should both logically match recording B, for keyword matching they are two different semantics and may match different recordings; "the place of the shop" cannot match recording B, which is not a manifestation of an intelligent phone robot.
In conclusion existing phone robot can not be handled the partials of speech recognition result, and the pass used
A possibility that key word matching technique has very big error rate, and semantic understanding is caused to generate deviation is very big.It is therefore desirable to telephone set
Device people improves, and improves reasonability, logicality and the intelligence of phone robot.
Summary of the invention
It is an object of the present invention to provide a speech recognition semantic processing system and its method, wherein the speech recognition semantic processing system handles homophones, near-synonyms and other confusable words in a speech recognition result obtained by speech recognition, so as to achieve correct semantic understanding and reduce the possibility that confusable words mislead the semantic understanding.
It is another object of the present invention to provide a speech recognition semantic processing system and its method, wherein the speech recognition semantic processing system understands the speech recognition result based on the whole context and corrects the confusable words in it, so as to achieve correct semantic understanding, guarantee the accuracy of the overall understanding and keep the whole dialogue coherent.
It is another object of the present invention to provide a speech recognition semantic processing system and its method, wherein the speech recognition semantic processing system understands the speech recognition result by means of a bag-of-words model; compared with the keyword matching technology of the prior art, it can take the whole picture into account and consider the entire context.
It is another object of the present invention to provide a speech recognition semantic processing system and its method, wherein the speech recognition semantic processing system understands the speech recognition result by means of a bag-of-words model; compared with the keyword matching technology of the prior art, it effectively avoids the influence of inverted word order in Chinese on speech understanding and improves recognition accuracy.
It is another object of the present invention to provide a speech recognition semantic processing system and its method, wherein the speech recognition semantic processing system provides a basic bag of words and multiple extension bags of words, an extension bag of words containing the near-synonyms, homophones and other confusable words associated with a word in the basic bag of words, so that during semantic vector conversion a basic word and its associated expansion words are vectorized to the same value, yielding the same semantic understanding and reducing the possibility that confusable words mislead the semantic understanding.
It is another object of the present invention to provide a speech recognition semantic processing system and its method, wherein the extension bag of words is arranged within the basic bag of words and a basic word and its associated expansion words are set in an "or" relationship, so that during semantic vector conversion the basic word and its associated expansion words are vectorized to the same value. Moreover, the extension bag of words then occupies little space, and the vectorization takes less time and is more efficient.
It is another object of the present invention to provide a speech recognition semantic processing system and its method, wherein the extension bag of words is the Cartesian product of each basic word in the basic bag of words and its associated expansion words, so that during semantic vector conversion confusable words can also be understood correctly, guaranteeing the coherence of the whole dialogue.
It is another object of the present invention to provide a speech recognition semantic processing system and its method, wherein the speech recognition semantic processing system provides a semantic category library that stores the classification of the common expressions and professional scripts of the field in which a phone robot is used, so that a semantic determining module can determine the category to which the semantics belong according to the semantic vector value, thereby determining the semantic understanding and matching the corresponding response recording.
It is another object of the present invention to provide a speech recognition semantic processing system and its method, wherein the speech recognition semantic processing system uses Bayes and inverse document frequency to further understand, analyze and determine the vectorized semantics, reinforcing the weight of the words that best distinguish documents, so that semantic understanding is more accurate and the dialogue more coherent.
It is another object of the present invention to provide a speech recognition semantic processing system and its method, wherein the extension bag of words is associated with the error range of the speech recognition technology used, avoiding the blind addition of expansion words, so that each basic word and its associated expansion words are more targeted and semantic understanding efficiency and accuracy are improved.
It is another object of the present invention to provide a speech recognition semantic processing system and its method, wherein the speech recognition semantic processing system can work with various speech recognition technologies without restricting the speech recognition technology, and corresponding extension bags of words can be set for different speech recognition technologies, giving a wider and more flexible scope of application.
In order to achieve at least one of the above objects, in one aspect of the present invention, the present invention further provides a speech recognition semantic processing system suitable for a phone robot, comprising:
a bag-of-words module for storing the words required by the speech recognition semantic processing system;
a semantic vector conversion module, wherein the semantic vector conversion module vectorizes a speech recognition result according to the words required by the speech recognition semantic processing system;
a semantic category library including multiple semantic categories; and
a semantic determining module, wherein the semantic determining module determines, according to the vectorization value of the speech recognition result, the semantic category of the speech recognition result in the semantic category library and forms semantic category information, so as to determine the semantics of the speech recognition result and match a response voice.
According to one embodiment of the present invention, the bag-of-words module includes a basic bag of words and at least one extension bag of words, wherein the basic bag of words includes multiple basic words, the extension bag of words includes at least one expansion word associated as a confusable word with a basic word in the basic bag of words, and the semantic vector conversion module vectorizes expansion words and basic words to the same value according to the basic bag of words and the extension bag of words.
According to one embodiment of the present invention, the extension bag of words is correspondingly arranged within the basic bag of words, wherein a basic word and its associated expansion words are set in an "or" relationship, so that expansion words and basic words are vectorized to the same value.
According to one embodiment of the present invention, the bag-of-words module is the Cartesian product of each basic word in the basic bag of words and its associated expansion words, so that expansion words and basic words are vectorized to the same value.
According to one embodiment of the present invention, the extension bag of words is associated with the error range of the speech recognition result.
According to one embodiment of the present invention, the semantic determining module uses Bayes to determine, according to the vectorization value of the speech recognition result, the semantic category of the speech recognition result in the semantic category library and forms the semantic category information.
According to one embodiment of the present invention, the semantic determining module uses Bayes and inverse document frequency to determine, according to the vectorization value of the speech recognition result, the semantic category of the speech recognition result in the semantic category library and forms the semantic category information.
According to one embodiment of the present invention, the semantic categories in the semantic category library correspond to the application field of the phone robot and to industry scripts.
According to one embodiment of the present invention, the speech recognition semantic processing system further comprises a speech recognition module, wherein the speech recognition module recognizes a customer voice as text to form the speech recognition result.
According to one embodiment of the present invention, the speech recognition semantic processing system further comprises a response recording matching module and a response recording library, wherein the response recording library includes multiple response recordings, each response recording is associated with a corresponding semantic category, and the response recording matching module matches the corresponding response recording in the response recording library according to the semantic category information, forming response recording information.
According to one embodiment of the present invention, the speech recognition semantic processing system further comprises a playback module, wherein the playback module plays the corresponding response recording according to the response recording information.
In another aspect of the present invention, the present invention further provides a speech recognition semantic processing method, comprising the steps of:
(a) vectorizing a speech recognition result according to the words required for speech recognition semantic processing that are stored by a bag-of-words module, forming a speech recognition result vectorization value; and
(b) determining, according to the speech recognition result vectorization value, the semantic category to which the speech recognition result belongs, forming semantic category information.
According to one embodiment of the present invention, step (a) further comprises the step of:
(a.1) setting a basic bag of words and at least one extension bag of words, wherein the extension bag of words contains the expansion words associated as confusable words with a basic word in the basic bag of words, forming the bag-of-words module, so that the vectorization results of expansion words and basic words are identical.
According to one embodiment of the present invention, the extension bag of words in step (a.1) is correspondingly arranged within the basic bag of words, and a basic word and its associated expansion words are set in an "or" relationship.
According to one embodiment of the present invention, the bag-of-words module in step (a.1) is the Cartesian product of each basic word in the basic bag of words and its associated expansion words.
According to one embodiment of the present invention, step (b) further comprises the step of:
(b.1) determining, by means of Bayes and/or inverse document frequency and according to the speech recognition result vectorization value, the semantic category to which the speech recognition result belongs, forming the semantic category information.
According to one embodiment of the present invention, before step (a) the speech recognition semantic processing method further comprises the step of: recognizing a customer voice as text, forming the speech recognition result.
According to one embodiment of the present invention, after step (b) the speech recognition semantic processing method further comprises the step of:
(c) matching the corresponding response voice according to the semantic category information, forming response voice information.
According to one embodiment of the present invention, the speech recognition semantic processing method further comprises the step of:
(d) playing the corresponding response voice according to the response voice information.
Brief description of the drawings
Fig. 1 is an application diagram of the speech recognition semantic processing system according to an embodiment of the invention.
Fig. 2 is a structural block diagram of the speech recognition semantic processing system according to an embodiment of the invention.
Fig. 3 is an illustration of the semantic category library and the response recording library of the speech recognition semantic processing system according to an embodiment of the invention.
Fig. 4 is a flowchart of the speech recognition semantic processing method according to an embodiment of the invention.
Specific embodiment
The following description is provided to disclose the present invention so that those skilled in the art can realize it. The preferred embodiments in the following description are intended only as illustrations, and other obvious modifications will occur to those skilled in the art. The basic principles of the invention defined in the following description can be applied to other embodiments, variants, improvements, equivalents and other technical schemes that do not depart from the spirit and scope of the present invention.
It will be understood by those skilled in the art that, in the disclosure of the invention, the orientations or positional relationships indicated by terms such as "longitudinal", "transverse", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner" and "outer" are based on the orientations or positional relationships shown in the drawings and are used merely for convenience and simplicity of description; they do not indicate or imply that the device or element referred to must have a particular orientation or be constructed and operated in a particular orientation, and therefore the above terms are not to be construed as limiting the invention.
It should be understood that the term "a" is to be interpreted as "at least one" or "one or more"; that is, in one embodiment the quantity of an element may be one, while in another embodiment the quantity of that element may be multiple, and the term "a" cannot be interpreted as a limitation on quantity.
As shown in Figs. 1 to 4, a speech recognition semantic processing system and a semantic processing method according to a preferred embodiment of the present invention are set forth, in which homophones, near-synonyms and other confusable words in the text obtained by speech recognition are processed, so as to achieve correct semantic understanding and reduce the possibility that confusable words mislead the semantic understanding. It should be noted that confusable words in the present invention include not only homophones and near-synonyms, but also words that the speech-to-text technology in use tends to misrecognize or confuse, for example front and back nasal finals, and flat-tongue versus retroflex initials, that sound similar. For convenience of illustration and description, the present invention is explained using homophones and near-synonyms, which is not a limitation.
It should be noted that the speech recognition semantic processing system of the present invention is preferably used in a phone robot, so that the phone robot communicates with customers more intelligently. The phone robot dials the customer's number according to customer data, plays preset opening remarks after the call is connected, and then replies intelligently with scripts adapted to different scenarios. The phone robot can communicate with customers intelligently and can also screen and classify prospective customers from a large amount of customer data, which makes it convenient for sales or customer service staff to follow up effectively based on data analysis and call records. When the speech recognition semantic processing system of the invention is applied to the phone robot, the phone robot can understand the customer's voice and reply more intelligently with scripts for different scenarios.
Specifically, the speech recognition semantic processing system includes a speech recognition module 10 for recognizing the customer's voice as text. That is, the speech recognition module 10 receives a customer voice and recognizes the customer voice as a speech recognition result. The speech recognition result is expressed in the form of text. In the present invention, the technical solution adopted by the speech recognition module 10 is not limited; those skilled in the art can adopt a known or independently developed technical solution to convert the customer voice into text and form the speech recognition result. For example, the speech recognition module 10 may decompose the customer voice into smaller voice units and convert them into the corresponding text through an acoustic model and a deep-learning data model.
It will be appreciated that the Chinese language is extensive and profound: there are near-synonyms, the same meaning has different modes of expression, and there are homophones, where the same pronunciation represents different meanings. Combined with the limitations of speech recognition technology, the speech recognition result produced by the speech recognition module 10 may contain errors. That is, the words contained in the recognized speech recognition result may not be the words the customer voice really intended to express. For example, the customer voice is "how much can I borrow", but due to homophones or the limitations of speech recognition technology, the speech recognition result produced by the speech recognition module 10 may well be "how much can I bring" (the two phrases are near-homophones in Chinese). With the keyword matching technology of the prior art, "how much can I borrow" and "how much can I bring" belong to different semantics, so "how much can I bring" cannot be matched to the recording that should be matched, the one describing the loan in further detail (referred to as recording C).
In the present invention, the speech recognition semantic processing system further comprises a semantic vector conversion module 20 and a bag-of-words module 30 for vectorizing the speech recognition result. By building a bag-of-words model and vectorizing the speech recognition result, the system, compared with the keyword matching technology of the prior art, can take the whole picture into account and consider the entire context. At the same time, understanding the text obtained by speech recognition with a bag-of-words model effectively avoids the influence of inverted word order in Chinese on speech understanding and improves recognition accuracy.
The bag-of-words module 30 serves as a dictionary that stores the words of the scripts related to the field and industry in which the phone robot is applied, that is, the words that make up the common expressions and specialized terms of that field and industry. For example, a financial credit company uses phone robots to introduce various loan products to customers. A common expression in financial credit is "how much can I borrow"; accordingly the bag-of-words module 30, as a dictionary, might be {I can borrow how much}. Of course, this is only a simplified illustration; the scripts related to an industry and field may number in the hundreds, in which case the bag-of-words module 30 contains all the words required by these hundreds of scripts, and the number of words it contains is correspondingly large, for example {I want loan can how much project bank ...}.
The semantic vector conversion module 20 vectorizes the speech recognition result according to the bag-of-words module 30. In one embodiment of the invention, if a word in the bag-of-words module 30 appears once in the speech recognition result, the semantic vector conversion module 20 marks a 1 at the corresponding position; if it appears twice, the semantic vector conversion module 20 marks a 2 at the corresponding position. In other words, the speech recognition result vectorization value indicates how many times each word of the bag-of-words module 30 appears. For example, if the bag-of-words module 30 is {I can borrow how much}, the customer voice is "how much can I borrow", and the speech recognition result is correctly recognized as "how much can I borrow", then the vectorized speech recognition result is {0, 1, 1, 1, 1}. No matter whether the word order of the speech recognition result is inverted, the vectorized result is always {0, 1, 1, 1, 1}, which effectively avoids the influence of inverted word order in Chinese on speech understanding and improves recognition accuracy.
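The counting scheme just described can be pictured with a minimal Python sketch. It is an illustration only, not the patent's implementation; the vocabulary below is the English rendering of the example bag {I can borrow how much}, and all function names are assumptions.

```python
# Minimal sketch of the count-based bag-of-words vectorization described above.
# The vocabulary and all names here are illustrative, not taken from the patent.

def vectorize(tokens, vocabulary):
    """Count how many times each vocabulary word appears in the token list."""
    return [tokens.count(word) for word in vocabulary]

vocabulary = ["I", "can", "borrow", "how", "much"]   # the bag {I can borrow how much}

# Correctly recognized result "how much can I borrow", already tokenized
print(vectorize(["can", "borrow", "how", "much"], vocabulary))   # -> [0, 1, 1, 1, 1]

# Word order does not matter: a reordered result yields the same vector
print(vectorize(["how", "much", "can", "borrow"], vocabulary))   # -> [0, 1, 1, 1, 1]
```

As the second call shows, the vector depends only on word counts, which is why inverted word order does not change the result.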
In an actual recognition process, however, the speech recognition module 10 easily recognizes "how much can I borrow" as "how much can I bring". After vectorization, "how much can I bring" becomes {0, 1, 0, 1, 1}, which is not identical to the vector {0, 1, 1, 1, 1} that "how much can I borrow" should produce. During subsequent matching it is therefore very likely that recording C, which should be matched, cannot be matched, and the semantics are misunderstood. That is, although the existing bag-of-words model improves recognition accuracy to some extent, it still falls short in handling confusable words.
In the present invention, the bag-of-words module 30 further comprises a basic bag of words 31 and at least one extension bag of words 32, wherein the extension bag of words 32 contains the expansion words associated as confusable words with a basic word in the basic bag of words 31. For example, for the basic word "borrow" in the basic bag of words 31, the associated expansion words are "bring" and "lend", so the extension bag of words for the basic word "borrow" in the basic bag of words 31 is {bring, lend}. It will therefore be understood that the basic bag of words 31 may have multiple basic words with associated expansion words, and correspondingly there may be multiple associated extension bags of words 32.
It is worth mentioning that the extension bag of words 32 is associated with the error range of the speech recognition technology used by the speech recognition module 10. That is, if the sample coverage of the speech recognition module 10 is limited and, during training, the error range of the speech recognition module 10 is found to be relatively large, "borrow" may be misrecognized not only as "bring" but also as, say, "bag" or "to", and the extension bag of words 32 of the basic word "borrow" then becomes {bring, lend, bag, to}. If instead the speech recognition module 10 has broad sample coverage, good recognition performance and a low error range, for example the basic word "borrow" is mostly misrecognized as "bring" and rarely anything else, then the extension bag of words 32 of the basic word "borrow" only needs to include the near-synonym and "bring", i.e. {bring, lend}.
Associating the extension bag of words 32 with the error range of the speech recognition technology used by the speech recognition module 10 avoids the blind addition of expansion words, makes each basic word and its associated expansion words more targeted, improves semantic understanding efficiency and accuracy, and saves space. In other words, the speech recognition module 10 can work with various speech recognition technologies without restriction, and the bag-of-words module 30 can be provided with corresponding extension bags of words for different speech recognition technologies, giving a wider and more flexible scope of application.
The semantic vector conversion module 20, according to the basic bag of words 31 and the extension bag of words 32, vectorizes a basic word and its associated expansion words to the same value, thereby obtaining the same semantic understanding and reducing the possibility that confusable words mislead the semantic understanding. That is, when the customer voice is "how much can I borrow", no matter whether the speech recognition module 10 recognizes it as "how much can I borrow" or as "how much can I bring", the semantic vector conversion module 20 vectorizes it to the same value and it is understood as the same semantics, so recording C can be matched and the phone robot is more intelligent.
In the preferred embodiment of the present invention, the extension bag of words 32 is correspondingly arranged within the basic bag of words 31, and a basic word and its associated expansion words are set in an "or" relationship, so that during semantic vector conversion the basic word and its associated expansion words are vectorized to the same value. For example, the bag-of-words module 30 is {I can {borrow or lend or bring} how much}. Here the extension bag of words {bring, lend} is arranged within the basic bag of words 31, and the basic word "borrow" and its associated expansion words "bring" and "lend" are in an "or" relationship with one another; that is, as long as any one of {borrow or lend or bring} appears, the corresponding vector position is 1, so the same vectorization value is obtained and the same semantic understanding is carried out.
Specifically, when the customer voice is "how much can I borrow", if the speech recognition module 10 recognizes it as "how much can I borrow", the semantic vector conversion module 20, according to the basic bag of words 31 and the extension bag of words 32, produces the vectorization value {0, 1, 1, 1, 1}. If the speech recognition module 10 recognizes it as "how much can I bring", the semantic vector conversion module 20, according to the basic bag of words 31 and the extension bag of words 32, likewise produces the vectorization value {0, 1, 1, 1, 1}.
When the customer voice is "how much can I lend" and the speech recognition module 10 recognizes it as "how much can I lend", the semantic vector conversion module 20, again according to the basic bag of words 31 and the extension bag of words 32, produces the vectorization value {0, 1, 1, 1, 1}, identical to the vectorization values of the two speech recognition results above, and it is understood as the same meaning.
That is, whichever near-synonym or homophone the speech recognition module 10 produces, the speech recognition semantic processing system of the present invention can correct it to the right semantic understanding and avoid the influence of confusable words on the semantics. Moreover, in this preferred implementation of the invention, the extension bag of words 32 occupies little space, and the vectorization takes less time and is more efficient.
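The "or" relationship can be pictured as grouping each basic word with its expansion words and counting the whole group at one vector position. The sketch below is an illustration under that reading, not the patent's implementation; the groups and names are assumptions.

```python
# Sketch of the "or" relationship between a basic word and its expansion words:
# any member of a group increments the same vector position.
# The groups below are illustrative assumptions.

bag = [
    {"I"},
    {"can"},
    {"borrow", "bring", "lend"},   # basic word "borrow" with expansions "bring", "lend"
    {"how"},
    {"much"},
]

def vectorize_with_expansions(tokens, bag):
    """Count, per group, how many tokens fall into that group."""
    return [sum(1 for t in tokens if t in group) for group in bag]

# The misrecognized "how much can I bring" and the correct "how much can I borrow"
# produce the same vector, so they receive the same semantic understanding.
print(vectorize_with_expansions(["can", "bring", "how", "much"], bag))    # -> [0, 1, 1, 1, 1]
print(vectorize_with_expansions(["can", "borrow", "how", "much"], bag))   # -> [0, 1, 1, 1, 1]
```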
In another embodiment of the present invention, the basic bag of words 31 and the extension bag of words 32 form the Cartesian product of each basic word in the basic bag of words and its associated expansion words, so that during semantic vector conversion confusable words can also be understood correctly, guaranteeing the coherence of the whole dialogue. For example, the expansion words of the basic word "company" are "shop", "store", "storefront", "shopkeeper", "merchant", "business office" and "mall", forming the array {company, shop, store, storefront, shopkeeper, merchant, business office, mall}; and the expansion words of the basic word "address" are "location" and "place", forming the array {address, location, place}. The basic bag of words 31 and the extension bag of words 32 for "company address" are then the Cartesian product of these two arrays, where {company address} is the basic bag of words 31 and the remaining combinations are the extension bag of words 32.
In this embodiment, the basic bag of words 31 is expanded by a Cartesian product to generate the extension bag of words 32, so that all kinds of confusable variants are quantized to the same value and then understood as the same meaning, effectively improving classification accuracy. Further, in this embodiment, associating the extension bag of words 32 with the error range of the speech recognition technology used by the speech recognition module 10 effectively avoids unnecessary array combinations and reduces the size of the bag-of-words module 30, thereby improving semantic understanding efficiency.
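A possible way to enumerate such a Cartesian-product extension bag is shown below. This is a sketch under the example above, using English renderings of the word groups; the variable names and the space-joined phrase format are assumptions, not the patent's data structures.

```python
# Sketch of the Cartesian-product embodiment: every combination of one word from
# each group is treated as an equivalent phrase for "company address".
from itertools import product

company_group = ["company", "shop", "store", "storefront",
                 "shopkeeper", "merchant", "business office", "mall"]
address_group = ["address", "location", "place"]

# The extended bag is the Cartesian product of the two groups.
extended_bag = {c + " " + a for c, a in product(company_group, address_group)}

print("company address" in extended_bag)   # True  (the basic phrase)
print("shop place" in extended_bag)        # True  (an expansion phrase)
print(len(extended_bag))                   # 8 * 3 = 24 combinations
```

Keeping the groups aligned with the recognizer's actual error range, as described above, keeps this product small.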
Further, the speech recognition semantic processing system of the present invention includes a semantic determining module 40 and a semantic category library 50. The semantic determining module 40 determines, according to the vectorization value of the speech recognition result, the semantic category to which the customer voice belongs in the semantic category library 50, that is, it determines the semantics of the customer voice.
The semantic category library 50 stores the classification of the common expressions and professional scripts of the field in which the phone robot is used, i.e. the different common semantics of that field. In other words, the semantic category library 50 includes multiple semantic categories 51. Each semantic category 51 differs semantically from the others and is accordingly matched with a different response voice. For example, in the financial credit field the semantic categories stored by the semantic category library 50 may include "How much interest for one year? How much for half a year? How much for one month?", "How low is the interest?", "No fixed residence, no job, no borrowing, bad credit", "What qualifications are needed", and so on. It will be appreciated that for different users and different fields the semantic category library 50 may well differ, and corresponding content can be arranged and stored in a targeted manner.
The semantic determining module 40 determines, according to the vectorization value of the speech recognition result, the semantic category to which the customer voice belongs, i.e. it determines the semantics of the customer voice. In the preferred embodiment of the present invention, based on the assumption that the words of the speech recognition result are independent of one another, the semantic determining module 40 analyzes which semantic category 51 the speech recognition result belongs to with the greatest probability, thereby determining the semantic category to which the speech recognition result belongs.
For example, the bag-of-words module 30 is {I can {borrow or lend or bring} how much}, the customer voice is "how much can I borrow", and the speech recognition module 10 recognizes it as "how much can I bring"; the semantic vector conversion module 20 then produces, according to the basic bag of words 31 and the extension bag of words 32, the vectorization value {0, 1, 1, 1, 1}. The semantic determining module 40, given the vectorization value {0, 1, 1, 1, 1} and the assumption that the words of the speech recognition result are independent of one another, calculates which semantic category 51 it belongs to with the greatest probability. For example, from the vectorization value {0, 1, 1, 1, 1} the semantic determining module 40 determines that the speech recognition result most probably belongs to the semantic category 51 "how much can I borrow".
Preferably, the semantic determining module 40 uses Bayes to calculate the probability that the vectorization value of the speech recognition result belongs to each category, and the semantic category 51 with the maximum probability is taken as the determined semantics. Compared with the single keyword matching of the prior art, Bayesian analysis can improve the accuracy of semantic understanding. Preferably, the semantic determining module 40 uses Bayes together with inverse document frequency to further understand, analyze and determine the vectorized semantics, reinforcing the weight of the words that best distinguish documents, so that semantic understanding is more accurate and the dialogue more coherent. Those skilled in the art should be aware of the basic principles of Bayes and inverse document frequency, which are not repeated here. Bayes and inverse document frequency are mentioned only as illustrations and not as limitations; those skilled in the art can use other probability calculation methods to determine the semantic category to which the speech recognition result belongs.
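The combination of a naive Bayes classifier with inverse-document-frequency weighting can be sketched as follows. The patent names no library, so scikit-learn is used purely as an illustrative stand-in, and the toy utterances, category labels and expected output are assumptions.

```python
# Sketch: naive Bayes over TF-IDF-weighted bag-of-words vectors.
# scikit-learn is an assumed stand-in; the training data is a toy illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

utterances = [
    "can borrow how much",
    "how low is the interest",
    "not interested at all",
    "what qualifications are needed",
]
categories = ["loan_amount", "interest_rate", "no_interest", "qualification"]

vectorizer = TfidfVectorizer()            # bag-of-words counts reweighted by IDF
X = vectorizer.fit_transform(utterances)

classifier = MultinomialNB()              # picks the category with maximum posterior probability
classifier.fit(X, categories)

# Classify a new recognition result (after expansion-bag normalization, as above)
query = vectorizer.transform(["how much can I borrow"])
print(classifier.predict(query))          # -> ['loan_amount'] on this toy data
```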
Further, the speech recognition semantic processing system includes a response recording matching module 70 and a response recording library 60, used to match, in the response recording library 60, the appropriate corresponding response recording 61 according to the semantic category 51 determined by the semantic determining module 40.
Specifically, the response recording library 60 includes multiple response recordings 61. A response recording 61 is a recording made in advance to be played as a response to the customer voice. Each response recording 61 is associated with the corresponding semantic category 51. For example, in one embodiment of the invention, the response recording 61 and the corresponding semantic category 51 are associated by an association identifier; the association identifier may be implemented as a recording serial number, with the response recording 61 and the corresponding semantic category 51 being given the same recording serial number. That is, the response recordings 61 and the semantic categories 51 are in a one-to-one relationship, and each semantic category 51 has a corresponding response recording 61 as its answer. For example, the semantic category 51 "How low is the interest" is associated with the response recording 61 "Because we connect directly to the bank's internal channel and the application is submitted in your name, the bank gives its lowest preferential policy", the two being associated by the same number "113", as shown in Fig. 3.
The response recording matching module 70, according to the determined semantic category 51 and through the association identifier, matches the appropriate corresponding response recording 61 in the response recording library 60 and forms response recording information. For example, when the semantic determining module 40 determines that the semantic category 51 of the speech recognition result is "how much can I borrow", then according to the association identifier "124" the corresponding response recording 61, "How much you can borrow depends on your personal situation; everyone's situation is different", can be matched in the response recording library 60 and the corresponding response recording information is formed, as shown in Fig. 3.
The response recording information may include, without limitation, the storage address, content, number and so on of the response recording 61. The response recording matching module 70 sends the response recording information to a playback module 80, and the playback module 80 plays the corresponding response recording 61 according to the response recording information. The customer voice is thus answered; moreover, in the speech recognition semantic processing system of the present invention, the confusable words in the recognition result of the customer voice have been processed, so the customer voice is understood more accurately, the matched response recording 61 is more targeted, and the phone robot is accordingly more intelligent and more human-like.
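The serial-number association between semantic categories and response recordings can be sketched as a simple lookup. The numbers 113 and 124 are taken from the Fig. 3 example above; the dictionaries, file paths and function name are hypothetical illustrations.

```python
# Sketch of matching a determined semantic category to its response recording
# through a shared serial number. Paths and structures are illustrative assumptions.

semantic_categories = {
    "how much can I borrow": 124,
    "how low is the interest": 113,
}

response_recordings = {
    113: "recordings/113_lowest_rate_policy.wav",
    124: "recordings/124_depends_on_situation.wav",
}

def match_response(category_name):
    """Look up the recording associated with a semantic category by serial number."""
    serial = semantic_categories[category_name]
    return serial, response_recordings[serial]

serial, path = match_response("how much can I borrow")
print(serial, path)   # 124 recordings/124_depends_on_situation.wav
# A playback module would then play the file referenced in this response information.
```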
According to another aspect of the present invention, the present invention further provides a speech recognition semantic processing method to achieve the objects and advantages of the invention. The speech recognition semantic processing method can be used in the speech recognition semantic processing system described above. Fig. 4 shows the flowchart of the speech recognition semantic processing method of the invention.
Step 110: receive a customer voice.
After the phone robot dials and the call is connected, what the customer (the other party on the call) says is recorded as the customer voice. Some phone robots first play an opening-remarks recording after the call is connected, for example introducing their identity and main business.
Step 120: recognize the customer voice as text, forming a speech recognition result.
Specifically, the speech recognition technology used for speech recognition is not limited here; those skilled in the art can use a well-known or independently developed technology to recognize the customer voice as text. It should be noted that the speech recognition result is expressed in written form.
It will be appreciated that, due to the limitations of speech recognition technology and the complexity of Chinese vocabulary, the speech recognition result is very likely not fully consistent with the content of the customer voice, and the possibility of confusable words is high. Confusable words here refer to the words that the speech-to-text technology in use tends to misrecognize or confuse, such as near-synonyms, homophones, and error-prone front and back nasal finals and flat-tongue versus retroflex initials.
Step 130: vectorize the speech recognition result according to a bag-of-words module, forming a speech recognition result vectorization value.
Specifically, this step may use a bag-of-words model to vectorize the speech recognition result, so that the speech recognition result vectorization value indicates how many times each word of the bag-of-words module appears. Those skilled in the art will be aware of the basic concept and content of the bag-of-words model, which is not repeated here. In other words, the speech recognition result vectorization value does not depend on whether the word order of the speech recognition result is inverted, which effectively avoids the influence of inverted word order in Chinese on speech understanding and improves recognition accuracy.
Further, step 130 may comprise the step of: setting a basic bag of words and at least one extension bag of words, wherein the extension bag of words contains the expansion words associated as confusable words with a basic word in the basic bag of words, forming the bag-of-words module, so that the vectorization results of expansion words and basic words are identical.
As described above, besides the problem of inverted word order, the speech recognition result also has the problem of confusable words, which affects the overall semantic understanding. What the extension bag of words contains are the confusable words of the corresponding basic word. In this way, during the vectorization of the speech recognition result, the vectorization results of expansion words and basic words are identical, achieving the same semantic understanding.
In one embodiment of the invention, the extension bag of words is correspondingly arranged within the basic bag of words, and a basic word and its associated expansion words are set in an "or" relationship. In another embodiment of the invention, the bag-of-words module is the Cartesian product of each basic word in the basic bag of words and its associated expansion words. Both approaches make the vectorization results of expansion words and basic words identical, thereby avoiding the misleading effect of confusable words on speech understanding.
Step 140: determine, according to the speech recognition result vectorization value, the semantic category to which the speech recognition result belongs, forming semantic category information.
Specifically, each industry and field has its specific common expressions and professional scripts. These scripts are classified in advance, according to their semantics, into different semantic categories. From the speech recognition result vectorization value, probability calculation can be used to determine which semantic category the speech recognition result most probably belongs to, thereby determining the semantic category to which the speech recognition result belongs. The semantic category information may include, without limitation, the number of the semantic category to which the speech recognition result belongs, its storage address, its content, the number of the associated response recording, and so on.
Preferably, step 140 further comprises the step of: determining, by means of Bayes and inverse document frequency and according to the speech recognition result vectorization value, the semantic category to which the speech recognition result belongs, forming the semantic category information.
Step 150: match the corresponding response voice according to the semantic category information, forming response voice information.
Different semantic categories correspond to different responses. The corresponding response voice is recorded in advance and associated with the corresponding semantic category, for example by the same number, which is not limited here. In this way, the corresponding response voice can be looked up and matched according to the semantic category information, forming the response voice information. The response voice information may include, without limitation, the storage address, content and number of the response recording.
Step 160: play the corresponding response voice according to the response voice information.
That is, the response voice is the answer to the customer voice, and the phone robot realizes an intelligent reply. The speech recognition semantic processing method of the present invention not only abandons the traditional keyword matching technology, it also takes the confusable words in recognition into account and uses Bayes and similar methods to improve the accuracy of semantic understanding, so that the phone robot is more human-like and the whole dialogue is more coherent.
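Steps 110 to 160 can be pictured end to end with the small sketch below. Every function in it is a trivial stand-in for the corresponding module described above, not the patent's implementation; the stubbed outputs, bag and file path are illustrative assumptions.

```python
# End-to-end sketch of steps 110-160 with stubbed stand-ins for each module.

def recognize(audio):                       # step 120: speech -> text (stubbed result)
    return "can bring how much"

def vectorize(tokens, bag):                 # step 130: bag-of-words with expansion groups
    return [sum(1 for t in tokens if t in group) for group in bag]

def classify(vector):                       # step 140: most probable category (stubbed)
    return "how much can I borrow"

def match_response(category):               # step 150: shared-serial-number lookup (stubbed)
    return {"how much can I borrow": "recordings/124_depends_on_situation.wav"}[category]

def play(path):                             # step 160: playback (stubbed)
    print("playing", path)

bag = [{"I"}, {"can"}, {"borrow", "bring", "lend"}, {"how"}, {"much"}]
text = recognize(None)                      # steps 110-120
vector = vectorize(text.split(), bag)       # step 130
play(match_response(classify(vector)))      # steps 140-160
```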
It should be understood by those skilled in the art that the foregoing description and the embodiments of the present invention shown in the drawings are only illustrations and are not intended to limit the invention. The objects of the invention have been fully and effectively achieved. The functions and structural principles of the invention have been shown and explained in the embodiments, and without departing from these principles the embodiments of the invention may have any variation or modification.
Claims (19)
1. A speech recognition semantic processing system, characterized by comprising:
a bag-of-words module for storing the words required by the speech recognition semantic processing system;
a semantic vector conversion module, wherein the semantic vector conversion module vectorizes a speech recognition result according to the words required by the speech recognition semantic processing system, forming a speech recognition result vectorization value;
a semantic category library including multiple semantic categories; and
a semantic determining module, wherein the semantic determining module determines, according to the speech recognition result vectorization value, the semantic category to which the speech recognition result belongs in the semantic category library, forming semantic category information, so as to determine the semantics of the speech recognition result and match a response voice.
2. The speech recognition semantic processing system according to claim 1, wherein the bag-of-words module includes a basic bag of words and at least one extension bag of words, wherein the basic bag of words includes multiple basic words, the extension bag of words includes at least one expansion word associated as a confusable word with a basic word, and the semantic vector conversion module vectorizes expansion words and basic words to the same value according to the basic bag of words and the extension bag of words.
3. The speech recognition semantic processing system according to claim 2, wherein the extension bag of words is correspondingly arranged within the basic bag of words, and a basic word and its associated expansion words are set in an "or" relationship, so that expansion words and basic words are vectorized to the same value.
4. The speech recognition semantic processing system according to claim 2, wherein the bag-of-words module is the Cartesian product of each basic word in the basic bag of words and its associated expansion words, so that expansion words and basic words are vectorized to the same value.
5. The speech recognition semantic processing system according to claim 2, wherein the extension bag of words is associated with the error range of the speech recognition result.
6. The speech recognition semantic processing system according to any one of claims 1 to 5, wherein the semantic determining module uses Bayes to determine, according to the vectorization value of the speech recognition result, the semantic category to which the speech recognition result belongs in the semantic category library, forming the semantic category information.
7. The speech recognition semantic processing system according to any one of claims 1 to 5, wherein the semantic determining module uses Bayes and inverse document frequency to determine, according to the vectorization value of the speech recognition result, the semantic category to which the speech recognition result belongs in the semantic category library, forming the semantic category information.
8. The speech recognition semantic processing system according to any one of claims 1 to 5, wherein the semantic categories in the semantic category library correspond to the application field of the phone robot and to industry scripts.
9. The speech recognition semantic processing system according to any one of claims 1 to 5, further comprising a speech recognition module, wherein the speech recognition module recognizes a customer voice as text to form the speech recognition result.
10. The speech recognition semantic processing system according to any one of claims 1 to 5, further comprising a response recording matching module and a response recording library, wherein the response recording library includes multiple response recordings, each response recording is associated with a corresponding semantic category, and the response recording matching module matches the corresponding response recording in the response recording library according to the semantic category information, forming response recording information.
11. The speech recognition semantic processing system according to claim 10, further comprising a playback module, wherein the playback module plays the corresponding response recording according to the response recording information.
12. A speech recognition semantic processing method, characterized by comprising the steps of:
(a) vectorizing a speech recognition result according to the words required for speech recognition semantic processing that are stored by a bag-of-words module, forming a speech recognition result vectorization value; and
(b) determining, according to the speech recognition result vectorization value, the semantic category to which the speech recognition result belongs, forming semantic category information.
13. The speech recognition semantic processing method according to claim 12, wherein step (a) further comprises the step of:
(a.1) setting a basic bag of words and at least one extension bag of words, wherein the extension bag of words contains the expansion words associated as confusable words with a basic word in the basic bag of words, forming the bag-of-words module, so that the vectorization results of expansion words and basic words are identical.
14. The speech recognition semantic processing method according to claim 13, wherein the extension bag of words in step (a.1) is correspondingly arranged within the basic bag of words, and a basic word and its associated expansion words are set in an "or" relationship.
15. The speech recognition semantic processing method according to claim 13, wherein the bag-of-words module in step (a.1) is the Cartesian product of each basic word in the basic bag of words and its associated expansion words.
16. The speech recognition semantic processing method according to claim 12, wherein step (b) further comprises the step of:
(b.1) determining, by means of Bayes and/or inverse document frequency and according to the speech recognition result vectorization value, the semantic category to which the speech recognition result belongs, forming the semantic category information.
17. The speech recognition semantic processing method according to any one of claims 12 to 16, wherein before step (a) the speech recognition semantic processing method further comprises the step of:
recognizing a customer voice as text, forming the speech recognition result.
18. The speech recognition semantic processing method according to any one of claims 12 to 16, wherein after step (b) the speech recognition semantic processing method further comprises the step of:
(c) matching the corresponding response voice according to the semantic category information, forming response voice information.
19. The speech recognition semantic processing method according to claim 18, further comprising the step of:
(d) playing the corresponding response voice according to the response voice information.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910023125.9A CN109920430A (en) | 2019-01-10 | 2019-01-10 | Speech recognition semantic processing system and its method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109920430A true CN109920430A (en) | 2019-06-21 |
Family
ID=66960239
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910023125.9A Pending CN109920430A (en) | 2019-01-10 | 2019-01-10 | Speech recognition semantic processing system and its method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109920430A (en) |
Patent Citations (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1127898A (en) * | 1995-01-26 | 1996-07-31 | 李琳山 | Intelligent common spoken Chinese phonetic input method and dictation machine |
JPH1139313A (en) * | 1997-07-24 | 1999-02-12 | Nippon Telegr & Teleph Corp <Ntt> | Automatic document classification system, document classification oriented knowledge base creating method and record medium recording its program |
JP2005250071A (en) * | 2004-03-03 | 2005-09-15 | Nippon Telegr & Teleph Corp <Ntt> | Method and device for speech recognition, speech recognition program, and storage medium with speech recognition program stored therein |
US20080221878A1 (en) * | 2007-03-08 | 2008-09-11 | Nec Laboratories America, Inc. | Fast semantic extraction using a neural network architecture |
CN101593518A (en) * | 2008-05-28 | 2009-12-02 | 中国科学院自动化研究所 | Method for balancing real-scene corpus and finite state network corpus |
JP2010197859A (en) * | 2009-02-26 | 2010-09-09 | Gifu Univ | Utterance difference speech recognition system |
CN103562919A (en) * | 2011-06-02 | 2014-02-05 | 浦项工科大学校产学协力团 | Method for searching for information using the web and method for voice conversation using same |
US20130035931A1 (en) * | 2011-08-04 | 2013-02-07 | International Business Machines Corporation | Predicting lexical answer types in open domain question and answering (qa) systems |
CN102831892A (en) * | 2012-09-07 | 2012-12-19 | 深圳市信利康电子有限公司 | Toy control method and system based on internet voice interaction |
CN104424290A (en) * | 2013-09-02 | 2015-03-18 | 佳能株式会社 | Voice based question-answering system and method for interactive voice system |
CN106294396A (en) * | 2015-05-20 | 2017-01-04 | 北京大学 | Keyword expansion method and keyword expansion system |
CN104834747A (en) * | 2015-05-25 | 2015-08-12 | 中国科学院自动化研究所 | Short text classification method based on convolutional neural network |
CN106469554A (en) * | 2015-08-21 | 2017-03-01 | 科大讯飞股份有限公司 | Adaptive recognition method and system |
CN105244029A (en) * | 2015-08-28 | 2016-01-13 | 科大讯飞股份有限公司 | Voice recognition post-processing method and system |
CN105446146A (en) * | 2015-11-19 | 2016-03-30 | 深圳创想未来机器人有限公司 | Intelligent terminal control method, system and intelligent terminal based on semantic analysis |
CN107424611A (en) * | 2017-07-07 | 2017-12-01 | 歌尔科技有限公司 | Voice interaction method and device |
CN107644642A (en) * | 2017-09-20 | 2018-01-30 | 广东欧珀移动通信有限公司 | Semantic recognition method, device, storage medium and electronic equipment |
CN107862000A (en) * | 2017-10-22 | 2018-03-30 | 北京市农林科学院 | Agricultural technology consultation interaction method |
CN108335692A (en) * | 2018-03-21 | 2018-07-27 | 上海木爷机器人技术有限公司 | Language switching method, server and system |
CN108595696A (en) * | 2018-05-09 | 2018-09-28 | 长沙学院 | Human-computer interaction intelligent answering method and system based on cloud platform |
CN108595706A (en) * | 2018-05-10 | 2018-09-28 | 中国科学院信息工程研究所 | Document semantic representation method, text classification method and device based on topic part-of-speech similarity |
CN108922534A (en) * | 2018-07-04 | 2018-11-30 | 北京小米移动软件有限公司 | Control method, device, equipment and storage medium |
Non-Patent Citations (2)
Title |
---|
卢良进: "Research on cross-media retrieval of micro-lecture videos based on the bag-of-words model", 《软件导刊》 (Software Guide) * |
李国佳 et al.: "A word sense disambiguation method based on polysemous word vector representation", 《智能计算机与应用》 (Intelligent Computer and Applications) * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Chung et al. | Voxceleb2: Deep speaker recognition | |
Chen et al. | Multi-task learning for text-dependent speaker verification | |
Irum et al. | Speaker verification using deep neural networks: A | |
CN107481720A (en) | An explicit voiceprint recognition method and device | |
CN108986798B (en) | Method, device and equipment for processing voice data | |
Caglayan et al. | Multimodal grounding for sequence-to-sequence speech recognition | |
Bhattacharya et al. | Deeply Fused Speaker Embeddings for Text-Independent Speaker Verification. | |
Li et al. | Acted vs. improvised: Domain adaptation for elicitation approaches in audio-visual emotion recognition | |
CN109935242A (en) | Interruptible speech processing system and method | |
Chaabouni et al. | Learning weakly supervised multimodal phoneme embeddings | |
Srinivasan et al. | A partial least squares framework for speaker recognition | |
CN109920430A (en) | Speech recognition semantic processing system and its method | |
de Abreu Campos et al. | A framework for speaker retrieval and identification through unsupervised learning | |
Ghahabi et al. | Deep Neural Networks for i-Vector Language Identification of Short Utterances in Cars. | |
Chandel et al. | Sensei: Spoken language assessment for call center agents | |
Van Segbroeck et al. | UBM fused total variability modeling for language identification. | |
CN114444609B (en) | Data processing method, device, electronic equipment and computer readable storage medium | |
CN116883888A (en) | Bank counter service problem tracing system and method based on multi-modal feature fusion | |
CN110853674A (en) | Text collation method, apparatus, and computer-readable storage medium | |
Selvaraj et al. | Bimodal recognition of affective states with the features inspired from human visual and auditory perception system | |
Li et al. | A multi-feature multi-classifier system for speech emotion recognition | |
Chen et al. | An investigation of context clustering for statistical speech synthesis with deep neural network. | |
Li et al. | Emotion embedding framework with emotional self-attention mechanism for speaker recognition | |
Lin et al. | Gated fusion of handcrafted and deep features for robust automatic pronunciation assessment | |
Laskar et al. | Speaker-phrase-specific adaptation of PLDA model for improved performance in text-dependent speaker verification |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190621 |