CN109920430A - Speech recognition semantic processing system and its method - Google Patents
- Publication number: CN109920430A
- Application number: CN201910023125.9A
- Authority: CN (China)
- Prior art keywords: speech recognition, semantic, words, bag, word
- Prior art date: 2019-01-10
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Landscapes
- Machine Translation (AREA)
Abstract
The present invention provides a speech recognition semantic processing system and a semantic processing method. The speech recognition semantic processing system is suitable for a phone robot and includes a bag-of-words module, a semantic vector conversion module, a semantic category library and a semantic determining module. The semantic vector conversion module vectorizes a speech recognition result according to the words required by the speech recognition semantic processing system, forming a speech recognition result vectorization value. The semantic determining module determines, according to the speech recognition result vectorization value, the semantic category of the speech recognition result in the semantic category library and forms semantic category information, so as to determine the semantics of the speech recognition result and match a response voice.
Description
Technical field
The present invention relates to the field of phone robots, and more particularly to a speech recognition semantic processing system and its method, which handles homophones and other confusable words in the speech recognition result of a phone robot, avoids wrong semantic understanding and the playing of wrong voice responses, and improves the intelligence of the phone robot.
Background art
Artificial intelligence is the core driver of the current new round of industrial transformation and has a profound influence on the world economy, social progress and human life. Artificial intelligence is ubiquitous in daily life, for example in fingerprint recognition, face recognition, intelligent search engines and speech recognition.
Phone robots are also a part of artificial intelligence and have received increasing attention from enterprises in recent years, especially enterprises engaged in telemarketing. Staff working in telemarketing and telephone customer service are under great pressure: they cannot maintain enthusiasm for long periods, often encounter hostile conversations, are prone to mood swings, and may eventually lose motivation, falling into a vicious circle of low efficiency and rising cost. For the enterprise, recruiting employees for telemarketing and telephone customer service is difficult, turnover remains high, market competition is fierce, there are not enough agents, and prospective clients are hard to find. Screening prospective clients manually is time-inefficient, input costs are high, and working efficiency declines due to numerous objective factors, affecting marketing results. Replacing manual telemarketing and telephone customer service with phone robots can therefore significantly relieve the pressure on both the enterprise and its employees, provide 24-hour online service, and spare employees the impact of hostile conversations.
At present, phone robots on the market all use keyword matching technology to understand the semantics of the customer's voice. After the customer's voice is converted into text by speech recognition, the text is matched against keywords in a voice library, and the matched recording is played to realize an intelligent voice reply. However, the Chinese language is extensive and profound: there are near-synonyms, the same meaning can be expressed in different ways, and there are homophones, where the same pronunciation represents different meanings. Keyword matching is a rather simplistic form of recognition; it easily misunderstands the semantics and then matches an inappropriate voice, so the voice played by the phone robot is not a suitable answer to the customer's voice, the customer experience is poor, and the robot does not appear intelligent.
For example, the words "interested" and "having interest" both indicate that the customer has an intention; using these two words as keywords matches a recording that further introduces the product (referred to as recording A). But if the customer says "not feeling interested" or "not interested", the meaning is that the customer does not want to learn more about the product. If the phone robot uses keyword matching, it is very likely to mismatch: the expressions "not feeling interested" or "not interested" are matched with recording A, recording A is played, and the semantics have been misunderstood.
Furthermore, because of homophones, near-synonyms and other confusable words, the result produced by speech recognition technology has a high probability of containing errors, which also affects subsequent semantic understanding. For example, "zhao jing li" in the customer's voice may be recognized as "find the manager" or as "Manager Zhao", and keyword matching will retrieve different recordings for "find the manager" and "Manager Zhao", i.e. the semantic understanding will differ. As another example, "company" and "shop" are near-synonyms, and "address" and "place" are near-synonyms. If the customer says "the place of the shop", the intended meaning is the same as "company address"; but if keyword matching only sets "company address" as the keyword, the utterance "the place of the shop" cannot be matched to the recording that explains the specific location (referred to as recording B). That is, although "the place of the shop" and "company address" express the same semantics in actual Chinese and should both logically match recording B, for keyword matching they are two different semantics and may match different recordings; "the place of the shop" cannot match recording B, which is not a manifestation of an intelligent phone robot.
In conclusion existing phone robot can not be handled the partials of speech recognition result, and the pass used
A possibility that key word matching technique has very big error rate, and semantic understanding is caused to generate deviation is very big.It is therefore desirable to telephone set
Device people improves, and improves reasonability, logicality and the intelligence of phone robot.
Summary of the invention
It is an object of the present invention to provide a speech recognition semantic processing system and its method, wherein the speech recognition semantic processing system handles homophones, near-synonyms and other confusable words in a speech recognition result obtained by speech recognition, so as to achieve correct semantic understanding and reduce the possibility that confusable words mislead the semantic understanding.
It is another object of the present invention to provide a speech recognition semantic processing system and its method, wherein the speech recognition semantic processing system understands the speech recognition result based on the whole context and corrects the confusable words in it, so as to achieve correct semantic understanding, guarantee the accuracy of the overall understanding and keep the whole dialogue coherent.
It is another object of the present invention to provide a speech recognition semantic processing system and its method, wherein the speech recognition semantic processing system understands the speech recognition result by means of a bag-of-words model; compared with the keyword matching technology of the prior art, it can take the whole picture into account and consider the entire context.
It is another object of the present invention to provide a speech recognition semantic processing system and its method, wherein the speech recognition semantic processing system understands the speech recognition result by means of a bag-of-words model; compared with the keyword matching technology of the prior art, it effectively avoids the influence of inverted word order in Chinese on speech understanding and improves recognition accuracy.
It is another object of the present invention to provide a speech recognition semantic processing system and its method, wherein the speech recognition semantic processing system provides a basic bag of words and multiple extension bags of words, an extension bag of words containing the near-synonyms, homophones and other confusable words associated with a word in the basic bag of words, so that during semantic vector conversion a basic word and its associated expansion words are vectorized to the same value, yielding the same semantic understanding and reducing the possibility that confusable words mislead the semantic understanding.
It is another object of the present invention to provide a speech recognition semantic processing system and its method, wherein the extension bag of words is arranged within the basic bag of words and a basic word and its associated expansion words are set in an "or" relationship, so that during semantic vector conversion the basic word and its associated expansion words are vectorized to the same value. Moreover, the extension bag of words then occupies little space, and the vectorization takes less time and is more efficient.
It is another object of the present invention to provide a speech recognition semantic processing system and its method, wherein the extension bag of words is the Cartesian product of each basic word in the basic bag of words and its associated expansion words, so that during semantic vector conversion confusable words can also be understood correctly, guaranteeing the coherence of the whole dialogue.
It is another object of the present invention to provide a speech recognition semantic processing system and its method, wherein the speech recognition semantic processing system provides a semantic category library that stores the classification of the common expressions and professional scripts of the field in which a phone robot is used, so that a semantic determining module can determine the category to which the semantics belong according to the semantic vector value, thereby determining the semantic understanding and matching the corresponding response recording.
It is another object of the present invention to provide a speech recognition semantic processing system and its method, wherein the speech recognition semantic processing system uses Bayes and inverse document frequency to further understand, analyze and determine the vectorized semantics, reinforcing the weight of the words that best distinguish documents, so that semantic understanding is more accurate and the dialogue more coherent.
It is another object of the present invention to provide a speech recognition semantic processing system and its method, wherein the extension bag of words is associated with the error range of the speech recognition technology used, avoiding the blind addition of expansion words, so that each basic word and its associated expansion words are more targeted and semantic understanding efficiency and accuracy are improved.
It is another object of the present invention to provide a speech recognition semantic processing system and its method, wherein the speech recognition semantic processing system can work with various speech recognition technologies without restricting the speech recognition technology, and corresponding extension bags of words can be set for different speech recognition technologies, giving a wider and more flexible scope of application.
In order to achieve at least one of the above objects, in one aspect of the present invention, the present invention further provides a speech recognition semantic processing system suitable for a phone robot, comprising:
a bag-of-words module for storing the words required by the speech recognition semantic processing system;
a semantic vector conversion module, wherein the semantic vector conversion module vectorizes a speech recognition result according to the words required by the speech recognition semantic processing system;
a semantic category library including multiple semantic categories; and
a semantic determining module, wherein the semantic determining module determines, according to the vectorization value of the speech recognition result, the semantic category of the speech recognition result in the semantic category library and forms semantic category information, so as to determine the semantics of the speech recognition result and match a response voice.
According to one embodiment of the present invention, the bag-of-words module includes a basic bag of words and at least one extension bag of words, wherein the basic bag of words includes multiple basic words, the extension bag of words includes at least one expansion word associated as a confusable word with a basic word in the basic bag of words, and the semantic vector conversion module vectorizes expansion words and basic words to the same value according to the basic bag of words and the extension bag of words.
According to one embodiment of the present invention, the extension bag of words is correspondingly arranged within the basic bag of words, wherein a basic word and its associated expansion words are set in an "or" relationship, so that expansion words and basic words are vectorized to the same value.
According to one embodiment of the present invention, the bag-of-words module is the Cartesian product of each basic word in the basic bag of words and its associated expansion words, so that expansion words and basic words are vectorized to the same value.
According to one embodiment of the present invention, the extension bag of words is associated with the error range of the speech recognition result.
According to one embodiment of the present invention, the semantic determining module uses Bayes to determine, according to the vectorization value of the speech recognition result, the semantic category of the speech recognition result in the semantic category library and forms the semantic category information.
According to one embodiment of the present invention, the semantic determining module uses Bayes and inverse document frequency to determine, according to the vectorization value of the speech recognition result, the semantic category of the speech recognition result in the semantic category library and forms the semantic category information.
According to one embodiment of the present invention, the semantic categories in the semantic category library correspond to the application field of the phone robot and to industry scripts.
According to one embodiment of the present invention, the speech recognition semantic processing system further comprises a speech recognition module, wherein the speech recognition module recognizes a customer voice as text to form the speech recognition result.
According to one embodiment of the present invention, the speech recognition semantic processing system further comprises a response recording matching module and a response recording library, wherein the response recording library includes multiple response recordings, each response recording is associated with a corresponding semantic category, and the response recording matching module matches the corresponding response recording in the response recording library according to the semantic category information, forming response recording information.
According to one embodiment of the present invention, the speech recognition semantic processing system further comprises a playback module, wherein the playback module plays the corresponding response recording according to the response recording information.
In another aspect of the present invention, the present invention further provides a speech recognition semantic processing method, comprising the steps of:
(a) vectorizing a speech recognition result according to the words required for speech recognition semantic processing that are stored by a bag-of-words module, forming a speech recognition result vectorization value; and
(b) determining, according to the speech recognition result vectorization value, the semantic category to which the speech recognition result belongs, forming semantic category information.
According to one embodiment of the present invention, step (a) further comprises the step of:
(a.1) setting a basic bag of words and at least one extension bag of words, wherein the extension bag of words contains the expansion words associated as confusable words with a basic word in the basic bag of words, forming the bag-of-words module, so that the vectorization results of expansion words and basic words are identical.
According to one embodiment of the present invention, the extension bag of words in step (a.1) is correspondingly arranged within the basic bag of words, and a basic word and its associated expansion words are set in an "or" relationship.
According to one embodiment of the present invention, the bag-of-words module in step (a.1) is the Cartesian product of each basic word in the basic bag of words and its associated expansion words.
According to one embodiment of the present invention, step (b) further comprises the step of:
(b.1) determining, by means of Bayes and/or inverse document frequency and according to the speech recognition result vectorization value, the semantic category to which the speech recognition result belongs, forming the semantic category information.
According to one embodiment of the present invention, before step (a) the speech recognition semantic processing method further comprises the step of: recognizing a customer voice as text, forming the speech recognition result.
According to one embodiment of the present invention, after step (b) the speech recognition semantic processing method further comprises the step of:
(c) matching the corresponding response voice according to the semantic category information, forming response voice information.
According to one embodiment of the present invention, the speech recognition semantic processing method further comprises the step of:
(d) playing the corresponding response voice according to the response voice information.
Brief description of the drawings
Fig. 1 is an application diagram of the speech recognition semantic processing system according to an embodiment of the invention.
Fig. 2 is a structural block diagram of the speech recognition semantic processing system according to an embodiment of the invention.
Fig. 3 is an illustration of the semantic category library and the response recording library of the speech recognition semantic processing system according to an embodiment of the invention.
Fig. 4 is a flowchart of the speech recognition semantic processing method according to an embodiment of the invention.
Specific embodiment
The following description is provided to disclose the present invention so that those skilled in the art can realize it. The preferred embodiments in the following description are intended only as illustrations, and other obvious modifications will occur to those skilled in the art. The basic principles of the invention defined in the following description can be applied to other embodiments, variants, improvements, equivalents and other technical schemes that do not depart from the spirit and scope of the present invention.
It will be understood by those skilled in the art that, in the disclosure of the invention, the orientations or positional relationships indicated by terms such as "longitudinal", "transverse", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner" and "outer" are based on the orientations or positional relationships shown in the drawings and are used merely for convenience and simplicity of description; they do not indicate or imply that the device or element referred to must have a particular orientation or be constructed and operated in a particular orientation, and therefore the above terms are not to be construed as limiting the invention.
It should be understood that the term "a" is to be interpreted as "at least one" or "one or more"; that is, in one embodiment the quantity of an element may be one, while in another embodiment the quantity of that element may be multiple, and the term "a" cannot be interpreted as a limitation on quantity.
As shown in Figs. 1 to 4, a speech recognition semantic processing system and a semantic processing method according to a preferred embodiment of the present invention are set forth, in which homophones, near-synonyms and other confusable words in the text obtained by speech recognition are processed, so as to achieve correct semantic understanding and reduce the possibility that confusable words mislead the semantic understanding. It should be noted that confusable words in the present invention include not only homophones and near-synonyms, but also words that the speech-to-text technology in use tends to misrecognize or confuse, for example front and back nasal finals, and flat-tongue versus retroflex initials, that sound similar. For convenience of illustration and description, the present invention is explained using homophones and near-synonyms, which is not a limitation.
It should be noted that the speech recognition semantic processing system of the present invention is preferably used in a phone robot, so that the phone robot communicates with customers more intelligently. The phone robot dials the customer's number according to customer data, plays preset opening remarks after the call is connected, and then replies intelligently with scripts adapted to different scenarios. The phone robot can communicate with customers intelligently and can also screen and classify prospective customers from a large amount of customer data, which makes it convenient for sales or customer service staff to follow up effectively based on data analysis and call records. When the speech recognition semantic processing system of the invention is applied to the phone robot, the phone robot can understand the customer's voice and reply more intelligently with scripts for different scenarios.
Specifically, the speech recognition semantic processing system includes a speech recognition module 10 for recognizing the customer's voice as text. That is, the speech recognition module 10 receives a customer voice and recognizes the customer voice as a speech recognition result. The speech recognition result is expressed in the form of text. In the present invention, the technical solution adopted by the speech recognition module 10 is not limited; those skilled in the art can adopt a known or independently developed technical solution to convert the customer voice into text and form the speech recognition result. For example, the speech recognition module 10 may decompose the customer voice into smaller voice units and convert them into the corresponding text through an acoustic model and a deep-learning data model.
It will be appreciated that the Chinese language is extensive and profound: there are near-synonyms, the same meaning has different modes of expression, and there are homophones, where the same pronunciation represents different meanings. Combined with the limitations of speech recognition technology, the speech recognition result produced by the speech recognition module 10 may contain errors. That is, the words contained in the recognized speech recognition result may not be the words the customer voice really intended to express. For example, the customer voice is "how much can I borrow", but due to homophones or the limitations of speech recognition technology, the speech recognition result produced by the speech recognition module 10 may well be "how much can I bring" (the two phrases are near-homophones in Chinese). With the keyword matching technology of the prior art, "how much can I borrow" and "how much can I bring" belong to different semantics, so "how much can I bring" cannot be matched to the recording that should be matched, the one describing the loan in further detail (referred to as recording C).
In the present invention, the speech recognition semantic processing system further comprises a semantic vector conversion module 20 and a bag-of-words module 30 for vectorizing the speech recognition result. By building a bag-of-words model and vectorizing the speech recognition result, the system, compared with the keyword matching technology of the prior art, can take the whole picture into account and consider the entire context. At the same time, understanding the text obtained by speech recognition with a bag-of-words model effectively avoids the influence of inverted word order in Chinese on speech understanding and improves recognition accuracy.
The bag-of-words module 30 serves as a dictionary that stores the words of the scripts related to the field and industry in which the phone robot is applied, that is, the words that make up the common expressions and specialized terms of that field and industry. For example, a financial credit company uses phone robots to introduce various loan products to customers. A common expression in financial credit is "how much can I borrow"; accordingly the bag-of-words module 30, as a dictionary, might be {I can borrow how much}. Of course, this is only a simplified illustration; the scripts related to an industry and field may number in the hundreds, in which case the bag-of-words module 30 contains all the words required by these hundreds of scripts, and the number of words it contains is correspondingly large, for example {I want loan can how much project bank ...}.
The semantic vector conversion module 20 vectorizes the speech recognition result according to the bag-of-words module 30. In one embodiment of the invention, if a word in the bag-of-words module 30 appears once in the speech recognition result, the semantic vector conversion module 20 marks a 1 at the corresponding position; if it appears twice, the semantic vector conversion module 20 marks a 2 at the corresponding position. In other words, the speech recognition result vectorization value indicates how many times each word of the bag-of-words module 30 appears. For example, if the bag-of-words module 30 is {I can borrow how much}, the customer voice is "how much can I borrow", and the speech recognition result is correctly recognized as "how much can I borrow", then the vectorized speech recognition result is {0, 1, 1, 1, 1}. No matter whether the word order of the speech recognition result is inverted, the vectorized result is always {0, 1, 1, 1, 1}, which effectively avoids the influence of inverted word order in Chinese on speech understanding and improves recognition accuracy.
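The counting scheme just described can be pictured with a minimal Python sketch. It is an illustration only, not the patent's implementation; the vocabulary below is the English rendering of the example bag {I can borrow how much}, and all function names are assumptions.

```python
# Minimal sketch of the count-based bag-of-words vectorization described above.
# The vocabulary and all names here are illustrative, not taken from the patent.

def vectorize(tokens, vocabulary):
    """Count how many times each vocabulary word appears in the token list."""
    return [tokens.count(word) for word in vocabulary]

vocabulary = ["I", "can", "borrow", "how", "much"]   # the bag {I can borrow how much}

# Correctly recognized result "how much can I borrow", already tokenized
print(vectorize(["can", "borrow", "how", "much"], vocabulary))   # -> [0, 1, 1, 1, 1]

# Word order does not matter: a reordered result yields the same vector
print(vectorize(["how", "much", "can", "borrow"], vocabulary))   # -> [0, 1, 1, 1, 1]
```

As the second call shows, the vector depends only on word counts, which is why inverted word order does not change the result.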
In an actual recognition process, however, the speech recognition module 10 easily recognizes "how much can I borrow" as "how much can I bring". After vectorization, "how much can I bring" becomes {0, 1, 0, 1, 1}, which is not identical to the vector {0, 1, 1, 1, 1} that "how much can I borrow" should produce. During subsequent matching it is therefore very likely that recording C, which should be matched, cannot be matched, and the semantics are misunderstood. That is, although the existing bag-of-words model improves recognition accuracy to some extent, it still falls short in handling confusable words.
In the present invention, the bag-of-words module 30 further comprises a basic bag of words 31 and at least one extension bag of words 32, wherein the extension bag of words 32 contains the expansion words associated as confusable words with a basic word in the basic bag of words 31. For example, for the basic word "borrow" in the basic bag of words 31, the associated expansion words are "bring" and "lend", so the extension bag of words for the basic word "borrow" in the basic bag of words 31 is {bring, lend}. It will therefore be understood that the basic bag of words 31 may have multiple basic words with associated expansion words, and correspondingly there may be multiple associated extension bags of words 32.
It is worth mentioning that the extension bag of words 32 is associated with the error range of the speech recognition technology used by the speech recognition module 10. That is, if the sample coverage of the speech recognition module 10 is limited and, during training, the error range of the speech recognition module 10 is found to be relatively large, "borrow" may be misrecognized not only as "bring" but also as, say, "bag" or "to", and the extension bag of words 32 of the basic word "borrow" then becomes {bring, lend, bag, to}. If instead the speech recognition module 10 has broad sample coverage, good recognition performance and a low error range, for example the basic word "borrow" is mostly misrecognized as "bring" and rarely anything else, then the extension bag of words 32 of the basic word "borrow" only needs to include the near-synonym and "bring", i.e. {bring, lend}.
Associating the extension bag of words 32 with the error range of the speech recognition technology used by the speech recognition module 10 avoids the blind addition of expansion words, makes each basic word and its associated expansion words more targeted, improves semantic understanding efficiency and accuracy, and saves space. In other words, the speech recognition module 10 can work with various speech recognition technologies without restriction, and the bag-of-words module 30 can be provided with corresponding extension bags of words for different speech recognition technologies, giving a wider and more flexible scope of application.
The semantic vector conversion module 20, according to the basic bag of words 31 and the extension bag of words 32, vectorizes a basic word and its associated expansion words to the same value, thereby obtaining the same semantic understanding and reducing the possibility that confusable words mislead the semantic understanding. That is, when the customer voice is "how much can I borrow", no matter whether the speech recognition module 10 recognizes it as "how much can I borrow" or as "how much can I bring", the semantic vector conversion module 20 vectorizes it to the same value and it is understood as the same semantics, so recording C can be matched and the phone robot is more intelligent.
In the preferred embodiment of the present invention, the extension bag of words 32 is correspondingly arranged within the basic bag of words 31, and a basic word and its associated expansion words are set in an "or" relationship, so that during semantic vector conversion the basic word and its associated expansion words are vectorized to the same value. For example, the bag-of-words module 30 is {I can {borrow or lend or bring} how much}. Here the extension bag of words {bring, lend} is arranged within the basic bag of words 31, and the basic word "borrow" and its associated expansion words "bring" and "lend" are in an "or" relationship with one another; that is, as long as any one of {borrow or lend or bring} appears, the corresponding vector position is 1, so the same vectorization value is obtained and the same semantic understanding is carried out.
Specifically, when the customer voice is "how much can I borrow", if the speech recognition module 10 recognizes it as "how much can I borrow", the semantic vector conversion module 20, according to the basic bag of words 31 and the extension bag of words 32, produces the vectorization value {0, 1, 1, 1, 1}. If the speech recognition module 10 recognizes it as "how much can I bring", the semantic vector conversion module 20, according to the basic bag of words 31 and the extension bag of words 32, likewise produces the vectorization value {0, 1, 1, 1, 1}.
When the customer voice is "how much can I lend" and the speech recognition module 10 recognizes it as "how much can I lend", the semantic vector conversion module 20, again according to the basic bag of words 31 and the extension bag of words 32, produces the vectorization value {0, 1, 1, 1, 1}, identical to the vectorization values of the two speech recognition results above, and it is understood as the same meaning.
That is, whichever near-synonym or homophone the speech recognition module 10 produces, the speech recognition semantic processing system of the present invention can correct it to the right semantic understanding and avoid the influence of confusable words on the semantics. Moreover, in this preferred implementation of the invention, the extension bag of words 32 occupies little space, and the vectorization takes less time and is more efficient.
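The "or" relationship can be pictured as grouping each basic word with its expansion words and counting the whole group at one vector position. The sketch below is an illustration under that reading, not the patent's implementation; the groups and names are assumptions.

```python
# Sketch of the "or" relationship between a basic word and its expansion words:
# any member of a group increments the same vector position.
# The groups below are illustrative assumptions.

bag = [
    {"I"},
    {"can"},
    {"borrow", "bring", "lend"},   # basic word "borrow" with expansions "bring", "lend"
    {"how"},
    {"much"},
]

def vectorize_with_expansions(tokens, bag):
    """Count, per group, how many tokens fall into that group."""
    return [sum(1 for t in tokens if t in group) for group in bag]

# The misrecognized "how much can I bring" and the correct "how much can I borrow"
# produce the same vector, so they receive the same semantic understanding.
print(vectorize_with_expansions(["can", "bring", "how", "much"], bag))    # -> [0, 1, 1, 1, 1]
print(vectorize_with_expansions(["can", "borrow", "how", "much"], bag))   # -> [0, 1, 1, 1, 1]
```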
In another embodiment of the present invention, the basic bag of words 31 and the extension bag of words 32 form the Cartesian product of each basic word in the basic bag of words and its associated expansion words, so that during semantic vector conversion confusable words can also be understood correctly, guaranteeing the coherence of the whole dialogue. For example, the expansion words of the basic word "company" are "shop", "store", "storefront", "shopkeeper", "merchant", "business office" and "mall", forming the array {company, shop, store, storefront, shopkeeper, merchant, business office, mall}; and the expansion words of the basic word "address" are "location" and "place", forming the array {address, location, place}. The basic bag of words 31 and the extension bag of words 32 for "company address" are then the Cartesian product of these two arrays, where {company address} is the basic bag of words 31 and the remaining combinations are the extension bag of words 32.
In this embodiment, the basic bag of words 31 is expanded by a Cartesian product to generate the extension bag of words 32, so that all kinds of confusable variants are quantized to the same value and then understood as the same meaning, effectively improving classification accuracy. Further, in this embodiment, associating the extension bag of words 32 with the error range of the speech recognition technology used by the speech recognition module 10 effectively avoids unnecessary array combinations and reduces the size of the bag-of-words module 30, thereby improving semantic understanding efficiency.
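A possible way to enumerate such a Cartesian-product extension bag is shown below. This is a sketch under the example above, using English renderings of the word groups; the variable names and the space-joined phrase format are assumptions, not the patent's data structures.

```python
# Sketch of the Cartesian-product embodiment: every combination of one word from
# each group is treated as an equivalent phrase for "company address".
from itertools import product

company_group = ["company", "shop", "store", "storefront",
                 "shopkeeper", "merchant", "business office", "mall"]
address_group = ["address", "location", "place"]

# The extended bag is the Cartesian product of the two groups.
extended_bag = {c + " " + a for c, a in product(company_group, address_group)}

print("company address" in extended_bag)   # True  (the basic phrase)
print("shop place" in extended_bag)        # True  (an expansion phrase)
print(len(extended_bag))                   # 8 * 3 = 24 combinations
```

Keeping the groups aligned with the recognizer's actual error range, as described above, keeps this product small.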
Further, the speech recognition semantic processing system of the present invention includes a semantic determining module 40 and a semantic category library 50. The semantic determining module 40 determines, according to the vectorization value of the speech recognition result, the semantic category to which the customer voice belongs in the semantic category library 50, that is, it determines the semantics of the customer voice.
The semantic category library 50 stores the classification of the common expressions and professional scripts of the field in which the phone robot is used, i.e. the different common semantics of that field. In other words, the semantic category library 50 includes multiple semantic categories 51. Each semantic category 51 differs semantically from the others and is accordingly matched with a different response voice. For example, in the financial credit field the semantic categories stored by the semantic category library 50 may include "How much interest for one year? How much for half a year? How much for one month?", "How low is the interest?", "No fixed residence, no job, no borrowing, bad credit", "What qualifications are needed", and so on. It will be appreciated that for different users and different fields the semantic category library 50 may well differ, and corresponding content can be arranged and stored in a targeted manner.
The semantic determining module 40 determines, according to the vectorization value of the speech recognition result, the semantic category to which the customer voice belongs, i.e. it determines the semantics of the customer voice. In the preferred embodiment of the present invention, based on the assumption that the words of the speech recognition result are independent of one another, the semantic determining module 40 analyzes which semantic category 51 the speech recognition result belongs to with the greatest probability, thereby determining the semantic category to which the speech recognition result belongs.
For example, the bag-of-words module 30 is {I can {borrow or lend or bring} how much}, the customer voice is "how much can I borrow", and the speech recognition module 10 recognizes it as "how much can I bring"; the semantic vector conversion module 20 then produces, according to the basic bag of words 31 and the extension bag of words 32, the vectorization value {0, 1, 1, 1, 1}. The semantic determining module 40, given the vectorization value {0, 1, 1, 1, 1} and the assumption that the words of the speech recognition result are independent of one another, calculates which semantic category 51 it belongs to with the greatest probability. For example, from the vectorization value {0, 1, 1, 1, 1} the semantic determining module 40 determines that the speech recognition result most probably belongs to the semantic category 51 "how much can I borrow".
Preferably, the semantic determining module 40 uses Bayes to calculate the probability that the vectorization value of the speech recognition result belongs to each category, and the semantic category 51 with the maximum probability is taken as the determined semantics. Compared with the single keyword matching of the prior art, Bayesian analysis can improve the accuracy of semantic understanding. Preferably, the semantic determining module 40 uses Bayes together with inverse document frequency to further understand, analyze and determine the vectorized semantics, reinforcing the weight of the words that best distinguish documents, so that semantic understanding is more accurate and the dialogue more coherent. Those skilled in the art should be aware of the basic principles of Bayes and inverse document frequency, which are not repeated here. Bayes and inverse document frequency are mentioned only as illustrations and not as limitations; those skilled in the art can use other probability calculation methods to determine the semantic category to which the speech recognition result belongs.
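The combination of a naive Bayes classifier with inverse-document-frequency weighting can be sketched as follows. The patent names no library, so scikit-learn is used purely as an illustrative stand-in, and the toy utterances, category labels and expected output are assumptions.

```python
# Sketch: naive Bayes over TF-IDF-weighted bag-of-words vectors.
# scikit-learn is an assumed stand-in; the training data is a toy illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

utterances = [
    "can borrow how much",
    "how low is the interest",
    "not interested at all",
    "what qualifications are needed",
]
categories = ["loan_amount", "interest_rate", "no_interest", "qualification"]

vectorizer = TfidfVectorizer()            # bag-of-words counts reweighted by IDF
X = vectorizer.fit_transform(utterances)

classifier = MultinomialNB()              # picks the category with maximum posterior probability
classifier.fit(X, categories)

# Classify a new recognition result (after expansion-bag normalization, as above)
query = vectorizer.transform(["how much can I borrow"])
print(classifier.predict(query))          # -> ['loan_amount'] on this toy data
```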
Further, the speech recognition semantic processing system includes a response recording matching module 70 and a response recording library 60, used to match, in the response recording library 60, the appropriate corresponding response recording 61 according to the semantic category 51 determined by the semantic determining module 40.
Specifically, the response recording library 60 includes multiple response recordings 61. A response recording 61 is a recording made in advance to be played as a response to the customer voice. Each response recording 61 is associated with the corresponding semantic category 51. For example, in one embodiment of the invention, the response recording 61 and the corresponding semantic category 51 are associated by an association identifier; the association identifier may be implemented as a recording serial number, with the response recording 61 and the corresponding semantic category 51 being given the same recording serial number. That is, the response recordings 61 and the semantic categories 51 are in a one-to-one relationship, and each semantic category 51 has a corresponding response recording 61 as its answer. For example, the semantic category 51 "How low is the interest" is associated with the response recording 61 "Because we connect directly to the bank's internal channel and the application is submitted in your name, the bank gives its lowest preferential policy", the two being associated by the same number "113", as shown in Fig. 3.
The response recording matching module 70, according to the determined semantic category 51 and through the association identifier, matches the appropriate corresponding response recording 61 in the response recording library 60 and forms response recording information. For example, when the semantic determining module 40 determines that the semantic category 51 of the speech recognition result is "how much can I borrow", then according to the association identifier "124" the corresponding response recording 61, "How much you can borrow depends on your personal situation; everyone's situation is different", can be matched in the response recording library 60 and the corresponding response recording information is formed, as shown in Fig. 3.
The response recording information may include, without limitation, the storage address, content, number and so on of the response recording 61. The response recording matching module 70 sends the response recording information to a playback module 80, and the playback module 80 plays the corresponding response recording 61 according to the response recording information. The customer voice is thus answered; moreover, in the speech recognition semantic processing system of the present invention, the confusable words in the recognition result of the customer voice have been processed, so the customer voice is understood more accurately, the matched response recording 61 is more targeted, and the phone robot is accordingly more intelligent and more human-like.
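The serial-number association between semantic categories and response recordings can be sketched as a simple lookup. The numbers 113 and 124 are taken from the Fig. 3 example above; the dictionaries, file paths and function name are hypothetical illustrations.

```python
# Sketch of matching a determined semantic category to its response recording
# through a shared serial number. Paths and structures are illustrative assumptions.

semantic_categories = {
    "how much can I borrow": 124,
    "how low is the interest": 113,
}

response_recordings = {
    113: "recordings/113_lowest_rate_policy.wav",
    124: "recordings/124_depends_on_situation.wav",
}

def match_response(category_name):
    """Look up the recording associated with a semantic category by serial number."""
    serial = semantic_categories[category_name]
    return serial, response_recordings[serial]

serial, path = match_response("how much can I borrow")
print(serial, path)   # 124 recordings/124_depends_on_situation.wav
# A playback module would then play the file referenced in this response information.
```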
According to another aspect of the present invention, the present invention further provides a speech recognition semantic processing method to achieve the objects and advantages of the invention. The speech recognition semantic processing method can be used in the speech recognition semantic processing system described above. Fig. 4 shows the flowchart of the speech recognition semantic processing method of the invention.
Step 110: receive a customer voice.
After the phone robot dials and the call is connected, what the customer (the other party on the call) says is recorded as the customer voice. Some phone robots first play an opening-remarks recording after the call is connected, for example introducing their identity and main business.
Step 120: recognize the customer voice as text, forming a speech recognition result.
Specifically, the speech recognition technology used for speech recognition is not limited here; those skilled in the art can use a well-known or independently developed technology to recognize the customer voice as text. It should be noted that the speech recognition result is expressed in written form.
It will be appreciated that, due to the limitations of speech recognition technology and the complexity of Chinese vocabulary, the speech recognition result is very likely not fully consistent with the content of the customer voice, and the possibility of confusable words is high. Confusable words here refer to the words that the speech-to-text technology in use tends to misrecognize or confuse, such as near-synonyms, homophones, and error-prone front and back nasal finals and flat-tongue versus retroflex initials.
Step 130: vectorize the speech recognition result according to a bag-of-words module, forming a speech recognition result vectorization value.
Specifically, this step may use a bag-of-words model to vectorize the speech recognition result, so that the speech recognition result vectorization value indicates how many times each word of the bag-of-words module appears. Those skilled in the art will be aware of the basic concept and content of the bag-of-words model, which is not repeated here. In other words, the speech recognition result vectorization value does not depend on whether the word order of the speech recognition result is inverted, which effectively avoids the influence of inverted word order in Chinese on speech understanding and improves recognition accuracy.
Further, step 130 may comprise the step of: setting a basic bag of words and at least one extension bag of words, wherein the extension bag of words contains the expansion words associated as confusable words with a basic word in the basic bag of words, forming the bag-of-words module, so that the vectorization results of expansion words and basic words are identical.
As described above, besides the problem of inverted word order, the speech recognition result also has the problem of confusable words, which affects the overall semantic understanding. What the extension bag of words contains are the confusable words of the corresponding basic word. In this way, during the vectorization of the speech recognition result, the vectorization results of expansion words and basic words are identical, achieving the same semantic understanding.
In one embodiment of the invention, the extension bag of words is correspondingly arranged within the basic bag of words, and a basic word and its associated expansion words are set in an "or" relationship. In another embodiment of the invention, the bag-of-words module is the Cartesian product of each basic word in the basic bag of words and its associated expansion words. Both approaches make the vectorization results of expansion words and basic words identical, thereby avoiding the misleading effect of confusable words on speech understanding.
Step 140: determine, according to the speech recognition result vectorization value, the semantic category to which the speech recognition result belongs, forming semantic category information.
Specifically, each industry and field has its specific common expressions and professional scripts. These scripts are classified in advance, according to their semantics, into different semantic categories. From the speech recognition result vectorization value, probability calculation can be used to determine which semantic category the speech recognition result most probably belongs to, thereby determining the semantic category to which the speech recognition result belongs. The semantic category information may include, without limitation, the number of the semantic category to which the speech recognition result belongs, its storage address, its content, the number of the associated response recording, and so on.
Preferably, step 140 further comprises the step of: determining, by means of Bayes and inverse document frequency and according to the speech recognition result vectorization value, the semantic category to which the speech recognition result belongs, forming the semantic category information.
Step 150: match the corresponding response voice according to the semantic category information, forming response voice information.
Different semantic categories correspond to different responses. The corresponding response voice is recorded in advance and associated with the corresponding semantic category, for example by the same number, which is not limited here. In this way, the corresponding response voice can be looked up and matched according to the semantic category information, forming the response voice information. The response voice information may include, without limitation, the storage address, content and number of the response recording.
Step 160: play the corresponding response voice according to the response voice information.
That is, the response voice is the answer to the customer voice, and the phone robot realizes an intelligent reply. The speech recognition semantic processing method of the present invention not only abandons the traditional keyword matching technology, it also takes the confusable words in recognition into account and uses Bayes and similar methods to improve the accuracy of semantic understanding, so that the phone robot is more human-like and the whole dialogue is more coherent.
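Steps 110 to 160 can be pictured end to end with the small sketch below. Every function in it is a trivial stand-in for the corresponding module described above, not the patent's implementation; the stubbed outputs, bag and file path are illustrative assumptions.

```python
# End-to-end sketch of steps 110-160 with stubbed stand-ins for each module.

def recognize(audio):                       # step 120: speech -> text (stubbed result)
    return "can bring how much"

def vectorize(tokens, bag):                 # step 130: bag-of-words with expansion groups
    return [sum(1 for t in tokens if t in group) for group in bag]

def classify(vector):                       # step 140: most probable category (stubbed)
    return "how much can I borrow"

def match_response(category):               # step 150: shared-serial-number lookup (stubbed)
    return {"how much can I borrow": "recordings/124_depends_on_situation.wav"}[category]

def play(path):                             # step 160: playback (stubbed)
    print("playing", path)

bag = [{"I"}, {"can"}, {"borrow", "bring", "lend"}, {"how"}, {"much"}]
text = recognize(None)                      # steps 110-120
vector = vectorize(text.split(), bag)       # step 130
play(match_response(classify(vector)))      # steps 140-160
```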
It should be understood by those skilled in the art that the foregoing description and the embodiments of the present invention shown in the drawings are only illustrations and are not intended to limit the invention. The objects of the invention have been fully and effectively achieved. The functions and structural principles of the invention have been shown and explained in the embodiments, and without departing from these principles the embodiments of the invention may have any variation or modification.
Claims (19)
1. A speech recognition semantic processing system, characterized by comprising:
a bag-of-words module for storing the words required by the speech recognition semantic processing system;
a semantic vector conversion module, wherein the semantic vector conversion module vectorizes a speech recognition result according to the words required by the speech recognition semantic processing system, forming a speech recognition result vectorization value;
a semantic category library including multiple semantic categories; and
a semantic determining module, wherein the semantic determining module determines, according to the speech recognition result vectorization value, the semantic category to which the speech recognition result belongs in the semantic category library, forming semantic category information, so as to determine the semantics of the speech recognition result and match a response voice.
2. The speech recognition semantic processing system according to claim 1, wherein the bag-of-words module includes a basic bag of words and at least one extension bag of words, wherein the basic bag of words includes multiple basic words, the extension bag of words includes at least one expansion word associated as a confusable word with a basic word, and the semantic vector conversion module vectorizes expansion words and basic words to the same value according to the basic bag of words and the extension bag of words.
3. The speech recognition semantic processing system according to claim 2, wherein the extension bag of words is correspondingly arranged within the basic bag of words, and a basic word and its associated expansion words are set in an "or" relationship, so that expansion words and basic words are vectorized to the same value.
4. The speech recognition semantic processing system according to claim 2, wherein the bag-of-words module is the Cartesian product of each basic word in the basic bag of words and its associated expansion words, so that expansion words and basic words are vectorized to the same value.
5. The speech recognition semantic processing system according to claim 2, wherein the extension bag of words is associated with the error range of the speech recognition result.
6. The speech recognition semantic processing system according to any one of claims 1 to 5, wherein the semantic determining module uses Bayes to determine, according to the vectorization value of the speech recognition result, the semantic category to which the speech recognition result belongs in the semantic category library, forming the semantic category information.
7. The speech recognition semantic processing system according to any one of claims 1 to 5, wherein the semantic determining module uses Bayes and inverse document frequency to determine, according to the vectorization value of the speech recognition result, the semantic category to which the speech recognition result belongs in the semantic category library, forming the semantic category information.
8. The speech recognition semantic processing system according to any one of claims 1 to 5, wherein the semantic categories in the semantic category library correspond to the application field of the phone robot and to industry scripts.
9. The speech recognition semantic processing system according to any one of claims 1 to 5, further comprising a speech recognition module, wherein the speech recognition module recognizes a customer voice as text to form the speech recognition result.
10. The speech recognition semantic processing system according to any one of claims 1 to 5, further comprising a response recording matching module and a response recording library, wherein the response recording library includes multiple response recordings, each response recording is associated with a corresponding semantic category, and the response recording matching module matches the corresponding response recording in the response recording library according to the semantic category information, forming response recording information.
11. The speech recognition semantic processing system according to claim 10, further comprising a playback module, wherein the playback module plays the corresponding response recording according to the response recording information.
12. A speech recognition semantic processing method, characterized by comprising the steps of:
(a) vectorizing a speech recognition result according to the words required for speech recognition semantic processing that are stored by a bag-of-words module, forming a speech recognition result vectorization value; and
(b) determining, according to the speech recognition result vectorization value, the semantic category to which the speech recognition result belongs, forming semantic category information.
13. The speech recognition semantic processing method according to claim 12, wherein step (a) further comprises the step of:
(a.1) setting a basic bag of words and at least one extension bag of words, wherein the extension bag of words contains the expansion words associated as confusable words with a basic word in the basic bag of words, forming the bag-of-words module, so that the vectorization results of expansion words and basic words are identical.
14. The speech recognition semantic processing method according to claim 13, wherein the extension bag of words in step (a.1) is correspondingly arranged within the basic bag of words, and a basic word and its associated expansion words are set in an "or" relationship.
15. The speech recognition semantic processing method according to claim 13, wherein the bag-of-words module in step (a.1) is the Cartesian product of each basic word in the basic bag of words and its associated expansion words.
16. The speech recognition semantic processing method according to claim 12, wherein step (b) further comprises the step of:
(b.1) determining, by means of Bayes and/or inverse document frequency and according to the speech recognition result vectorization value, the semantic category to which the speech recognition result belongs, forming the semantic category information.
17. The speech recognition semantic processing method according to any one of claims 12 to 16, wherein before step (a) the speech recognition semantic processing method further comprises the step of:
recognizing a customer voice as text, forming the speech recognition result.
18. The speech recognition semantic processing method according to any one of claims 12 to 16, wherein after step (b) the speech recognition semantic processing method further comprises the step of:
(c) matching the corresponding response voice according to the semantic category information, forming response voice information.
19. The speech recognition semantic processing method according to claim 18, further comprising the step of:
(d) playing the corresponding response voice according to the response voice information.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910023125.9A CN109920430A (en) | 2019-01-10 | 2019-01-10 | Speech recognition semantic processing system and its method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109920430A true CN109920430A (en) | 2019-06-21 |
Family
ID=66960239
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910023125.9A Pending CN109920430A (en) | 2019-01-10 | 2019-01-10 | Speech recognition semantic processing system and its method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109920430A (en) |
Patent Citations (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1127898A (en) * | 1995-01-26 | 1996-07-31 | 李琳山 | Intelligent common spoken Chinese phonetic input method and dictation machine |
JPH1139313A (en) * | 1997-07-24 | 1999-02-12 | Nippon Telegr & Teleph Corp <Ntt> | Automatic document classification system, document classification oriented knowledge base creating method and record medium recording its program |
JP2005250071A (en) * | 2004-03-03 | 2005-09-15 | Nippon Telegr & Teleph Corp <Ntt> | Method and device for speech recognition, speech recognition program, and storage medium with speech recognition program stored therein |
US20080221878A1 (en) * | 2007-03-08 | 2008-09-11 | Nec Laboratories America, Inc. | Fast semantic extraction using a neural network architecture |
CN101593518A (en) * | 2008-05-28 | 2009-12-02 | 中国科学院自动化研究所 | Method for balancing real-scene corpus and finite state network corpus |
JP2010197859A (en) * | 2009-02-26 | 2010-09-09 | Gifu Univ | Utterance difference speech recognition system |
CN103562919A (en) * | 2011-06-02 | 2014-02-05 | 浦项工科大学校产学协力团 | Method for searching for information using the web and method for voice conversation using same |
US20130035931A1 (en) * | 2011-08-04 | 2013-02-07 | International Business Machines Corporation | Predicting lexical answer types in open domain question and answering (qa) systems |
CN102831892A (en) * | 2012-09-07 | 2012-12-19 | 深圳市信利康电子有限公司 | Toy control method and system based on internet voice interaction |
CN104424290A (en) * | 2013-09-02 | 2015-03-18 | 佳能株式会社 | Voice based question-answering system and method for interactive voice system |
CN106294396A (en) * | 2015-05-20 | 2017-01-04 | 北京大学 | Keyword expansion method and keyword expansion system |
CN104834747A (en) * | 2015-05-25 | 2015-08-12 | 中国科学院自动化研究所 | Short text classification method based on convolutional neural network |
CN106469554A (en) * | 2015-08-21 | 2017-03-01 | 科大讯飞股份有限公司 | Adaptive recognition method and system |
CN105244029A (en) * | 2015-08-28 | 2016-01-13 | 科大讯飞股份有限公司 | Voice recognition post-processing method and system |
CN105446146A (en) * | 2015-11-19 | 2016-03-30 | 深圳创想未来机器人有限公司 | Intelligent terminal control method, system and intelligent terminal based on semantic analysis |
CN107424611A (en) * | 2017-07-07 | 2017-12-01 | 歌尔科技有限公司 | Voice interaction method and device |
CN107644642A (en) * | 2017-09-20 | 2018-01-30 | 广东欧珀移动通信有限公司 | Semantic recognition method, device, storage medium and electronic equipment |
CN107862000A (en) * | 2017-10-22 | 2018-03-30 | 北京市农林科学院 | Agricultural technology consultation interaction method |
CN108335692A (en) * | 2018-03-21 | 2018-07-27 | 上海木爷机器人技术有限公司 | Language switching method, server and system |
CN108595696A (en) * | 2018-05-09 | 2018-09-28 | 长沙学院 | Human-computer interaction intelligent answering method and system based on cloud platform |
CN108595706A (en) * | 2018-05-10 | 2018-09-28 | 中国科学院信息工程研究所 | Document semantic representation method, text classification method and device based on topic part-of-speech similarity |
CN108922534A (en) * | 2018-07-04 | 2018-11-30 | 北京小米移动软件有限公司 | Control method, device, equipment and storage medium |
Non-Patent Citations (2)
Title |
---|
卢良进: "Research on cross-media retrieval of micro-lecture videos based on the bag-of-words model", 《软件导刊》 (Software Guide) * |
李国佳 et al.: "A word sense disambiguation method based on polysemous word vector representation", 《智能计算机与应用》 (Intelligent Computer and Applications) * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Chung et al. | Voxceleb2: Deep speaker recognition | |
Chen et al. | Multi-task learning for text-dependent speaker verification | |
Irum et al. | Speaker verification using deep neural networks: A | |
CN107481720A (en) | An explicit voiceprint recognition method and device | |
CN108986798B (en) | Method, device and equipment for processing voice data | |
Caglayan et al. | Multimodal grounding for sequence-to-sequence speech recognition | |
Bhattacharya et al. | Deeply Fused Speaker Embeddings for Text-Independent Speaker Verification. | |
Li et al. | Acted vs. improvised: Domain adaptation for elicitation approaches in audio-visual emotion recognition | |
CN109935242A (en) | Interruptible speech processing system and method | |
Chaabouni et al. | Learning weakly supervised multimodal phoneme embeddings | |
Srinivasan et al. | A partial least squares framework for speaker recognition | |
CN109920430A (en) | Speech recognition semantic processing system and its method | |
de Abreu Campos et al. | A framework for speaker retrieval and identification through unsupervised learning | |
Ghahabi et al. | Deep Neural Networks for i-Vector Language Identification of Short Utterances in Cars. | |
Chandel et al. | Sensei: Spoken language assessment for call center agents | |
Van Segbroeck et al. | UBM fused total variability modeling for language identification. | |
CN114444609B (en) | Data processing method, device, electronic equipment and computer readable storage medium | |
CN116883888A (en) | Bank counter service problem tracing system and method based on multi-modal feature fusion | |
CN110853674A (en) | Text collation method, apparatus, and computer-readable storage medium | |
Selvaraj et al. | Bimodal recognition of affective states with the features inspired from human visual and auditory perception system | |
Li et al. | A multi-feature multi-classifier system for speech emotion recognition | |
Chen et al. | An investigation of context clustering for statistical speech synthesis with deep neural network. | |
Li et al. | Emotion embedding framework with emotional self-attention mechanism for speaker recognition | |
Lin et al. | Gated fusion of handcrafted and deep features for robust automatic pronunciation assessment | |
Laskar et al. | Speaker-phrase-specific adaptation of PLDA model for improved performance in text-dependent speaker verification |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190621 |