CN109935242A

CN109935242A - Interruptible speech processing system and method

Info

Publication number: CN109935242A
Application number: CN201910023131.4A
Authority: CN
Inventors: 沈悦; 袁晓茹; 李闯
Original assignee: Shanghai Yantong Network Technology Co Ltd
Current assignee: Shanghai Yantong Network Technology Co Ltd
Priority date: 2019-01-10
Filing date: 2019-01-10
Publication date: 2019-06-25

Abstract

The present invention provides an interruptible speech processing system and method. The interruptible speech processing system includes a preset main flow library, a broadcast module, and a sentiment analysis module. The preset main flow library stores multiple main flow voices in a preset association so that they form a preset main flow. The broadcast module plays the corresponding main flow voice according to the preset main flow. The sentiment analysis module analyzes the current emotional state reflected by a speech recognition result and, according to that state, instructs the broadcast module whether to interrupt execution of the preset main flow.

Description

Interruptible speech processing system and method

Technical field

The present invention relates to the field of telephone robots, and more particularly to an interruptible speech processing system and method that enable a telephone robot to pursue its sales purpose while also taking the customer's reactions and emotions into account, making the telephone robot more intelligent and better coordinated.

Background art

Artificial intelligence is the core driving force of the current round of industrial transformation and has a profound influence on the world economy, social progress, and human life. Artificial intelligence is already ubiquitous in daily life, for example in fingerprint recognition, face recognition, intelligent search engines, and speech recognition.

Telephone robots are also a part of artificial intelligence and have received increasing attention from enterprises in recent years, especially enterprises engaged in telemarketing. Staff engaged in telemarketing and telephone customer service work under heavy pressure. They cannot sustain their enthusiasm for long, often encounter hostile conversations, and are prone to mood swings; over time they lose their enthusiasm for the job, and the enterprise falls into a vicious circle of low efficiency and rising cost. For the enterprise, recruiting telemarketing and telephone customer service staff is difficult and turnover remains high. At the same time, market competition is fierce, leads are insufficient, and customers are hard to find. If prospective customers are screened manually, time utilization is low, input costs are high, and working efficiency declines under numerous objective factors, which hurts sales performance. Replacing human staff with telephone robots for telemarketing and telephone customer service therefore greatly relieves the pressure on both the enterprise and its employees, makes 24-hour online service possible, and removes the concern about the effect of hostile conversations on employees.

However, unlike voice robots in other fields, which passively execute operations according to user instructions, telephone robots mostly replace salespeople and must play a salesperson's promoting and guiding role. If a telephone robot merely answers questions mechanically, it cannot achieve sales purposes such as promoting products and therefore cannot effectively replace a salesperson. In other words, a general voice robot has no specific purpose of its own during service and simply responds according to the user's purpose, whereas a telephone robot used for sales communicates in order to recommend or sell products and must guide and promote the sale before it can truly replace a salesperson.

On the other hand, while communicating with a customer for the purpose of recommending or selling products, a telephone robot also needs to take the customer's reactions and emotions into account rather than simply pushing the sale. For example, when the customer declines to learn more, the telephone robot should try to retain the customer rather than mechanically hang up; and when the customer has a question, the telephone robot should answer it rather than blindly continue the pitch. This places higher requirements on the intelligence, human-likeness, and coordination of the telephone robot.

Telephone robots currently on the market neither truly focus on achieving the sales purpose nor consider the customer's emotions. Most of them understand the speech recognition result solely through keyword matching. Understanding semantics by keyword matching alone is error-prone, so the robot's answer may not correspond to what the customer said; it also ignores the customer's emotions, and mechanical matching is inefficient, which leaves the telephone robot with poor intelligence and coordination.

Summary of the invention

An object of the present invention is to provide an interruptible speech processing system and method, wherein the interruptible speech processing system provides a preset main flow library that stores multiple main flow voices in a preset association to form a preset main flow, so that the interruptible speech processing system can achieve a preset purpose according to the preset main flow, thereby truly and effectively replacing salespeople and improving intelligence.

Another object of the present invention is to provide an interruptible speech processing system and method, wherein the preset main flow can be interrupted according to a customer voice so as to respond properly to the customer voice, making a telephone robot that uses the interruptible speech processing system more intelligent and better coordinated.

Another object of the present invention is to provide an interruptible speech processing system and method, wherein the interruptible speech processing system can analyze the customer's current emotional state from the customer voice and respond properly on that basis. Compared with existing telephone robots, a telephone robot that uses the interruptible speech processing system can therefore determine the sentiment orientation of the customer voice.

Another object of the present invention is to provide an interruptible speech processing system and method, wherein, when the customer voice is determined to express refusal or a negative emotion, the interruptible speech processing system can play a corresponding retention voice to retain the customer, which makes a telephone robot that uses the interruptible speech processing system more human-like, intelligent, and coordinated.

Another object of the present invention is to provide an interruptible speech processing system and method, wherein each sub-flow of the preset main flow is provided with a corresponding retention voice, so that the appropriate retention voice is played when the customer needs to be retained in that sub-flow.

Another object of the present invention is to provide an interruptible speech processing system and method, wherein the interruptible speech processing system, on the basis of the current emotional state, understands semantics using keyword matching, natural language processing, or similar techniques, which improves the efficiency of semantic understanding on the one hand and its sensitivity to emotion on the other, making the system more intelligent.

Another object of the present invention is to provide an interruptible speech processing system and method, wherein the interruptible speech processing system, on the basis of the current emotional state, understands semantics using natural language processing; compared with the prior art, which relies directly on keyword matching, semantic understanding is more accurate and a telephone robot that uses the interruptible speech processing system is more intelligent and human-like.

Another object of the present invention is to provide an interruptible speech processing system and method, wherein the interruptible speech processing system identifies the customer's intent on the basis of the current emotional state, which facilitates screening prospective customers and improves working efficiency.

Another object of the present invention is to provide an interruptible speech processing system and method, wherein the interruptible speech processing system may determine the current emotional state of the customer voice by neural network deep learning, or by scoring the customer voice with a series of dictionaries such as sentiment words and degree adverbs. That is, in the present invention those skilled in the art may implement the emotional state analysis function in any known or independently developed manner.

To achieve at least one of the above objects, according to one aspect of the present invention, the present invention further provides an interruptible speech processing system, comprising:

a preset main flow library, wherein the preset main flow library stores multiple main flow voices in a preset association to form a preset main flow;

a broadcast module for playing the corresponding main flow voice according to the preset main flow; and

a sentiment analysis module, wherein the sentiment analysis module analyzes the current emotional state reflected by a speech recognition result and, according to the current emotional state, instructs the broadcast module whether to interrupt execution of the preset main flow.

According to one embodiment of the present invention, the interruptible speech processing system further comprises a retention voice library, wherein the retention voice library stores at least one retention voice, and each main flow voice is provided with a corresponding retention voice.

According to one embodiment of the present invention, when the current emotional state is an intention emotion, the sentiment analysis module forms a flow execution instruction, and the broadcast module plays the corresponding main flow voice or the corresponding retention voice according to the flow execution instruction.

According to one embodiment of the present invention, when the current emotional state is affirmative or neutral, the broadcast module continues to play the next main flow voice so that the preset main flow continues to execute; when the current emotional state is negative or a refusal, the broadcast module plays the corresponding retention voice.

According to one embodiment of the present invention, when the current emotional state is a question, the sentiment analysis module forms a semantic understanding instruction to interrupt execution of the preset main flow and enter a semantic understanding stage.

According to one embodiment of the present invention, the interruptible speech processing system further comprises a semantic determination module and a semantic category library, wherein the semantic category library includes multiple semantic categories, and the semantic determination module, when triggered by the semantic understanding instruction, matches the speech recognition result against each semantic category in the semantic category library, determines the semantic category to which the speech recognition result belongs, and forms semantic category information.

According to one embodiment of the present invention, the semantic determination module uses keyword matching to determine the semantic category to which the speech recognition result belongs and forms the semantic category information.

According to one embodiment of the present invention, the interruptible speech processing system further comprises a semantic vector conversion module, wherein the semantic vector conversion module vectorizes the speech recognition result to form a speech recognition result vectorization value, and the semantic determination module determines, according to the speech recognition result vectorization value, the semantic category in the semantic category library to which the speech recognition result belongs and forms the semantic category information.

According to one embodiment of the present invention, the semantic vector conversion module vectorizes the speech recognition result using a bag-of-words model.

According to one embodiment of the present invention, the semantic determination module uses Bayesian methods and/or inverse document frequency to determine, according to the speech recognition result vectorization value, the semantic category in the semantic category library to which the speech recognition result belongs and forms the semantic category information.
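The vectorization and classification described above can be sketched as follows. This is only a minimal illustration, assuming a multinomial naive Bayes score with add-one smoothing over bag-of-words counts; inverse document frequency weighting is not shown. The category names, training phrases, and function names are hypothetical and do not come from the patent, and English tokens stand in for segmented Chinese text.

```python
import math
from collections import Counter

# Hypothetical training phrases per semantic category (illustrative only).
TRAINING = {
    "interest_amount": ["how much interest per year", "how much interest per month"],
    "qualification": ["what qualification is needed", "which documents are needed"],
}

def bag_of_words(text, vocabulary):
    """Vectorize text as word counts over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]

def train(training):
    """Return the vocabulary, per-category word totals, and category priors."""
    vocabulary = sorted(
        {w for phrases in training.values() for p in phrases for w in p.split()}
    )
    totals = {}
    for category, phrases in training.items():
        vectors = [bag_of_words(p, vocabulary) for p in phrases]
        totals[category] = [sum(column) for column in zip(*vectors)]
    n_phrases = sum(len(phrases) for phrases in training.values())
    priors = {c: len(phrases) / n_phrases for c, phrases in training.items()}
    return vocabulary, totals, priors

def classify(text, vocabulary, totals, priors):
    """Pick the category with the highest naive Bayes log score."""
    vector = bag_of_words(text, vocabulary)
    best_category, best_score = None, -math.inf
    for category, word_totals in totals.items():
        denominator = sum(word_totals) + len(vocabulary)  # add-one smoothing
        score = math.log(priors[category])
        for count, total in zip(vector, word_totals):
            score += count * math.log((total + 1) / denominator)
        if score > best_score:
            best_category, best_score = category, score
    return best_category

vocabulary, totals, priors = train(TRAINING)
result = classify("how much interest for half a year", vocabulary, totals, priors)
```

Words outside the vocabulary are simply ignored by the count vector, so an utterance is classified by the known words it contains.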

According to one embodiment of the present invention, the interruptible speech processing system further comprises a speech recognition module, wherein the speech recognition module recognizes a customer voice as text to form the speech recognition result.

According to one embodiment of the present invention, the interruptible speech processing system further comprises a response recording matching module and a response recording library, wherein the response recording library includes multiple response recordings, each associated with a corresponding semantic category; the response recording matching module matches the corresponding response recording in the response recording library according to the semantic category information and forms response recording information; and the broadcast module plays the corresponding response recording according to the response recording information.

According to one embodiment of the present invention, the interruptible speech processing system further comprises an intent recognition module and an intent type library, wherein the intent type library stores multiple preset intent types, and the intent recognition module determines, according to the speech recognition result vectorization value, the intent type embodied by the speech recognition result, for recording and analysis.

According to another aspect of the present invention, the present invention further provides an interruptible speech processing method, comprising:

(a) playing a main flow voice of a preset main flow, wherein multiple main flow voices are associated in a certain order to form the preset main flow;

(b) analyzing the current emotional state reflected by a speech recognition result; and

(c) entering a corresponding preset processing stage according to the current emotional state.

According to one embodiment of the present invention, the step (c) further comprises the steps of:

(c.1) when the current emotional state is affirmative or neutral, playing the next main flow voice so as to continue executing the preset main flow;

(c.2) when the current emotional state is negative or a refusal, playing a retention voice corresponding to the main flow voice; and

(c.3) when the current emotional state is a question, sending a semantic understanding instruction so as to enter a semantic understanding stage.
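Steps (c.1) to (c.3) amount to a dispatch on the current emotional state. The following Python sketch shows one possible shape for that dispatch; the `Emotion` enum, the `next_stage` function, and the returned stage names are hypothetical labels chosen for illustration.

```python
from enum import Enum

class Emotion(Enum):
    AFFIRMATIVE = "affirmative"
    NEUTRAL = "neutral"
    NEGATIVE = "negative"
    REFUSAL = "refusal"
    QUESTION = "question"

def next_stage(emotion):
    """Map the current emotional state to the processing stage of step (c)."""
    if emotion in (Emotion.AFFIRMATIVE, Emotion.NEUTRAL):
        return "continue_main_flow"      # step (c.1)
    if emotion in (Emotion.NEGATIVE, Emotion.REFUSAL):
        return "play_retention_voice"    # step (c.2)
    if emotion is Emotion.QUESTION:
        return "semantic_understanding"  # step (c.3)
    raise ValueError(f"unknown emotional state: {emotion!r}")
```

Only the question state leaves the flow voices entirely; the other four states resolve to a recording that is played immediately.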

According to one embodiment of the present invention, after the step (c.3), the interruptible speech processing method further comprises the steps of:

(c.3.1) matching the speech recognition result against the preset keywords of each semantic category, determining the semantic category to which the speech recognition result belongs, and forming semantic category information.

(c.3.2) vectorizing the speech recognition result to form a speech recognition result vectorization value; and

(c.3.3) determining, according to the speech recognition result vectorization value, the semantic category to which the speech recognition result belongs, and forming semantic category information.

According to one embodiment of the present invention, the interruptible speech processing method further comprises the steps of:

(c.3.4) matching a corresponding response voice according to the semantic category information and forming response voice information; and

(c.3.5) playing the corresponding response voice according to the response voice information.

According to one embodiment of the present invention, the step (c.3.1) further comprises the steps of:

if the speech recognition result matches the preset keywords of multiple semantic categories, taking the semantic category with the largest number of matched words as the finally determined semantic category and forming the semantic category information; and

if multiple semantic categories have the same number of matched words, selecting by default the semantic category that comes first in numbering.
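The keyword matching rule and its tie-break can be sketched as follows. This is a minimal illustration, assuming whitespace-separated English tokens in place of segmented Chinese text; the category numbers, keywords, and the `match_category` function are hypothetical.

```python
# Hypothetical semantic categories: number -> preset keywords (illustrative only).
CATEGORY_KEYWORDS = {
    1: ["interest", "year", "month"],
    2: ["interest", "low"],
    3: ["qualification", "needed"],
}

def match_category(recognized_text, category_keywords):
    """Return the number of the best-matching semantic category, or None."""
    words = set(recognized_text.lower().split())
    best_number, best_hits = None, 0
    # Visit categories in ascending number so that, on a tie, the
    # lower-numbered category found first is kept.
    for number in sorted(category_keywords):
        hits = sum(1 for keyword in category_keywords[number] if keyword in words)
        if hits > best_hits:
            best_number, best_hits = number, hits
    return best_number
```

Because the comparison is strict (`hits > best_hits`), a later category replaces an earlier one only when it matches more keywords, which gives the default choice of the earlier-numbered category on equal counts.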

Brief description of the drawings

Fig. 1 is a structural block diagram of an interruptible speech processing system according to one embodiment of the present invention.

Fig. 2 is a structural block diagram of an interruptible speech processing system according to another embodiment of the present invention.

Fig. 3 is a schematic diagram of the flow voice library of the interruptible speech processing system according to one embodiment of the present invention.

Fig. 4 is a schematic diagram of a semantic category library and a response recording library of the interruptible speech processing system according to one embodiment of the present invention.

Fig. 5 is a flowchart of an interruptible speech processing method according to one embodiment of the present invention.

Fig. 6 is a semantic understanding flowchart of the interruptible speech processing method according to one embodiment of the present invention.

Fig. 7 is another semantic understanding flowchart of the interruptible speech processing method according to one embodiment of the present invention.

Detailed description of the embodiments

The following description is provided to disclose the present invention so that those skilled in the art can practice it. The preferred embodiments described below are examples only, and other obvious variations will occur to those skilled in the art. The basic principles of the invention defined in the following description can be applied to other embodiments, variations, improvements, equivalents, and other technical solutions that do not depart from the spirit and scope of the present invention.

Those skilled in the art will understand that, in the disclosure of the present invention, terms such as "longitudinal", "transverse", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", and "outer" indicate orientations or positional relationships based on those shown in the drawings. They are used only to facilitate and simplify the description and do not indicate or imply that the device or element referred to must have a particular orientation or be constructed and operated in a particular orientation; these terms should therefore not be construed as limiting the present invention.

It is to be understood that the term "a" or "one" should be understood as "at least one" or "one or more"; that is, the number of an element may be one in one embodiment and more than one in another embodiment, and the term "a" or "one" should not be construed as a limitation on quantity.

As shown in Figs. 1 to 7, an interruptible speech processing system and method according to an embodiment of the present invention are described. The interruptible speech processing system is preferably used in a telephone robot so that the telephone robot interacts with customers more intelligently. The interruptible speech processing method can be applied to the interruptible speech processing system to achieve the objects and advantages of the present invention.

The telephone robot dials the customer's number according to customer data, plays a preset opening after the call is connected, and then replies intelligently according to the scripts for different scenarios. The telephone robot can communicate with customers intelligently and can also screen and classify possible prospective customers from a large amount of customer data, so that sales or customer service staff can follow up effectively on the basis of data analysis and call records. When the interruptible speech processing system of the present invention is applied to the telephone robot, the telephone robot can achieve its preset purpose while taking the customer's reactions and emotions into account, which makes the telephone robot more intelligent and better coordinated.

It is worth noting that, as mentioned above, unlike voice robots in other fields, which passively execute operations according to user instructions, telephone robots mostly replace salespeople and must play a salesperson's promoting and guiding role. If a telephone robot merely answers questions mechanically, it cannot achieve sales purposes such as promoting products and therefore cannot effectively replace a salesperson. In other words, a general voice robot has no specific purpose of its own during service and simply responds according to the user's purpose, whereas a telephone robot used for sales communicates in order to recommend or sell products and must guide and promote the sale before it can truly replace a salesperson.

As shown in Figs. 1 and 2, the interruptible speech processing system includes a flow voice library 10 and a broadcast module 30. The flow voice library 10 stores the flow voices required by the interruptible speech processing system. Specifically, the flow voice library 10 includes a preset main flow library 11. The preset main flow library 11 stores multiple main flow voices in a preset association to form a preset main flow. That is, the preset main flow is formed by presetting multiple main flow voices in a certain order, mainly to achieve a preset purpose such as promoting a product or introducing an activity. The preset main flow may differ between industries and fields according to their characteristics, and the developer of a telephone robot can design a targeted preset main flow for its customers.

The broadcast module 30 is used to play recordings. When nothing unexpected happens, in other words when the connected customer cooperates actively, the broadcast module 30 plays the main flow voices in the preset main flow library 11 in turn according to the preset main flow, so that the interruptible speech processing system can achieve the preset purpose according to the preset main flow, thereby truly and effectively replacing salespeople and improving intelligence.

For example, a preset main flow for the financial field may be as follows: first play the greeting main flow voice (recording A1, "Hello"), then the opening main flow voice (recording A2, "Hello, we specialize in bank loans with low interest, no collateral, and fast approval. Do you have any need for funds at the moment?"), then the company introduction main flow voice (recording A4, "Well, we are a Beijing-based company that specializes in solving financing difficulties for enterprises and individuals. Would it be convenient for me to learn about your situation?"), then the fact-finding main flow voice (recording A6, "May I ask roughly how much funding you need?"), then the active invitation main flow voice (recording A8, "How about this: we will have one of our professional financial advisors call you back later. May I ask how I should address you?"), and finally the closing main flow voice (recording A10, "It is not possible to explain everything in one phone call, so our financial advisor will go over the details with you. Let's leave it here for today. I wish you a pleasant day at work. Goodbye!"). That is, when there is no unexpected interruption, in other words when the connected customer cooperates actively, the interruptible speech processing system should play the main flow voices in the order recording A1, recording A2, recording A4, recording A6, recording A8, recording A10, interacting with the customer continuously and thereby introducing the enterprise and recommending the loan product to the connected customer, as shown in Fig. 3.
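The preset main flow of this example can be represented as an ordered list of recordings. The sketch below is one assumed representation, with hypothetical names; the fact-finding voice is numbered A6 here, following the pairing of voice A6 with the fact-finding sub-flow given in the description of the retention voices.

```python
# Preset main flow for the financial example: (recording number, sub-flow).
PRESET_MAIN_FLOW = [
    ("A1", "greeting"),
    ("A2", "opening"),
    ("A4", "company introduction"),
    ("A6", "fact-finding"),
    ("A8", "active invitation"),
    ("A10", "closing"),
]

def play_uninterrupted(main_flow, play):
    """Play every main flow voice in order, as when the customer cooperates."""
    for number, _sub_flow in main_flow:
        play(number)

# Stand-in for the broadcast module: record which recordings were played.
played = []
play_uninterrupted(PRESET_MAIN_FLOW, played.append)
```

The odd-numbered recordings A3, A5, A7, and A9 are absent from the list because they are retention voices, which are played only when the main flow is interrupted.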

However, while communicating with a customer for the purpose of recommending or selling products, the customer's reactions and emotions must also be taken into account rather than simply pushing the sale. Customers' answers are flexible and varied; if the main flow voices were simply played according to the preset main flow without replying appropriately, the telephone robot would come across as dull and mechanical.

The interruptible speech processing system further includes a speech recognition module 20 for recognizing the customer's voice as text. That is, the speech recognition module 20 receives a customer voice and recognizes it as a speech recognition result, which is expressed as text. The present invention does not limit the technical solution adopted by the speech recognition module 20; those skilled in the art may adopt any known or independently developed solution to convert the customer voice into text and form the speech recognition result. For example, the speech recognition module 20 may parse the customer voice into smaller sound units and convert them into the corresponding text through an acoustic model and a deep learning data model.

Depending on the current emotional state reflected by the speech recognition result, the preset main flow may be interrupted or may continue. The interruptible speech processing system includes a sentiment analysis module 40 for analyzing the current emotional state reflected by the speech recognition result. That state may be affirmative, such as "OK", "very interested", or "I have a need"; a refusal, such as "not needed" or "I don't want it"; negative, such as "your company is no good"; neutral, such as "I'd like to hear more"; or a question, such as "Where is your company located?".

The sentiment analysis module 40 may determine the current emotional state by neural network deep learning. In brief, a multilayer perceptron is trained continuously on a large amount of labeled data (for example, "good" labeled as affirmative) to complete a regression task. Those skilled in the art know the basic concepts of neural network deep learning, which are not repeated here. Alternatively, the sentiment analysis module 40 scores the input sentence using a series of dictionaries, such as sentiment words and degree adverbs, to determine the sentiment orientation. For example, "especially good" and "good" express different degrees of emotion: "good" is given a score of 0.6 and "especially good" a score of 0.9, and the current sentiment orientation is determined from the score or its corresponding grade.
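The dictionary-based scoring can be sketched as follows. Only the scores of 0.6 for "good" and 0.9 for "especially good" come from the example above; the multiplier of 1.5, the other dictionary entries, the zero threshold, and the function names are assumptions made for illustration.

```python
# Hypothetical dictionaries; only "good" = 0.6 and "especially good" = 0.9
# are taken from the example in the text.
SENTIMENT_WORDS = {"good": 0.6, "bad": -0.6}
DEGREE_ADVERBS = {"especially": 1.5, "slightly": 0.5}

def score_sentence(sentence):
    """Sum sentiment word scores, each scaled by a directly preceding degree adverb."""
    total = 0.0
    multiplier = 1.0
    for word in sentence.lower().split():
        if word in DEGREE_ADVERBS:
            multiplier = DEGREE_ADVERBS[word]
        elif word in SENTIMENT_WORDS:
            total += SENTIMENT_WORDS[word] * multiplier
            multiplier = 1.0
        else:
            multiplier = 1.0  # an adverb only modifies the very next word
    return total

def orientation(score):
    """Map a score to a coarse sentiment orientation (assumed threshold of 0)."""
    if score > 0:
        return "affirmative"
    if score < 0:
        return "negative"
    return "neutral"
```

With these values, "especially good" scores 0.6 × 1.5, which is 0.9 up to floating-point rounding, so a grade boundary should be compared with a tolerance rather than with exact equality.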

When the current emotional state reflected by the speech recognition result is a question, that is, when the customer voice is a question from the customer, the sentiment analysis module 40 forms a semantic understanding instruction so that the interruptible speech processing system further understands and determines what the customer voice means and matches a corresponding reply voice. When the current emotional state reflected by the speech recognition result is an intention emotion such as affirmative, refusal, negative, or neutral, the sentiment analysis module 40 forms a flow execution instruction, and the broadcast module 30 plays the corresponding voice, for example a main flow voice or a retention voice, according to the flow execution instruction.

Of course, those skilled in the art will appreciate that the above is only an example and not a limitation. Different responses can be set for different current emotional states according to the characteristics of the industry or field and the requirements of the telephone robot's user.

Specifically, in one embodiment of the present invention, when the current emotional state reflected by the speech recognition result is affirmative or neutral, the sentiment analysis module 40 forms a main flow execution instruction. The broadcast module 30 continues to play the next main flow voice according to the main flow execution instruction. That is, when the current emotional state reflected by the speech recognition result is affirmative or neutral, the preset main flow continues to execute and is not interrupted.

For example, after the broadcast module 30 plays voice A1 and voice A2, the sentiment analysis module 40 determines that the current emotional state reflected by the speech recognition result, such as "Yes, I need a loan", is affirmative. The sentiment analysis module 40 forms the main flow execution instruction, and the broadcast module 30 continues to play the next main flow voice A4 according to the main flow execution instruction. The main flow execution instruction includes, but is not limited to, the number of the current main flow voice, the number of the next main flow voice to be played, and the storage location of the next main flow voice to be played.

When the current emotional state reflected by the speech recognition result is negative or a refusal, the sentiment analysis module 40 forms a retention flow execution instruction. That is, when the current emotional state reflected by the speech recognition result is negative or a refusal, the preset main flow is interrupted and a retention stage is entered.

Further, the flow voice library 10 of the interruptible speech processing system includes a retention voice library 12 for storing at least one retention voice. A retention voice is a recording intended to retain the customer, such as "That's all right, you can learn a bit more first and then think it over" or "You can take a brief look; we cooperate with banks through formal channels, loans are disbursed quickly, and the interest is low." Specifically, each sub-flow of the preset main flow is provided with a corresponding retention voice. That is, each main flow voice can be regarded as a sub-flow, and each main flow voice is provided with a corresponding retention voice. For example, in the opening sub-flow, the retention voice associated with the opening flow voice (voice A2) is voice A3; in the company introduction sub-flow, the retention voice associated with the company introduction flow voice (voice A4) is voice A5; in the fact-finding sub-flow, the retention voice associated with the fact-finding flow voice (voice A6) is voice A7; and in the active invitation sub-flow, the retention voice associated with the active invitation flow voice (voice A8) is voice A9, as shown in Fig. 3.

When the current affective state reflected by the speech recognition result is negative or a refusal, so that the next main-flow voice cannot be played as planned, the broadcast module 30 plays the retention voice corresponding to the current sub-flow, thereby executing the retention-flow execution instruction in an attempt to retain the client. The retention-flow execution instruction may include, but is not limited to, the number of the current main-flow voice and the number and storage location of the retention voice to be played.

For example, after the broadcast module 30 plays voice A1 and voice A2, i.e., in the opening sub-flow, if the sentiment analysis module 40 determines that the current affective state reflected by the speech recognition result is a refusal, for instance "No, I don't need it", the sentiment analysis module 40 forms the retention-flow execution instruction, and the broadcast module 30 plays, according to the retention-flow execution instruction, the retention voice corresponding to the opening sub-flow (voice A3): "You can find out a little more. We also work through regular cooperation with banks, loans are released quickly, and the interest is low".
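As an illustrative sketch only, the two kinds of execution instruction described above could be represented as follows in Python. The class name `ExecutionInstruction`, the function `form_instruction`, the state labels, the storage paths and the assumed main-flow order A1, A2, A4, A6, A8, A10 are hypothetical and are not part of the disclosed system.

```python
from dataclasses import dataclass

# Assumed main-flow order for the financial-credit example (hypothetical).
MAIN_FLOW = ["A1", "A2", "A4", "A6", "A8", "A10"]
# Retention voice associated with each sub-flow, as listed in the description.
RETENTION_VOICE = {"A2": "A3", "A4": "A5", "A6": "A7", "A8": "A9"}


@dataclass(frozen=True)
class ExecutionInstruction:
    kind: str              # "main_flow" or "retention"
    current_voice: str     # number of the current main-flow voice
    voice_to_play: str     # number of the voice the broadcast module should play
    storage_location: str  # hypothetical path of that voice


def form_instruction(current_voice: str, state: str) -> ExecutionInstruction:
    """Form an execution instruction from the current affective state."""
    if state == "affirmative":
        position = MAIN_FLOW.index(current_voice)
        if position + 1 >= len(MAIN_FLOW):
            raise ValueError("no main-flow voice follows " + current_voice)
        target, kind = MAIN_FLOW[position + 1], "main_flow"
    elif state in ("negative", "refusal"):
        if current_voice not in RETENTION_VOICE:
            raise ValueError("no retention voice is associated with " + current_voice)
        target, kind = RETENTION_VOICE[current_voice], "retention"
    else:
        raise ValueError("state not covered by this sketch: " + state)
    return ExecutionInstruction(kind, current_voice, target, f"voices/{target}.wav")
```

With these assumptions, an affirmative answer after voice A2 yields an instruction to play A4, while a refusal after voice A2 yields an instruction to play A3.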

Further, in one embodiment of the present invention, when the current affective state reflected by the speech recognition result is a query, that is, when the customer voice is a question from the client, the sentiment analysis module 40 forms the semantic understanding instruction, and the interruptible speech processing system executes the semantic understanding instruction using a keyword matching technique, as shown in Fig. 1. In other words, when the current affective state reflected by the speech recognition result is a query, the preset main flow is interrupted and the system enters the semantic understanding link.

Specifically, the interruptible speech processing system includes a semantic determining module 50 and a semantic class library 60. The semantic class library 60 stores the script categories, such as common expressions and professional terms, of the field in which the phone robot is used, i.e., the distinct semantics commonly encountered in that field. In other words, the semantic class library 60 includes multiple semantic classes 61. The semantic classes 61 differ from one another in meaning and are accordingly matched with different response voices. For example, in the financial credit field, the semantic class library 60 may store semantic classes such as "How much is the interest for one year? For half a year? For one month?", "Why is the interest so low?", "Unemployed, no job, never borrowed, poor credit" and "What qualifications are needed", as shown in Fig. 4. It is understood that the semantic class library 60 is likely to differ for different users and different fields, and its content can be configured and stored accordingly.

In this embodiment, the semantic determining module 50 is triggered by the semantic understanding instruction and uses the keyword matching technique to match the speech recognition result against each semantic class 61 in the semantic class library 60, so as to understand and determine the meaning of the customer voice and form semantic class information. For example, the keywords of the semantic class "Where is the company" are "company" and "where". When "company" and "where" both appear in the speech recognition result, the semantic determining module 50 determines that the speech recognition result belongs to the semantic class "Where is the company" and forms the corresponding semantic class information.

In detail, the semantic determining module 50 traverses all preset keywords. If the speech recognition result matches multiple semantic classes, the semantic class with the largest number of matched characters can be chosen as the finally determined semantic class. If several semantic classes have the same number of matched characters, the one numbered earliest can be chosen by default. Those skilled in the art will appreciate that this is merely an example and not a limitation.
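The keyword matching and tie-breaking rules above can be sketched as follows. This is a minimal illustration under stated assumptions: the class numbers, the `SemanticClass` type and the function `match_semantic_class` are hypothetical, keywords are assumed to be lowercase, the "number of matched characters" is taken to be the total length of the matched keywords, and "numbered earliest" is taken to mean the smallest class number.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class SemanticClass:
    number: int                # hypothetical class number
    name: str
    keywords: Tuple[str, ...]  # all keywords must appear for a match


def match_semantic_class(text: str,
                         classes: Iterable[SemanticClass]) -> Optional[SemanticClass]:
    """Return the matching class with the most matched characters, or None."""
    text = text.lower()
    best, best_score = None, 0
    # Visiting classes in ascending number order, together with the strict ">"
    # comparison below, keeps the earliest-numbered class when scores are tied.
    for cls in sorted(classes, key=lambda c: c.number):
        if all(keyword in text for keyword in cls.keywords):
            score = sum(len(keyword) for keyword in cls.keywords)
            if score > best_score:
                best, best_score = cls, score
    return best


CLASSES = [
    SemanticClass(2, "about the company", ("company",)),
    SemanticClass(1, "where is the company", ("company", "where")),
]
result = match_semantic_class("Where is your company?", CLASSES)
print(result.name if result else None)
```

Here both classes match, and the class "where is the company" is selected because its keywords cover more characters of the recognized text.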

Determining and understanding the semantics on the basis of sentiment analysis, i.e., based on the current affective state, makes semantic understanding not only more accurate but also more human-like. Compared with directly applying existing keyword matching techniques, keyword matching based on sentiment analysis has a higher accuracy rate and a lower error rate.

In another embodiment of the present invention, when the current affective state reflected by the speech recognition result is a query, that is, when the customer voice is a question from the client, the sentiment analysis module 40 forms the semantic understanding instruction, and the interruptible speech processing system executes the semantic understanding instruction using natural language processing techniques, as shown in Fig. 2.

Specifically, in this other embodiment, the interruptible speech processing system further comprises a semantic vector conversion module 70 for vectorizing the speech recognition result. Compared with the keyword matching of the prior art, this approach can take the whole context into account. For example, the semantic vector conversion module 70 forms a two-dimensional vector from the position and the TFIDF (Term Frequency-Inverse Document Frequency) value of each word in the speech recognition result, thereby realizing vectorization. Alternatively, the semantic vector conversion module 70 can use a bag-of-words model: a bag of words is built to store the words of the scripts related to the application field and industry of the phone robot, and vectorization is realized according to how many times each word in the bag of words occurs. Those skilled in the art will appreciate that these are only examples and not limitations, and that other known or independently developed methods can be used to realize vectorization. The basic concepts of the bag-of-words model and TFIDF are known to those skilled in the art and are not repeated here.

Further, the semantic determining module 50 determines, according to the vectorization value of the speech recognition result, the semantic class to which the customer voice belongs in the semantic class library 60, i.e., determines the meaning of the customer voice, and forms the semantic class information. In this other embodiment, on the assumption that the words of the speech recognition result are independent of one another, the semantic determining module 50 works out which semantic class 61 the speech recognition result most probably belongs to, and thereby determines the semantic class of the speech recognition result.

Taking the bag-of-words technique as an example, suppose the bag of words is {I, how, much, can, borrow, ...} and the customer voice is "how much can be borrowed", which the speech recognition module 20 recognizes as "how much can borrow". The semantic vector conversion module 70 then produces, according to the bag of words, the vectorization value {0, 1, 1, 1, 1, ...}. Based on this vectorization value and on the assumption that the words of the speech recognition result are independent of one another, the semantic determining module 50 calculates which semantic class 61 the result most probably belongs to. For example, from the vectorization value {0, 1, 1, 1, 1, ...}, the semantic determining module 50 determines that the speech recognition result most probably belongs to the semantic class 61 "how much can be borrowed".
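The bag-of-words example can be reproduced with a few lines of Python. This is only a sketch: the function name `vectorize` is hypothetical, the bag is truncated to the five words shown in the example, and the recognized text is assumed to be already split into words.

```python
def vectorize(tokens, bag):
    """Count how many times each word of the bag occurs in the recognized tokens."""
    return [tokens.count(word) for word in bag]


# Bag of words from the example, truncated to the five words that are shown.
BAG_OF_WORDS = ["I", "how", "much", "can", "borrow"]

# Speech recognition result "how much can borrow", assumed to be pre-tokenized.
recognized = ["how", "much", "can", "borrow"]

print(vectorize(recognized, BAG_OF_WORDS))  # [0, 1, 1, 1, 1]
```

The first component is 0 because "I" does not occur in the recognized text, and each remaining word occurs once.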

Preferably, the semantic determining module 50 uses Bayes' rule to calculate the probability that the vectorization value of the speech recognition result belongs to each class, and takes the semantic class 61 with the maximum probability as the determined semantics. Compared with the single keyword matching of the prior art, Bayesian analysis can improve the accuracy of semantic understanding. Preferably, the semantic determining module 50 uses Bayes and/or inverse document frequency to further understand, analyze and determine the vectorized semantics, increasing the weight of the words that best distinguish different documents, so that semantic understanding is more accurate and more natural.
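One possible reading of the Bayesian determination, combined with the word-independence assumption stated earlier, is a naive Bayes classifier. The sketch below uses a multinomial naive Bayes model with add-one smoothing, which is a choice made here for illustration and is not specified in the description. The function names and the training samples are invented, and the inverse document frequency weighting is omitted.

```python
import math
from collections import Counter, defaultdict


def train(samples):
    """samples: list of (tokens, semantic_class) pairs (hypothetical training data)."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for tokens, label in samples:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return class_counts, word_counts, vocabulary


def classify(tokens, model):
    """Return the semantic class with the highest posterior probability."""
    class_counts, word_counts, vocabulary = model
    total = sum(class_counts.values())
    best_label, best_log_prob = None, -math.inf
    for label in sorted(class_counts):
        # log P(class) plus the sum of log P(word | class), words treated as independent
        log_prob = math.log(class_counts[label] / total)
        denominator = sum(word_counts[label].values()) + len(vocabulary)
        for token in tokens:
            if token in vocabulary:  # words never seen in training are ignored
                log_prob += math.log((word_counts[label][token] + 1) / denominator)
        if log_prob > best_log_prob:
            best_label, best_log_prob = label, log_prob
    return best_label


SAMPLES = [
    (["how", "much", "can", "borrow"], "how much can be borrowed"),
    (["how", "much", "can", "I", "borrow"], "how much can be borrowed"),
    (["why", "is", "interest", "so", "low"], "why is the interest so low"),
    (["interest", "how", "low"], "why is the interest so low"),
]
model = train(SAMPLES)
print(classify(["how", "much", "can", "borrow"], model))
```

Working in log probabilities avoids numerical underflow when many word probabilities are multiplied, and the smoothing keeps a single unseen word from forcing a class probability to zero.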

Further, the interruptible speech processing system includes a response recording matching module 80 and a response recording library 90, for matching the appropriate response recording 91 in the response recording library 90 according to the semantic class 61 determined by the semantic determining module 50.

Specifically, the response recording library 90 includes multiple response recordings 91. A response recording 91 is a recording made in advance and played as the answer to a question raised in the customer voice. Each response recording 91 is associated with the corresponding semantic class 61. For example, in one embodiment of the present invention, the response recording 91 and the corresponding semantic class 61 are associated through an associated identifier; the associated identifier may be implemented as a recording serial number, with the response recording 91 and the corresponding semantic class 61 carrying the same recording serial number. That is, response recordings 91 and semantic classes 61 are in a one-to-one relationship, and each semantic class 61 has a corresponding response recording 91 as its answer. For example, the semantic class 61 "Why is the interest so low" is associated with the response recording 91 "Because we connect to the bank's internal channel and submit your application under a guarantee arrangement, the bank offers its lowest preferential rates", and the two are linked by the same number "113", as shown in Fig. 4.

According to the determined semantic class 61, the response recording matching module 80 can, through the associated identifier, match the appropriate response recording 91 in the response recording library 90 and form response recording information. For example, when the semantic determining module 50 determines that the speech recognition result belongs to the semantic class 61 "how much can be borrowed", the corresponding response recording 91, "How much you can borrow depends on your personal situation, and everyone's situation is different", can be matched in the response recording library 90 according to the associated identifier "124", and the corresponding response recording information is formed, as shown in Fig. 4.

The response recording information may include, but is not limited to, the storage address, content and number of the response recording 91. The response recording matching module 80 sends the response recording information to the broadcast module 30, and the broadcast module 30 plays the corresponding response recording 91 according to the response recording information.
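The lookup through a shared recording serial number can be sketched with two dictionaries. The numbers 113 and 124 come from the example of Fig. 4, while the dictionary names, the function `match_response` and the storage addresses are hypothetical.

```python
# Semantic class -> associated identifier (the shared recording serial number).
SEMANTIC_CLASS_NUMBER = {
    "why is the interest so low": 113,
    "how much can be borrowed": 124,
}

# Recording serial number -> response recording (storage addresses are hypothetical).
RESPONSE_RECORDINGS = {
    113: {
        "storage_address": "responses/113.wav",
        "content": "Because we connect to the bank's internal channel ...",
    },
    124: {
        "storage_address": "responses/124.wav",
        "content": "How much you can borrow depends on your personal situation ...",
    },
}


def match_response(semantic_class):
    """Form the response recording information for a determined semantic class."""
    number = SEMANTIC_CLASS_NUMBER[semantic_class]
    recording = RESPONSE_RECORDINGS[number]
    return {"number": number, **recording}


info = match_response("how much can be borrowed")
print(info["number"], info["storage_address"])  # 124 responses/124.wav
```

Because each semantic class maps to exactly one number and each number to exactly one recording, the match is a constant-time lookup rather than a search.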

To sum up, the preset main flow is either interrupted or continued according to the current affective state reflected by the speech recognition result. When the preset main flow is interrupted, the interruptible speech processing system executes the retention link or the semantic understanding link. When the preset main flow is not interrupted, the interruptible speech processing system continues to execute the preset main flow. In the retention link and the semantic understanding link, after the retention voice or the response recording has been played, the next customer voice input by the client is again recognized by the speech recognition module 20, and the sentiment analysis module 40 analyzes the current affective state reflected by that next customer voice to judge whether the preset main flow can be resumed or whether it should remain interrupted to execute the semantic understanding link or the retention link. In this way the system responds to the client appropriately while still pursuing the preset purpose.

In particular, when the retention link is entered two or more times in succession, in other words when the retention-flow execution instruction is sent two or more times in succession, the broadcast module 30 can directly play the ending voice of the preset main flow. For example, when the current affective state is negative twice in a row, i.e., the client repeatedly says "I don't need it", the broadcast module 30 can directly play the closing flow voice A10 and then hang up, thereby avoiding unnecessary repeated retention attempts.
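The interplay between continuing the main flow, entering the retention link and ending the call after consecutive refusals can be illustrated with a small simulation. It is a sketch under assumptions: the function `simulate`, the state labels and the threshold parameter are hypothetical, the flow is taken to start at voice A2, any state other than negative or refusal is treated as permission to continue, and the semantic understanding link is left out.

```python
MAIN_FLOW = ["A2", "A4", "A6", "A8"]  # sub-flows that have a retention voice
RETENTION_VOICE = {"A2": "A3", "A4": "A5", "A6": "A7", "A8": "A9"}
END_VOICE = "A10"                     # closing flow voice


def simulate(states, max_consecutive_retentions=2):
    """Return the voices played for a sequence of client affective states."""
    position = 0
    played = [MAIN_FLOW[position]]
    consecutive = 0
    for state in states:
        if state in ("negative", "refusal"):
            consecutive += 1
            if consecutive >= max_consecutive_retentions:
                # Retention link entered repeatedly: play the ending voice and stop.
                played.append(END_VOICE)
                return played
            played.append(RETENTION_VOICE[MAIN_FLOW[position]])
        else:
            # Affirmative or neutral: reset the counter and continue the main flow.
            consecutive = 0
            position += 1
            if position >= len(MAIN_FLOW):
                played.append(END_VOICE)
                return played
            played.append(MAIN_FLOW[position])
    return played


print(simulate(["refusal", "refusal"]))  # ['A2', 'A3', 'A10']
```

Resetting the counter on an affirmative answer means that only refusals in direct succession end the call early; a client who refuses once, is retained, and refuses again later still hears a retention voice.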

It is worth noting that, in one embodiment of the present invention, the interruptible speech processing system includes an intention recognition module 100 and an intention type library 200 for identifying the client's intention, so that interested clients can be screened out of a large number of calls and work efficiency improved. The intention type library 200 stores multiple preset intention types, such as making an inquiry, learning about the company, asking for contact details, learning about the product, and finding the price too high. The intention recognition module 100 determines the intention type embodied in the speech recognition result according to the vectorization value produced by the semantic vector conversion module 70. Specifically, the intention recognition module 100 can use a deep learning neural network: a training corpus is obtained, word vectors are generated after corpus preprocessing, and training is performed with an LSTM. Alternatively, the intention recognition module 100 can use an LDA document topic model, for example classifying text by computing sentence similarity, and use a feature vector model to improve the AUC. For example, when the intention recognition module 100 uses a deep learning neural network and the speech recognition result is "Where is the company's address", then after vectorization by the semantic vector conversion module 70 the intention recognition module 100 can, from the resulting vectorization value and its earlier learning and training, determine that the intention embodied in the speech recognition result is to learn about the company.

After the intention recognition module 100 determines the client's intention, intention information can be formed for recording and analysis. The phone robot can record the intention information in association with the information of the client who was called. Further, the phone robot can analyze the points of interest, mood and so on of the client according to the intention information. Alternatively, the phone robot can accurately classify client intentions according to the intention information and directly assign follow-up priorities according to the strength of the client's interest, so that deals can be closed quickly.

According to another aspect of the present invention, the present invention further provides an interruptible speech processing method. Fig. 5 shows a flow chart of the interruptible speech processing method.

Step (a): playing a main-flow voice in a preset main flow.

Multiple main-flow voices are associated in a certain order to form the preset main flow, so as to achieve a preset purpose. The content and order of the preset main flow vary with the purpose. Accordingly, in different preset main flows, the content of the main-flow voices and their mutual relationships differ. For instance, in the financial credit example given above, the preset main flow is formed in the order recording A1, recording A2, recording A4, recording A6, recording A8, recording A10, with the aim of recommending a loan product; details are not repeated here.

Step (b): receiving a customer voice and recognizing it to form a speech recognition result.

After the main-flow voice is played, the client who has been called reacts and answers, and the customer voice thus input is received. The customer voice is recognized using speech recognition technology to form the speech recognition result. Preferably, the speech recognition result is text. The speech recognition technology used is not limited, and those skilled in the art can use any published or independently developed technology to realize speech recognition.

Step (b): analyzing the current affective state reflected by the speech recognition result.

Specifically, the affective state can be judged by means such as neural network deep learning. The set of current affective states can be defined according to requirements and actual conditions; for example, it can simply be set to query, affirmative and negative, or it can be set in more detail to neutral, affirmative, negative, refusal, query and so on.

Step (c): entering a corresponding preset processing link according to the current affective state.

According to requirements and actual conditions, different current affective states trigger different preset processing links: the preset main flow may continue to be executed, with the next flow voice played; the retention link may be entered, with a retention voice played; or the semantic understanding link may be entered, with a response voice played to answer the client's query; and so on.

Specifically, the step (c) further includes the following steps:

(c.2) when the current affective state is negative or a refusal, playing a retention voice corresponding to the main-flow voice;

The semantic understanding technology used by the interruptible speech processing method of the present invention is not limited. Fig. 6 shows a semantic understanding flow chart of the interruptible speech processing method of the present invention, in which semantic understanding is realized by a natural language processing method.

Step (A): vectorizing the speech recognition result to form a speech recognition result vectorization value.

The speech recognition result can be vectorized by building a bag of words, or by forming a two-dimensional vector from the position and TFIDF value of each word in the speech recognition result, among other ways; no limitation is intended here.

Step (B): determining, according to the speech recognition result vectorization value, the semantic class to which the speech recognition result belongs, and forming semantic class information.

Specifically, each industry and field has its own scripts, such as specific common expressions and professional terms. These scripts are classified in advance into different semantic classes according to their meaning. From the speech recognition result vectorization value, probability calculation can be used to find the semantic class to which the speech recognition result most probably belongs, thereby determining the semantic class of the speech recognition result. The semantic class information may include, but is not limited to, the number, storage address and content of the semantic class to which the speech recognition result belongs, and the number of the associated response recording.

Preferably, the step (B) further comprises the step of: determining, by Bayes and/or inverse document frequency and according to the speech recognition result vectorization value, the semantic class to which the speech recognition result belongs, and forming the semantic class information.

Step (C): matching a corresponding response voice according to the semantic class information to form response voice information.

Different semantic classes correspond to different responses. The corresponding response voice is recorded in advance and associated with its semantic class, for example through identical numbers; no limitation is intended here. In this way, the corresponding response voice can be looked up and matched according to the semantic class information to form the response voice information. The response voice information may include, but is not limited to, the storage address, content and number of the response recording.

Step (D): playing the corresponding response voice according to the response voice information.

Fig. 7 shows another semantic understanding flow chart of the interruptible speech processing method of the present invention, in which semantic understanding is realized by a keyword matching method.

Step (I): matching the speech recognition result against the preset keywords of each semantic class, determining the semantic class to which the speech recognition result belongs, and forming the semantic class information.

Each semantic class has preset keywords. The speech recognition result is matched against the preset keywords; if all the preset keywords of a semantic class appear in the speech recognition result, the speech recognition result can be determined to belong to that semantic class.

Further, all preset keywords can be traversed. If the speech recognition result matches multiple semantic classes, the semantic class with the largest number of matched characters can be chosen as the finally determined semantic class. If several semantic classes have the same number of matched characters, the one numbered earliest can be chosen by default.

Step (II): matching a corresponding response voice according to the semantic class information to form the response voice information.

Step (III): playing the corresponding response voice according to the response voice information.

Those skilled in the art should understand that the embodiments of the present invention described above and shown in the drawings are given only as examples and do not limit the present invention. The purpose of the present invention has been fully and effectively achieved. The functional and structural principles of the present invention have been shown and explained in the embodiments, and the embodiments of the present invention may be varied or modified in any way without departing from those principles.

Claims

1. An interruptible speech processing system, characterized by comprising:

a preset main flow library, wherein the preset main flow library stores multiple main-flow voices in a preset association to form a preset main flow;

a sentiment analysis module, wherein the sentiment analysis module analyzes the current affective state reflected by a speech recognition result, and wherein the sentiment analysis module instructs the broadcast module, according to the current affective state, whether to interrupt the execution of the preset main flow.

2. The interruptible speech processing system according to claim 1, further comprising a retention voice library, wherein the retention voice library stores at least one retention voice, and wherein each main-flow voice is provided with a corresponding retention voice.

3. The interruptible speech processing system according to claim 2, wherein when the current affective state is an intention emotion, the sentiment analysis module forms a flow execution instruction, and the broadcast module plays, according to the flow execution instruction, the corresponding main-flow voice or the corresponding retention voice.

4. The interruptible speech processing system according to claim 3, wherein when the current affective state is affirmative or neutral, the broadcast module continues to play the next main-flow voice so as to continue executing the preset main flow; and when the current affective state is negative or a refusal, the broadcast module plays the corresponding retention voice.

5. The interruptible speech processing system according to claim 1, wherein when the current affective state is a query, the sentiment analysis module forms a semantic understanding instruction so as to interrupt the execution of the preset main flow and enter a semantic understanding link.

6. The interruptible speech processing system according to claim 5, further comprising a semantic determining module and a semantic class library, wherein the semantic class library includes multiple semantic classes, and wherein the semantic determining module is triggered by the semantic understanding instruction, matches the speech recognition result against each semantic class in the semantic class library, determines the semantic class to which the speech recognition result belongs, and forms semantic class information.

7. The interruptible speech processing system according to claim 6, wherein the semantic determining module uses a keyword matching technique to determine the semantic class to which the speech recognition result belongs and forms the semantic class information.

8. The interruptible speech processing system according to claim 6, further comprising a semantic vector conversion module, wherein the semantic vector conversion module vectorizes the speech recognition result to form a speech recognition result vectorization value, and wherein the semantic determining module determines, according to the speech recognition result vectorization value, the semantic class of the speech recognition result in the semantic class library and forms the semantic class information.

9. The interruptible speech processing system according to claim 8, wherein the semantic vector conversion module vectorizes the speech recognition result using a bag-of-words model technique.

10. The interruptible speech processing system according to claim 8, wherein the semantic determining module uses Bayes and/or inverse document frequency to determine, according to the vectorization value of the speech recognition result, the semantic class of the speech recognition result in the semantic class library and forms the semantic class information.

11. The interruptible speech processing system according to any one of claims 1 to 10, further comprising a speech recognition module, wherein the speech recognition module recognizes a customer voice as text to form the speech recognition result.

12. The interruptible speech processing system according to any one of claims 6 to 10, further comprising a response recording matching module and a response recording library, wherein the response recording library includes multiple response recordings, each response recording being associated with the corresponding semantic class, wherein the response recording matching module matches the corresponding response recording in the response recording library according to the semantic class information and forms response recording information, and wherein the broadcast module plays the corresponding response recording according to the response recording information.

13. The interruptible speech processing system according to any one of claims 8 to 10, further comprising an intention recognition module and an intention type library, wherein the intention type library stores multiple preset intention types, and wherein the intention recognition module determines, according to the speech recognition result vectorization value, the intention type embodied in the speech recognition result, for recording and analysis.

14. An interruptible speech processing method, characterized by comprising:

(a) playing a main-flow voice in a preset main flow, wherein multiple main-flow voices are associated in a certain order to form the preset main flow;

15. The interruptible speech processing method according to claim 14, wherein the step (c) further comprises the steps of:

(c.1) when the current affective state is affirmative or neutral, playing the next main-flow voice accordingly, so as to continue executing the preset main flow;

(c.2) when the current affective state is negative or a refusal, playing a retention voice corresponding to the main-flow voice; and

(c.3) when the current affective state is a query, sending a semantic understanding instruction so as to enter a semantic understanding link.

16. The interruptible speech processing method according to claim 15, further comprising, after the step (c.3), the step of:

(c.3.1) matching the speech recognition result against the preset keywords of each semantic class, determining the semantic class to which the speech recognition result belongs, and forming the semantic class information.

17. The interruptible speech processing method according to claim 15, further comprising, after the step (c.3), the step of:

(c.3.3) determining, according to the speech recognition result vectorization value, the semantic class to which the speech recognition result belongs, and forming semantic class information.

18. The interruptible speech processing method according to claim 16 or 17, further comprising the steps of:

(c.3.4) matching a corresponding response voice according to the semantic class information to form response voice information; and

19. The interruptible speech processing method according to claim 16, wherein the step (c.3.1) further comprises the steps of:

if the speech recognition result matches the preset keywords of multiple semantic classes, taking the semantic class with the largest number of matched characters as the finally determined semantic class and forming the semantic class information; and

if several semantic classes have the same number of matched characters, choosing by default the semantic class numbered earliest.