CN110162610A - Intelligent robot answer method, device, computer equipment and storage medium - Google Patents
- Publication number
- CN110162610A (application CN201910305320.0A)
- Authority
- CN
- China
- Prior art keywords
- text
- neural network
- voice
- target
- network model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B25—HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
- B25J—MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- B25J9/00—Programme-controlled manipulators
- B25J9/16—Programme controls
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B25—HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
- B25J—MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- B25J9/00—Programme-controlled manipulators
- B25J9/16—Programme controls
- B25J9/1602—Programme controls characterised by the control system, structure, architecture
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B25—HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
- B25J—MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- B25J9/00—Programme-controlled manipulators
- B25J9/16—Programme controls
- B25J9/1679—Programme controls characterised by the tasks executed
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation or dialogue systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
- G10L13/027—Concept to speech synthesisers; Generation of natural phrases from machine-based concepts
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
Abstract
The invention discloses an intelligent robot answering method, apparatus, computer device, and storage medium. The method comprises: obtaining raw speech collected by a robot and performing speech preprocessing on the raw speech to obtain effective speech; converting the effective speech into original text using speech-to-text technology; performing text preprocessing on the original text to obtain effective text; recognizing the effective text using a target bidirectional recurrent neural network model generated with an attention mechanism, to obtain a target intent; selecting a target speech script according to the target intent, converting the target speech script into target speech using text-to-speech technology, and controlling the robot to play the target speech. This improves the flexibility of the dialogue between an agent and the robot.
Description
Technical field
The present invention relates to the field of intelligent decision-making, and in particular to an intelligent robot answering method, apparatus, computer device, and storage medium.
Background technique
In existing intelligent training systems, the question-and-answer flow between the agent and the robot and the dialogue templates are all preset. No matter what answer the agent gives, the robot asks its questions according to the preset script. Such systems lack flexibility and cannot conduct an intelligent dialogue that adapts to the actual situation.
Summary of the invention
Embodiments of the present invention provide an intelligent robot answering method, apparatus, computer device, and storage medium, to solve the problem of inflexible dialogue between an agent and a robot.
An intelligent robot answering method comprises:
obtaining raw speech collected by a robot, and performing speech preprocessing on the raw speech to obtain effective speech;
converting the effective speech into original text using speech-to-text technology;
performing text preprocessing on the original text to obtain effective text;
recognizing the effective text using a target bidirectional recurrent neural network model generated with an attention mechanism, to obtain a target intent;
selecting a target speech script according to the target intent, converting the target speech script into target speech using text-to-speech technology, and controlling the robot to play the target speech.
An intelligent robot answering apparatus comprises:
a raw speech preprocessing module, configured to obtain the raw speech collected by a robot and perform speech preprocessing on the raw speech to obtain effective speech;
a speech-to-text module, configured to convert the effective speech into original text using speech-to-text technology;
an original text processing module, configured to perform text preprocessing on the original text to obtain effective text;
a model recognition module, configured to recognize the effective text using a target bidirectional recurrent neural network model generated with an attention mechanism, to obtain a target intent;
a text-to-speech module, configured to select a target speech script according to the target intent, convert the target speech script into target speech using text-to-speech technology, and control the robot to play the target speech.
A computer device comprises a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the above intelligent robot answering method when executing the computer program.
A computer-readable storage medium stores a computer program that, when executed by a processor, implements the above intelligent robot answering method.
In the above intelligent robot answering method, apparatus, computer device, and storage medium, the raw speech collected by the robot is obtained and preprocessed into effective speech, which makes the subsequent conversion of the effective speech into the original text to be recognized easier and improves conversion accuracy. The effective speech is converted into original text using speech-to-text technology, the original text is preprocessed into effective text, and the effective text is then recognized by the target bidirectional recurrent neural network model to obtain the target intent, improving the accuracy of intent recognition. After the target intent is obtained, a target speech script is selected according to it, improving the flexibility of the dialogue between the agent and the robot. The target speech script is converted into target speech by text-to-speech technology and the robot is controlled to play it, completing the dialogue between the robot and the agent.
Detailed description of the invention
To illustrate the technical solutions of the embodiments of the present invention more clearly, the accompanying drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the invention; those of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is an application scenario diagram of the intelligent robot answering method in an embodiment of the invention;
Fig. 2 is a flowchart of the intelligent robot answering method in an embodiment of the invention;
Fig. 3 is a detailed flowchart of step S10 in Fig. 2;
Fig. 4 is a detailed flowchart of step S30 in Fig. 2;
Fig. 5 is another flowchart of the intelligent robot answering method in an embodiment of the invention;
Fig. 6 is a detailed flowchart of step S05 in Fig. 5;
Fig. 7 is a detailed flowchart of step S052 in Fig. 6;
Fig. 8 is a detailed flowchart of step S053 in Fig. 6;
Fig. 9 is a schematic diagram of the intelligent robot answering apparatus in an embodiment of the invention;
Fig. 10 is a schematic diagram of the computer device in an embodiment of the invention.
Specific embodiment
The technical solutions in the embodiments of the present invention will now be described clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present invention.
The intelligent robot answering method provided by this application can be applied in the environment of Fig. 1, in which a terminal device communicates with a server over a network. The terminal device of the invention is specifically a robot. The server is a server that processes the raw speech collected by the robot and obtains the target intent from it. Raw speech refers to the agent's speech, collected by the robot's sound acquisition module, that needs to be recognized. The target intent is information, derived from the raw speech, that indicates the speaker's intention.
In one embodiment, as shown in Fig. 2, an intelligent robot answering method is provided. Taking the server in Fig. 1 as an example, the method comprises the following steps:
S10: Obtain the raw speech collected by the robot and perform speech preprocessing on it to obtain effective speech.
Raw speech refers to the agent's speech that the robot collects through its sound acquisition module and that needs to be recognized. Speech preprocessing refers to applying pre-emphasis, framing, windowing, endpoint detection, and similar processing to the raw speech, removing silent segments and noise segments, and retaining only the speech with clearly continuous voiceprint variation.
Specifically, after the raw speech is obtained, pre-emphasis, framing, windowing, and endpoint detection are applied to it; silent and noise segments are removed, and only the portion containing clearly continuous voiceprint variation is kept as the effective speech. Preprocessing the raw speech makes it easier for subsequent steps to convert the effective speech into the original text to be recognized, improving conversion accuracy.
S20: Convert the effective speech into original text using speech-to-text technology.
The speech-to-text technology in this embodiment is ASR (Automatic Speech Recognition), a technology that converts human speech into text.
Specifically, after the effective speech is obtained, the server corresponding to the robot uses ASR to convert it into original text, i.e., the corresponding written form of the effective speech. Because the effective speech is expressed in audio form, labeling it directly from the audio a developer listens to is inconvenient to operate and store, and processing is slow. Converting the effective speech into original text expresses it as text, so the content can be labeled by reading, which is convenient to operate and efficient to process.
S30: Perform text preprocessing on the original text to obtain effective text.
Effective text refers to the original text after preprocessing: digits, special characters, and stop words are removed, and the text conforms to a preset length (e.g., 8 characters). Digits here are the numbers that appear after the effective speech is converted into original text; special characters are the unrecognizable characters that appear after the conversion, such as $, *, &, #, +, ?.
Specifically, after obtaining the original text, the server corresponding to the robot preprocesses it to remove the digits and special characters. Further, to facilitate step S40, in which the effective text is recognized by the target bidirectional recurrent neural network model, the text with digits and special characters removed must also be cut according to the preset length so that it meets the length requirement, yielding cut text. Finally, the stop words in the cut text are removed and the words carrying real meaning are retained, forming the effective text.
Further, after the effective text is obtained, the server sends it to a client so that a developer can read its content and label it, giving the effective text a corresponding text label, which step S0531 uses to construct the loss function.
S40: Recognize the effective text using the target bidirectional recurrent neural network model generated with an attention mechanism, to obtain the target intent.
The target bidirectional recurrent neural network (BRNN, Bi-directional Recurrent Neural Networks) model is a model trained in advance to recognize effective text and obtain the target intent. The attention mechanism assigns different weights to data according to their importance: important data receive large weights and unimportant data receive small ones. For example, in the sentence "The weather is fine today", "today" is unimportant and receives a small weight, while "weather" and "fine" are both important and receive equally large weights.
Specifically, after the effective text is obtained, the server corresponding to the robot uses a word segmentation tool to cut it and removes the stop words (particles, prepositions, pronouns, etc.), obtaining the target words, i.e., the words remaining after stop-word removal. The target words are then converted into corresponding target word vectors using a word-vector conversion tool. Finally, the target word vectors are input into the target bidirectional recurrent neural network model generated with the attention mechanism for recognition, and the target intent is obtained. In this embodiment, the target intent refers to the information about the intention of the effective text that the target bidirectional recurrent neural network model obtains by recognizing it. Obtaining the target intent with this model effectively improves the accuracy of intent recognition.
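The weighting behaviour ascribed to the attention mechanism above (important words receive large weights, and the pooled representation is dominated by them) can be illustrated with dot-product attention over per-word hidden vectors. This is a generic plain-Python sketch, not the patent's BRNN implementation; the query vector stands in for learned attention parameters.

```python
import math

def softmax(xs):
    # Numerically stable softmax: subtract the max before exponentiating
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention_pool(hidden_states, query):
    # Score each time step's hidden state against the query vector,
    # normalize the scores with softmax, then form the weighted sum:
    # important time steps contribute more to the pooled vector
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    pooled = [sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)]
    return pooled, weights
```

In a real model the hidden states would come from the forward and backward RNN passes and the pooled vector would feed a classification layer over the intent labels.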
S50: Select a target speech script according to the target intent, convert the target speech script into target speech using text-to-speech technology, and control the robot to play the target speech.
Specifically, to meet customer needs more fully, each target intent in this embodiment is provided with multiple speech-script templates. After the target intent is obtained, the script templates corresponding to it are selected, and one of them is chosen at random as the target speech script. Finally, the target speech script is converted into target speech by TTS technology and the robot is controlled to play the target speech, completing the dialogue with the agent. TTS (text-to-speech) refers to the technology by which a computer converts self-generated or externally input text into spoken output. The target speech is the speech, converted from the target speech script by TTS, with which the robot converses orally with the agent.
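The selection logic of step S50 — several script templates per intent, one chosen at random — can be sketched as below. The intent names and template strings are hypothetical placeholders; the actual TTS conversion and playback are outside the sketch.

```python
import random

# Hypothetical script templates keyed by recognized intent;
# the real system would load these from its configured template store
SCRIPT_TEMPLATES = {
    "greeting": ["Hello, how can I help you?",
                 "Hi there, what can I do for you?"],
    "repay_query": ["Your current balance is due on the 15th.",
                    "The outstanding amount can be repaid at any time."],
}

def choose_script(intent, templates=SCRIPT_TEMPLATES, rng=random):
    # Pick one script at random among those configured for the intent;
    # return None when the intent has no configured templates
    candidates = templates.get(intent)
    if not candidates:
        return None
    return rng.choice(candidates)
```

The random choice among templates is what gives each intent several possible replies instead of one fixed answer.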
In steps S10 to S50, the raw speech collected by the robot is obtained and preprocessed into effective speech, which makes the subsequent conversion into the text to be recognized easier and improves conversion accuracy. The effective speech is converted into original text by speech-to-text technology, the original text is preprocessed into effective text, and the effective text is recognized by the target bidirectional recurrent neural network model to obtain the target intent, improving the accuracy of intent recognition. After the target intent is obtained, a target speech script is selected according to it, improving the flexibility of the dialogue between the agent and the robot. The target speech script is converted into target speech by text-to-speech technology and the robot is controlled to play it, completing the dialogue between the robot and the agent.
In one embodiment, because the raw speech has not undergone any processing, it includes noise segments and silent segments. A noise segment in this embodiment is a speech segment formed, while the speaker is talking, by sounds such as doors or windows switching or objects colliding. A silent segment is one in which the speaker does not speak, because of breathing or thinking, so that silence appears in the raw speech. Noise and silent segments would seriously affect the subsequent step of obtaining the target intent with the target bidirectional recurrent neural network model; therefore, after the raw speech is obtained, it must be processed to remove its noise and silent segments, providing an efficient and accurate data source for the subsequent steps. As shown in Fig. 3, performing speech preprocessing on the raw speech in step S10 to obtain effective speech specifically comprises the following steps:
S11: Apply pre-emphasis, framing, and windowing to the raw speech to obtain standard speech.
Standard speech is the speech obtained after the raw speech has been pre-emphasized, framed, and windowed.
Specifically, the process of obtaining standard speech is as follows. (1) Pre-emphasis is applied with the formula s'_n = s_n - a * s_{n-1}, to eliminate the influence of the speaker's vocal cords and lips on the speech and to improve its high-frequency resolution, where s'_n is the signal amplitude at time n after pre-emphasis, s_n is the signal amplitude at time n, s_{n-1} is the signal amplitude at time n-1, and a is the pre-emphasis coefficient. (2) The pre-emphasized speech is divided into frames; a discontinuity appears at the start and end of each frame, and the more frames there are, the larger the error relative to the raw speech. (3) To preserve the frequency characteristics of each frame, windowing is applied with the formulas w_n = 0.54 - 0.46 * cos(2πn / (N - 1)), 0 ≤ n ≤ N - 1, and s''_n = w_n * s'_n, where w_n is the Hamming window at time n, N is the Hamming window length, s'_n is the time-domain signal amplitude at time n, and s''_n is the time-domain signal amplitude after windowing. Preprocessing the raw speech into standard speech provides an effective data source for the subsequent endpoint detection.
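The three operations of step S11 can be sketched directly from the formulas above. This is a minimal pure-Python illustration, not the patent's implementation; the frame length, hop size, and pre-emphasis coefficient a = 0.97 are typical values assumed here, not taken from the source.

```python
import math

def pre_emphasis(signal, a=0.97):
    # s'_n = s_n - a * s_{n-1}; boosts the high-frequency content
    return [signal[0]] + [signal[n] - a * signal[n - 1]
                          for n in range(1, len(signal))]

def frame_signal(signal, frame_len=400, hop=160):
    # Split into overlapping frames (e.g. 25 ms frames, 10 ms hop at 16 kHz);
    # a trailing partial frame is dropped
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]

def hamming(N):
    # w_n = 0.54 - 0.46 * cos(2*pi*n / (N - 1)), 0 <= n <= N - 1
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]

def window_frames(frames):
    # s''_n = w_n * s'_n applied element-wise to every frame
    if not frames:
        return []
    w = hamming(len(frames[0]))
    return [[wn * sn for wn, sn in zip(w, f)] for f in frames]
```

In practice a vectorized library would be used, but the arithmetic is exactly the three formulas of S11.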
S12: Apply endpoint detection to the standard speech to obtain effective speech.
Endpoint detection is a processing means that determines the start point and end point of the effective speech within a segment of speech.
Specifically, a segment of standard speech inevitably contains speech corresponding to silent and noise segments. Therefore, after the raw speech has been obtained and preprocessed, the server corresponding to the robot applies endpoint detection to the standard speech, discards the speech corresponding to the silent and noise segments, and retains the speech with clearly continuous voiceprint variation as the effective speech. This reduces the amount of data to be processed when the effective speech is later converted into original text; in addition, discarding the silent and noise segments improves the accuracy of the original text.
In steps S11 and S12, after pre-emphasis, framing, and windowing are applied to the raw speech, endpoint detection is applied to the resulting standard speech, removing its silent and noise segments and retaining only the speech with clearly continuous voiceprint variation, i.e., the effective speech. This reduces the subsequent data volume when converting the effective speech into original text and improves the accuracy of the original text.
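Endpoint detection as described — finding the start and end of the voiced portion and discarding the rest — is commonly approximated by short-time energy thresholding. The sketch below assumes frames produced as in S11; the energy-ratio threshold is an illustrative assumption, a crude stand-in for whatever detector the patent's system actually uses (simple energy gating does not, by itself, reject loud noise segments).

```python
def detect_endpoints(frames, energy_ratio=0.1):
    # Short-time energy of each frame
    energies = [sum(s * s for s in f) for f in frames]
    if not energies:
        return []
    threshold = energy_ratio * max(energies)
    # Indices of frames whose energy exceeds the threshold
    voiced = [i for i, e in enumerate(energies) if e > threshold]
    if not voiced:
        return []
    # Keep everything between the first and last voiced frame:
    # these mark the detected start point and end point
    return frames[voiced[0]:voiced[-1] + 1]
```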
In one embodiment, as shown in Fig. 4, step S30, performing text preprocessing on the original text to obtain effective text, specifically comprises the following steps:
S31: Apply a first pretreatment to the original text using a regular expression, and cut the pretreated original text into corresponding cut text according to a preset length.
A regular expression (Regular Expression, often abbreviated in code as regex, regexp, or RE) is, in this embodiment, a logical formula used to filter the original text; it is specifically used to filter out the digits and special characters in the original text. The preset length is a value, set in advance according to actual needs, to which the original text is cut.
Specifically, the digits and symbols in the original text contribute nothing to obtaining the target intent and would increase the data-processing load of the target bidirectional recurrent neural network model. Therefore, after the original text is obtained, a pre-written regular expression is used to apply the first pretreatment, removing the digits and special characters. After they are removed, the pretreated original text is cut according to the preset length, yielding the cut text, i.e., the text formed after the original text is cut to the preset length.
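The first pretreatment of step S31 — a regular expression filtering out digits and special characters, followed by cutting to a preset length — might look like this in Python. The character class keeping only CJK characters and Latin letters is an assumption for illustration; the patent does not give the concrete expression.

```python
import re

def first_pretreat(text, preset_length=8):
    # Filter digits and special characters ($, *, &, #, +, ? ...) with a
    # regular expression, keeping only CJK characters and letters
    cleaned = re.sub(r"[^\u4e00-\u9fa5A-Za-z]", "", text)
    # Cut the cleaned text into chunks of the preset length
    return [cleaned[i:i + preset_length]
            for i in range(0, len(cleaned), preset_length)]
```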
S32: Apply a second pretreatment to the cut text using a word segmentation tool to obtain the effective text.
Specifically, the cut text is segmented with a word segmentation tool and the stop words (particles, prepositions, pronouns, etc.) are removed; the remaining words form the effective text. The word segmentation tools in this embodiment include, but are not limited to, the Jieba segmentation tool. Stop words are certain words that, in information retrieval, are automatically filtered out before or after processing natural-language data (or text) in order to save storage space and improve search efficiency; the stop-word list may follow the Baidu or Harbin Institute of Technology stop-word dictionaries, or be defined by the developer.
In steps S31 and S32, a regular expression applies the first pretreatment to the original text, removing digits and special characters; the pretreated original text is then cut according to the preset length to obtain the cut text; finally, a word segmentation tool applies the second pretreatment to the cut text, removing stop words and obtaining the effective text, which provides an effective data source for the subsequent acquisition of the target intent.
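After segmentation, the second pretreatment of step S32 reduces to filtering tokens against a stop-word list. A minimal sketch, with a tiny hypothetical English stop-word set standing in for the Baidu or Harbin Institute of Technology dictionaries (which would be loaded from a file in practice):

```python
# Hypothetical mini stop-word list for illustration only
STOP_WORDS = {"the", "a", "of", "is", "in"}

def second_pretreat(tokens, stop_words=STOP_WORDS):
    # Drop stop words (particles, prepositions, pronouns, ...) and keep
    # only the words that carry real meaning
    return [t for t in tokens if t not in stop_words]
```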
In one embodiment, as shown in Fig. 5, before the raw speech is obtained in step S10, the intelligent robot answering method further comprises training an original bidirectional recurrent neural network model to obtain the target bidirectional recurrent neural network model capable of recognizing the target intent, which specifically comprises the following steps:
S01: Obtain training speech and perform speech preprocessing on it to obtain preprocessed speech.
Training speech refers to the speech used to adjust the weights and biases of the original bidirectional recurrent neural network model. Specifically, the server corresponding to the robot obtains the training speech and preprocesses it, obtaining preprocessed speech, i.e., the speech obtained from the training speech after preprocessing. The preprocessing process is as in steps S11 and S12 and, to avoid repetition, is not described again here.
S02: Convert the preprocessed speech into preprocessed text using speech-to-text technology.
Specifically, after the preprocessed speech is obtained, it is converted into preprocessed text using speech-to-text technology. Preprocessed text refers to the corresponding written form into which the preprocessed speech is converted. The speech-to-text technology in this embodiment is ASR.
S03: Perform text preprocessing on the preprocessed text to obtain training samples.
Specifically, after the preprocessed text is obtained, the server corresponding to the robot performs text preprocessing on it: digits and special symbols are removed, the text with digits and special symbols removed is cut according to the preset length, and finally stop words are removed, yielding the training samples. A training sample is thus text obtained by preprocessing the preprocessed text, with digits, special symbols and stop words removed, that satisfies the preset length. The training samples are used to train the target bidirectional recurrent neural network model, so that the target intention can subsequently be obtained with that model. The specific implementation is the same as step S31 and step S32 and, to avoid repetition, is not described again.
S04: Divide the training samples into a training set and a test set.
Specifically, after the training samples are obtained, they are divided into a training set and a test set. Generally, the ratio of the training set to the test set is 9:1. The training set is the text used to adjust the parameters of the original bidirectional recurrent neural network model; the test set is the text used to test the recognition accuracy of the trained original bidirectional recurrent neural network model.
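The 9:1 split of training samples into a training set and a test set can be sketched as below; the seeded shuffle is an assumption added for reproducibility, not something the patent specifies.

```python
import random

def split_samples(samples, train_ratio=0.9, seed=42):
    # Shuffle a copy so the split is random but reproducible
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    # First 90% becomes the training set, the remaining 10% the test set
    return shuffled[:cut], shuffled[cut:]
```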
S05: Input the training set into the original bidirectional recurrent neural network model for training to obtain an effective bidirectional recurrent neural network model.
Here, the original bidirectional recurrent neural network model is composed of two recurrent neural networks (RNNs). For ease of description, one of them is referred to in this embodiment as the forward recurrent neural network (forward RNN), and the other as the backward recurrent neural network (backward RNN). The forward RNN and backward RNN in the original bidirectional recurrent neural network model (original BRNN) each have their own hidden layer, and share one input layer and one output layer; that is, the original BRNN is a neural network model composed of one input layer, two hidden layers and one output layer. The original BRNN includes the parameters (weights and biases) of the neuron connections between these layers (the input layer, the two hidden layers and the output layer), and these weights and biases determine the behavior and recognition performance of the original BRNN.
Specifically, the training set is obtained and input into the original bidirectional recurrent neural network model for training; the weights and biases in the original bidirectional recurrent neural network model are adjusted, yielding the effective bidirectional recurrent neural network model, i.e. the bidirectional recurrent neural network model obtained from the training set.
S06: Input the test set into the effective bidirectional recurrent neural network model for testing to obtain the accuracy rate corresponding to the test set; if the accuracy rate reaches a preset threshold, determine the effective bidirectional recurrent neural network model as the target bidirectional recurrent neural network model.
Specifically, after the effective bidirectional recurrent neural network model is obtained, in order to verify its accuracy, the test set is input into the effective bidirectional recurrent neural network model for testing to obtain the corresponding accuracy rate; if the accuracy rate reaches a preset threshold (for example, 90%), the effective bidirectional recurrent neural network model is determined as the target bidirectional recurrent neural network model.
In steps S01 to S06, voice preprocessing is performed on the training voice, and the preprocessed voice is converted into preprocessed text using speech-to-text technology, so that the training set contains only content usable for model training. Text preprocessing is performed on the preprocessed text to obtain the training samples, which improves the training efficiency and training accuracy of the original bidirectional recurrent neural network model. To avoid overfitting, the trained effective bidirectional recurrent neural network model also needs to be tested with the test set to determine whether it is a satisfactory model; if the accuracy rate corresponding to the test set reaches the preset threshold, the recognition accuracy of the effective bidirectional recurrent neural network model meets the requirement, and it can be determined as the target bidirectional recurrent neural network model used to obtain the target intention.
In one embodiment, as shown in Figure 6, step S05, in which the training set is input into the original bidirectional recurrent neural network model for training to obtain the effective bidirectional recurrent neural network model, specifically includes the following steps:
S051: Initialize the weights and biases in the original bidirectional recurrent neural network model.
In this embodiment, the weights and biases are initialized with preset values, which are values set in advance by the developer based on experience. Initializing the weights and biases of the original bidirectional recurrent neural network model with preset values can, during subsequent training on the training set, shorten the training time of the model and improve its recognition accuracy. If the initialization of the weights and biases of the original bidirectional recurrent neural network model is not appropriate, the model's ability to adjust in the initial stage will be poor, which affects the model's subsequent recognition accuracy with respect to the target intention.
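One plausible way to perform the preset-value initialization is shown below. The small-Gaussian scale and zero biases are illustrative assumptions, since the patent leaves the preset values to the developer's experience; the shapes reflect the BRNN structure described above (shared input and output layers, one hidden layer per direction).

```python
import numpy as np

def init_brnn_params(input_dim, hidden_dim, output_dim, scale=0.01, seed=0):
    # Small random preset values for every weight matrix, zero biases
    rng = np.random.default_rng(seed)

    def w(rows, cols):
        return rng.normal(0.0, scale, size=(rows, cols))

    params = {}
    for d in ("fwd", "bwd"):  # one parameter set per direction
        params[d] = {
            "U": w(hidden_dim, input_dim),   # input layer -> hidden layer
            "W": w(hidden_dim, hidden_dim),  # hidden state -> hidden state
            "b": np.zeros(hidden_dim),       # hidden-layer bias
            "V": w(output_dim, hidden_dim),  # hidden layer -> output layer
        }
    return params
```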
S052: Convert the training set into word vectors, and input the word vectors into the original bidirectional recurrent neural network model for training to obtain the model output.
Specifically, the words in the training set are converted into word vectors by a word-vector conversion tool; understandably, the training set contains at least one word vector. The word-vector conversion tool used in this embodiment is word2vec (word to vector), a tool that converts words into vectors, in which each word is mapped to a corresponding vector.
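The word-to-vector lookup can be illustrated as follows. Note that this is a deterministic stand-in for a trained word2vec model (which in gensim would be queried as `model.wv[word]`): the hash-derived vectors carry no learned semantics and exist only to show the lookup shape a trained embedding would provide.

```python
import hashlib

import numpy as np

def pseudo_word_vector(word, dim=8):
    # Stand-in for a trained word2vec lookup: derive a deterministic
    # vector from the word's hash so the same word always maps to
    # the same vector, as a real embedding table would.
    digest = hashlib.md5(word.encode("utf-8")).digest()
    seed = int.from_bytes(digest[:8], "little")
    rng = np.random.default_rng(seed)
    return rng.standard_normal(dim)

def to_word_vectors(segmented_sentence, dim=8):
    # Convert a segmented sentence (list of words) into a (T, dim) matrix,
    # one row per time step, ready to feed the BRNN input layer
    return np.stack([pseudo_word_vector(w, dim) for w in segmented_sentence])
```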
After the training set is converted into word vectors, first, the input layer inputs the word vectors into the forward hidden layer and the backward hidden layer respectively for calculation, obtaining the outputs corresponding to the forward hidden layer and the backward hidden layer. Here, the forward hidden layer refers to the hidden layer of the forward recurrent neural network, and the backward hidden layer refers to the hidden layer of the backward recurrent neural network.
Then, the attention mechanism corresponding to the forward hidden layer and the backward hidden layer is used to assign attention weights to the outputs of the forward hidden layer and the backward hidden layer.
Finally, the two outputs processed by the attention mechanism are fused to obtain the value finally input to the output layer of the original bidirectional recurrent neural network model, and the model output is obtained through the calculation of the output layer. The model output is the output obtained from the training set through the training of the original bidirectional recurrent neural network model. The fusion in this embodiment includes, but is not limited to, the arithmetic mean method and the weighted mean method; for ease of description, the subsequent steps use the arithmetic mean method to fuse the two attention-processed outputs.
S053: Update the weights and biases in the original bidirectional recurrent neural network model based on the model output to obtain the effective bidirectional recurrent neural network model.
Specifically, after the model output is obtained, a loss function is constructed based on the model output; then, according to the loss function, a back-propagation algorithm is used to adjust the weights and biases of the original bidirectional recurrent neural network model, obtaining the effective bidirectional recurrent neural network model. The back-propagation (Back Propagation) algorithm adjusts, in the reverse order of the time steps, the weights and biases between the hidden layers and the output layer of the original bidirectional recurrent neural network model, and the weights and biases between the input layer and the hidden layers.
In steps S051 to S053, initializing the weights and biases in the original bidirectional recurrent neural network model shortens the training time of the model and improves its recognition accuracy. The original bidirectional recurrent neural network model is then trained with the training set, and the weights and biases in the model are adjusted so that they better meet the requirements.
In one embodiment, the original bidirectional recurrent neural network includes a forward recurrent neural network and a backward recurrent neural network. As shown in Figure 7, step S052, in which the word vectors are input into the original bidirectional recurrent neural network model for training to obtain the model output, specifically includes the following steps:
S0521: Input the word vectors into the input layer of the original bidirectional recurrent neural network model, input the word vectors processed by the input layer into the forward hidden layer of the forward recurrent neural network, and process them with the attention mechanism to obtain the forward output.
Specifically, the word vectors are input into the input layer of the original bidirectional recurrent neural network model; the input layer inputs the obtained word vectors into the forward hidden layer, where the output of the forward hidden layer is calculated by the formula h_t1 = σ(U·x_t + W·h_(t-1) + b). Here, σ denotes the activation function of the forward hidden layer, U denotes the weights between the input layer and the forward hidden layer, W denotes the weights between the hidden states of the forward recurrent neural network, b denotes the bias between the input layer and the forward hidden layer, x_t denotes the word vector input to the input layer at time t, h_t1 denotes the output of the forward hidden layer at time t, and h_(t-1) denotes the output of the forward hidden layer at time t-1.
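The forward hidden-layer recurrence h_t1 = σ(U·x_t + W·h_(t-1) + b) can be sketched numerically as below, with tanh standing in for the unspecified activation σ and a zero initial hidden state assumed.

```python
import numpy as np

def forward_hidden_states(X, U, W, b):
    # h_t = tanh(U @ x_t + W @ h_{t-1} + b), scanned over the time axis.
    # X has shape (T, input_dim); the result has shape (T, hidden_dim).
    T = X.shape[0]
    H = np.zeros((T, b.shape[0]))
    h_prev = np.zeros(b.shape[0])  # assumed zero initial state
    for t in range(T):
        h_prev = np.tanh(U @ X[t] + W @ h_prev + b)
        H[t] = h_prev
    return H
```

The backward hidden layer of step S0522 is the same recurrence run over the word vectors in reverse time order, with its own U, W and b.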
The output of the forward hidden layer is processed with the attention mechanism to obtain the forward output, i.e. the value obtained after the attention mechanism processes the output of the forward hidden layer. Specifically, the importance of the semantic vector is calculated according to the formula c_t1 = Σ_j α_tj·h_j, where c_t1 is the attention-weighted semantic vector of the forward hidden layer at time t, α_tj is the correlation between the j-th input word vector and the word vector corresponding to time t, and h_j is the output obtained after the j-th input word vector is calculated by the forward hidden layer. Further, the normalization process is α_tj = exp(e_tj) / Σ_k exp(e_tk), with e_tj = V^T·tanh(U·h_j + W·S_(t-1) + b), where k indexes the k-th input word vector, V denotes the weights between the hidden layer and the output layer, V^T is the transpose of V, and S_(t-1) is the output of the output layer at time t-1.
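The attention scoring and softmax normalization described above can be sketched as follows: the score e_tj = V^T·tanh(U·h_j + W·S_(t-1) + b) is computed per hidden-layer output, normalized with a softmax, and used to form the weighted context c_t1. All parameter matrices are treated as given; the max-subtraction inside the softmax is a standard numerical-stability detail, not part of the patent's formula.

```python
import numpy as np

def attention_context(H, s_prev, U_a, W_a, b_a, v):
    # Alignment scores: e_tj = v^T tanh(U_a h_j + W_a s_{t-1} + b_a)
    scores = np.array([v @ np.tanh(U_a @ h + W_a @ s_prev + b_a) for h in H])
    # Softmax normalization: alpha_tj = exp(e_tj) / sum_k exp(e_tk)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # Attention-weighted semantic vector: c_t = sum_j alpha_tj * h_j
    return weights @ H, weights
```

The same routine applies unchanged to the backward hidden layer's outputs in step S0522, producing c_t2.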
S0522: Input the word vectors processed by the input layer into the backward hidden layer of the backward recurrent neural network, and process them with the attention mechanism to obtain the backward output.
Specifically, the word vectors are input into the input layer, and the input layer inputs the obtained word vectors into the backward hidden layer, where the output of the backward hidden layer is calculated by the formula h_t2 = σ(U·x_t + W·h_(t-1) + b). Here, σ denotes the activation function of the backward hidden layer, U denotes the weights between the input layer and the backward hidden layer, W denotes the weights between the hidden states of the backward recurrent neural network, b denotes the bias between the input layer and the backward hidden layer, x_t denotes the word vector input to the input layer at time t, h_t2 denotes the output of the backward hidden layer at time t, and h_(t-1) denotes the output of the backward hidden layer at time t-1.
The output of the backward hidden layer is processed with the attention mechanism to obtain the backward output, i.e. the value obtained after the attention mechanism processes the output of the backward hidden layer. Specifically, the importance of the semantic vector is calculated according to the formula c_t2 = Σ_j α_tj·h_j, where c_t2 is the attention-weighted semantic vector of the backward hidden layer at time t, α_tj is the correlation between the j-th input word vector and the word vector corresponding to time t, and h_j is the output obtained after the j-th input word vector is calculated by the backward hidden layer. Further, the normalization process is α_tj = exp(e_tj) / Σ_k exp(e_tk), with e_tj = V^T·tanh(U·h_j + W·S_(t-1) + b), where k indexes the k-th input word vector, V denotes the weights between the hidden layer and the output layer, V^T is the transpose of V, and S_(t-1) is the output of the output layer at time t-1.
S0523: Fuse the forward output and the backward output to obtain the model output.
Specifically, after the forward output and the backward output are obtained, they are fused using the arithmetic-mean formula c_t = (c_t1 + c_t2) / 2 to obtain the target output, i.e. the output finally input to the output layer. After the target output is obtained, it is input into the output layer and calculated according to the formula S_t = f(S_(t-1), y_(t-1), c_t) to obtain the model output. Here, S_t denotes the output of the output layer at time t, S_(t-1) denotes the output of the output layer at time t-1, y_(t-1) is the text label carried by the word vector input at time t-1, and f is generally the softmax function. Obtaining the model output facilitates the construction of the loss function in the subsequent steps, so as to adjust the weights and biases of the forward recurrent neural network and the backward recurrent neural network in the bidirectional recurrent neural network model.
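The arithmetic-mean fusion and the output-layer calculation can be sketched as below. The softmax over a linear projection of the fused context is a simplification: the patent's output S_t = f(S_(t-1), y_(t-1), c_t) also feeds back the previous output and label, which is omitted here for brevity.

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

def output_step(c_fwd, c_bwd, V_out):
    # Arithmetic-mean fusion of the two attention contexts:
    # c_t = (c_t1 + c_t2) / 2
    c = (c_fwd + c_bwd) / 2.0
    # Simplified output layer: softmax over a linear projection of c_t
    return softmax(V_out @ c)
```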
In steps S0521 to S0523, the forward output and the backward output are obtained so as to obtain the model output, which facilitates the construction of the loss function in subsequent steps and the adjustment of the weights and biases of the forward recurrent neural network and the backward recurrent neural network in the bidirectional recurrent neural network model.
In one embodiment, the training set carries text labels, where a text label is a label annotated by the developer based on an understanding of the training sample. As shown in Figure 8, step S053, in which the weights and biases in the original bidirectional recurrent neural network model are updated based on the model output to obtain the effective bidirectional recurrent neural network model, specifically includes the following steps:
S0531: Construct a loss function based on the model output and the text labels.
Specifically, after the model output is obtained, the loss function is constructed based on the model output S_t and the text label y_t. The loss function in this embodiment is the cross-entropy loss L(θ) = -Σ_(t=1..T) y_t·log(S_t), where T denotes the number of time steps carried by the word vectors in the training set, t denotes the t-th time step, θ denotes the set of weights and biases (U, V, W, b, c), and y_t denotes the text label corresponding to the word vector at time t.
S0532: Update the weights and biases in the original bidirectional recurrent neural network model based on the loss function to obtain the effective bidirectional recurrent neural network model.
Specifically, after the loss function is obtained, the gradients ∂L(θ)/∂θ of the loss with respect to the parameters are computed, and the back-propagation algorithm is used to update the weights and biases corresponding to the forward recurrent neural network and the backward recurrent neural network respectively, adjusting the weights and biases of both networks. When the loss calculated from the model output by the loss function meets the requirement (for example, the loss does not exceed 10%), the original recurrent neural network with these weights and biases can be determined as the effective bidirectional recurrent neural network.
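The cross-entropy loss and the parameter update that back-propagation produces can be sketched as follows. The learning-rate-based gradient-descent rule θ ← θ − η·∂L/∂θ is a standard assumption, since the patent does not name a specific optimizer, and the gradients are taken as already computed by back-propagation.

```python
import numpy as np

def cross_entropy_loss(probs_seq, label_seq):
    # L(theta) = -sum_t log S_t[y_t]: the negative log-probability the
    # softmax output assigns to the correct label at each time step
    return -sum(np.log(p[y]) for p, y in zip(probs_seq, label_seq))

def sgd_update(params, grads, lr=0.1):
    # One gradient-descent step per parameter: theta <- theta - lr * dL/dtheta
    return {k: params[k] - lr * grads[k] for k in params}
```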
In steps S0531 and S0532, the loss function is constructed and the weights and biases in the original bidirectional recurrent neural network model are updated, obtaining the effective bidirectional recurrent neural network model.
In the intelligent robot answering method provided by the present invention, the original voice collected by the robot is obtained and voice preprocessing is performed on it to obtain the effective voice, which facilitates the subsequent conversion of the effective voice into the original text to be recognized and improves the conversion accuracy. The effective voice is converted into the original text using speech-to-text technology, and text preprocessing is performed on the original text to obtain the effective text. The original bidirectional recurrent neural network model is trained and tested with the training set and the test set to obtain the target bidirectional recurrent neural network model, and the effective text is recognized with the target bidirectional recurrent neural network model to obtain the target intention, improving the accuracy of identifying the target intention. After the target intention is obtained, a target dialogue script is selected according to the target intention, which improves the flexibility of the dialogue between the agent and the robot. The target dialogue script is converted into the target voice by text-to-speech technology, and the robot is controlled to play the target voice, thereby completing the dialogue between the robot and the agent.
It should be understood that the sequence numbers of the steps in the above embodiments do not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and does not constitute any limitation on the implementation process of the embodiments of the present invention.
In one embodiment, an intelligent robot answering device is provided, which corresponds to the intelligent robot answering method in the above embodiments. As shown in Figure 9, the intelligent robot answering device includes an original voice preprocessing module 10, an effective speech-to-text module 20, an original text processing module 30, a model recognition module 40 and a text-to-speech module 50. The functional modules are described in detail as follows:
The original voice preprocessing module 10 is configured to obtain the original voice collected by the robot and perform voice preprocessing on the original voice to obtain the effective voice.
The effective speech-to-text module 20 is configured to convert the effective voice into the original text using speech-to-text technology.
The original text processing module 30 is configured to perform text preprocessing on the original text to obtain the effective text.
The model recognition module 40 is configured to recognize the effective text using the target bidirectional recurrent neural network model generated with the attention mechanism to obtain the target intention.
The text-to-speech module 50 is configured to select a target dialogue script according to the target intention, convert the target dialogue script into the target voice by text-to-speech technology, and control the robot to play the target voice.
Further, the original voice preprocessing module 10 includes a first voice preprocessing unit and a second voice preprocessing unit.
The first voice preprocessing unit is configured to perform pre-emphasis, framing and windowing on the original voice to obtain a standard voice.
The second voice preprocessing unit is configured to perform endpoint detection on the standard voice to obtain the effective voice.
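The pre-emphasis, framing and windowing performed by the first voice preprocessing unit can be sketched as below. The frame length, hop size, pre-emphasis coefficient and Hamming window are common signal-processing defaults rather than values fixed by the patent, and the endpoint-detection stage of the second unit is omitted.

```python
import numpy as np

def preprocess_waveform(signal, frame_len=256, hop=128, alpha=0.97):
    # Pre-emphasis: y[n] = x[n] - alpha * x[n-1] boosts high frequencies
    emphasized = np.append(signal[0], signal[1:] - alpha * signal[:-1])
    # Framing: split into overlapping frames, then apply a Hamming window
    window = np.hamming(frame_len)
    frames = []
    for start in range(0, len(emphasized) - frame_len + 1, hop):
        frames.append(emphasized[start:start + frame_len] * window)
    return np.array(frames)
```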
Further, the original text processing module 30 includes a first text preprocessing unit and a second text preprocessing unit.
The first text preprocessing unit is configured to perform the first preprocessing on the original text using regular expressions, and cut the first-preprocessed original text into corresponding cut text according to the preset length.
The second text preprocessing unit is configured to perform the second preprocessing on the cut text using the word segmentation tool to obtain the effective text.
Further, the intelligent robot answering device further includes a training voice preprocessing module 01, a training speech-to-text module 02, a training sample obtaining module 03, a training sample processing module 04, a model training module 05 and a model testing module 06.
The training voice preprocessing module 01 is configured to obtain the training voice and perform voice preprocessing on the training voice to obtain the preprocessed voice.
The training speech-to-text module 02 is configured to convert the preprocessed voice into the preprocessed text using speech-to-text technology.
The training sample obtaining module 03 is configured to perform text preprocessing on the preprocessed text to obtain the training samples.
The training sample processing module 04 is configured to divide the training samples into a training set and a test set.
The model training module 05 is configured to input the training set into the original bidirectional recurrent neural network model for training to obtain the effective bidirectional recurrent neural network model.
The model testing module 06 is configured to input the test set into the effective bidirectional recurrent neural network model for testing to obtain the accuracy rate corresponding to the test set; if the accuracy rate reaches the preset threshold, the effective bidirectional recurrent neural network model is determined as the target bidirectional recurrent neural network model.
Further, the model training module includes a parameter initialization unit, a model output obtaining unit and a model parameter updating unit.
The parameter initialization unit is configured to initialize the weights and biases in the original bidirectional recurrent neural network model.
The model output obtaining unit is configured to convert the training set into word vectors and input the word vectors into the original bidirectional recurrent neural network model for training to obtain the model output.
The model parameter updating unit is configured to update the weights and biases in the original bidirectional recurrent neural network model based on the model output to obtain the effective bidirectional recurrent neural network model.
Further, the original bidirectional recurrent neural network includes a forward recurrent neural network and a backward recurrent neural network.
Further, the model output obtaining unit includes a forward output obtaining unit, a backward output obtaining unit and a fusion calculation unit.
The forward output obtaining unit is configured to input the word vectors into the input layer of the original bidirectional recurrent neural network model, input the word vectors processed by the input layer into the forward hidden layer of the forward recurrent neural network, and process them with the attention mechanism to obtain the forward output.
The backward output obtaining unit is configured to input the word vectors processed by the input layer into the backward hidden layer of the backward recurrent neural network and process them with the attention mechanism to obtain the backward output.
The fusion calculation unit is configured to fuse the forward output and the backward output to obtain the model output.
Further, the training set carries text labels.
Further, the model parameter updating unit includes a loss function construction unit and a weight and bias updating unit.
The loss function construction unit is configured to construct the loss function based on the model output and the text labels.
The weight and bias updating unit is configured to update the weights and biases in the original bidirectional recurrent neural network model based on the loss function to obtain the effective bidirectional recurrent neural network model.
For the specific limitations of the intelligent robot answering device, reference may be made to the limitations of the intelligent robot answering method above, which are not repeated here. Each module in the above intelligent robot answering device may be implemented in whole or in part by software, hardware or a combination thereof. The above modules may be embedded in or independent of a processor in a computer device in the form of hardware, or stored in a memory in the computer device in the form of software, so that the processor can call and execute the operations corresponding to the above modules.
In one embodiment, a computer device is provided, which may be a server; its internal structure may be as shown in Figure 10. The computer device includes a processor, a memory, a network interface and a database connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program and a database. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The database of the computer device stores the data involved in the intelligent robot answering method. The network interface of the computer device is used to communicate with an external terminal through a network connection. When executed by the processor, the computer program implements the intelligent robot answering method.
In one embodiment, a computer device is provided, including a memory, a processor and a computer program stored in the memory and executable on the processor. When executing the computer program, the processor implements the intelligent robot answering method of the above embodiments, such as steps S10 to S50 shown in Figure 2 or the steps shown in Figures 3 to 8, which are not repeated here to avoid repetition. Alternatively, when executing the computer program, the processor implements the functions of the modules/units in the above embodiments of the intelligent robot answering device, such as the functions of modules 10 to 50 shown in Figure 9, or the functions of modules 01 to 06, which are not repeated here to avoid repetition.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored. When executed by a processor, the computer program implements the intelligent robot answering method of the above embodiments, such as steps S10 to S50 shown in Figure 2 or the steps shown in Figures 3 to 8, which are not repeated here to avoid repetition. Alternatively, when executed by a processor, the computer program implements the functions of the modules/units in the above embodiments of the intelligent robot answering device, such as the functions of modules 10 to 50 shown in Figure 9, or the functions of modules 01 to 06, which are not repeated here to avoid repetition.
Those of ordinary skill in the art will appreciate that all or part of the processes in the methods of the above embodiments may be completed by instructing the relevant hardware through a computer program; the computer program may be stored in a non-volatile computer-readable storage medium, and when executed, may include the processes of the embodiments of the above methods. Any reference to the memory, storage, database or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable ROM (EEPROM) or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), memory-bus (Rambus) direct RAM (RDRAM), direct memory-bus dynamic RAM (DRDRAM) and memory-bus dynamic RAM (RDRAM).
It is apparent to those skilled in the art that, for convenience and brevity of description, only the division of the above functional units and modules is illustrated by example; in practical applications, the above functions may be allocated to different functional units and modules as needed, that is, the internal structure of the device may be divided into different functional units or modules to complete all or part of the functions described above.
The above embodiments are merely illustrative of the technical solutions of the present invention, not limiting; although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of the technical features therein may be equivalently replaced; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and shall all be included within the protection scope of the present invention.
Claims (10)
1. An intelligent robot answering method, characterized by comprising:
obtaining an original voice collected by a robot, and performing voice preprocessing on the original voice to obtain an effective voice;
converting the effective voice into an original text using speech-to-text technology;
performing text preprocessing on the original text to obtain an effective text;
recognizing the effective text using a target bidirectional recurrent neural network model generated with an attention mechanism to obtain a target intention;
selecting a target dialogue script according to the target intention, converting the target dialogue script into a target voice by text-to-speech technology, and controlling the robot to play the target voice.
2. The intelligent robot answering method according to claim 1, characterized in that the performing voice preprocessing on the original voice to obtain the effective voice comprises:
performing pre-emphasis, framing and windowing on the original voice to obtain a standard voice;
performing endpoint detection on the standard voice to obtain the effective voice.
3. The intelligent robot answer method according to claim 1, characterized in that performing text preprocessing on the original text to obtain valid text comprises:
performing a first preprocessing on the original text using regular expressions, and cutting the original text that has undergone the first preprocessing into corresponding cut texts according to a preset length;
performing a second preprocessing on the cut texts using a word segmentation tool, to obtain valid text.
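Claim 3's two-pass text preprocessing might look like the following sketch. The specific regular expression, the preset length of 10, and the whitespace-split tokenizer are illustrative assumptions; in practice the second pass would typically call a segmentation tool such as jieba for Chinese text.

```python
import re

def first_preprocess(text):
    """Regex cleanup: drop everything except word characters and spaces
    (an illustrative rule; the claim does not fix a specific expression)."""
    return re.sub(r"[^\w\s]", "", text).strip()

def cut_by_length(text, preset_length=10):
    """Cut the cleaned text into chunks of at most preset_length characters."""
    return [text[i:i + preset_length] for i in range(0, len(text), preset_length)]

def second_preprocess(chunks):
    """Tokenize each chunk; whitespace split stands in for a real
    word segmentation tool here."""
    return [tok for chunk in chunks for tok in chunk.split()]

cleaned = first_preprocess("Hello, world!")
chunks = cut_by_length("abcdefghijkl", preset_length=5)
tokens = second_preprocess(["hi there", "ok"])
```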
4. The intelligent robot answer method according to claim 1, characterized in that, before obtaining the raw speech, the intelligent robot answer method further comprises:
obtaining training speech, and performing speech preprocessing on the training speech to obtain preprocessed speech;
converting the preprocessed speech into preprocessed text using speech-to-text technology;
performing text preprocessing on the preprocessed text to obtain training samples;
dividing the training samples into a training set and a test set;
inputting the training set into an original bidirectional recurrent neural network model for training, to obtain a valid bidirectional recurrent neural network model;
inputting the test set into the valid bidirectional recurrent neural network model for testing, to obtain an accuracy rate corresponding to the test set; and if the accuracy rate reaches a preset threshold, determining the valid bidirectional recurrent neural network model as the target bidirectional recurrent neural network model.
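The split-train-test-gate workflow in claim 4 can be sketched as follows. The 80/20 split ratio and the 0.9 threshold are illustrative assumptions; the claim only requires that some preset threshold gate the promotion of the trained model to "target" status.

```python
import random

def split_samples(samples, train_ratio=0.8, seed=0):
    """Shuffle labelled samples and divide them into training and test sets."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

def accept_model(model, test_set, preset_threshold=0.9):
    """Promote the trained model only if its test accuracy reaches the
    preset threshold (claim 4's gating step)."""
    correct = sum(1 for x, y in test_set if model(x) == y)
    accuracy = correct / len(test_set)
    return accuracy >= preset_threshold, accuracy

samples = [(i, i % 2) for i in range(10)]
train_set, test_set = split_samples(samples)
ok, acc = accept_model(lambda x: x % 2, test_set)  # a perfect toy "model"
```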
5. The intelligent robot answer method according to claim 4, characterized in that inputting the training set into the original bidirectional recurrent neural network model for training, to obtain a valid bidirectional recurrent neural network model, comprises:
initializing the weights and biases in the original bidirectional recurrent neural network model;
converting the training set into word vectors, and inputting the word vectors into the original bidirectional recurrent neural network model for training, to obtain a model output;
updating the weights and biases in the original bidirectional recurrent neural network model based on the model output, to obtain a valid bidirectional recurrent neural network model.
6. The intelligent robot answer method according to claim 5, characterized in that the original bidirectional recurrent neural network comprises a forward recurrent neural network and a backward recurrent neural network;
inputting the word vectors into the original bidirectional recurrent neural network model for training, to obtain a model output, comprises:
inputting the word vectors into the input layer of the original bidirectional recurrent neural network model, inputting the word vectors processed by the input layer into the forward hidden layer of the forward recurrent neural network, and processing them with an attention mechanism to obtain a forward output;
inputting the word vectors processed by the input layer into the backward hidden layer of the backward recurrent neural network, and processing them with an attention mechanism to obtain a backward output;
performing fusion processing on the forward output and the backward output to obtain the model output.
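A minimal numeric sketch of claim 6's structure: two independent tanh RNNs read the word-vector sequence in opposite directions, an attention layer pools each direction's hidden states, and the two pooled vectors are fused. The dot-product attention and fusion-by-concatenation are common choices assumed here for illustration; the claim does not fix either.

```python
import numpy as np

rng = np.random.default_rng(0)
H, D = 8, 4                       # hidden size, word-vector (embedding) size

def init_cell():
    """Weights and bias for one direction of the recurrent network."""
    return {"Wx": rng.normal(0, 0.1, (H, D)),
            "Wh": rng.normal(0, 0.1, (H, H)),
            "b":  np.zeros(H)}

def run_direction(cell, xs):
    """Plain tanh RNN over a word-vector sequence; returns all hidden states."""
    h, states = np.zeros(H), []
    for x in xs:
        h = np.tanh(cell["Wx"] @ x + cell["Wh"] @ h + cell["b"])
        states.append(h)
    return np.stack(states)

def attend(states, w):
    """Dot-product attention: score each state, softmax, weighted sum."""
    scores = states @ w
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()
    return alpha @ states

fwd, bwd = init_cell(), init_cell()
w_att = rng.normal(0, 0.1, H)

xs = [rng.normal(0, 1, D) for _ in range(5)]          # toy word vectors
forward_out = attend(run_direction(fwd, xs), w_att)   # forward pass + attention
backward_out = attend(run_direction(bwd, xs[::-1]), w_att)  # reversed sequence
fused = np.concatenate([forward_out, backward_out])   # fusion by concatenation
print(fused.shape)  # (16,)
```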
7. The intelligent robot answer method according to claim 5, characterized in that the training set carries text labels;
updating the weights and biases in the original bidirectional recurrent neural network model based on the model output, to obtain a valid bidirectional recurrent neural network model, comprises:
constructing a loss function based on the model output and the text labels;
updating the weights and biases in the original bidirectional recurrent neural network model based on the loss function, to obtain a valid bidirectional recurrent neural network model.
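Claim 7's loss-then-update step is ordinary supervised training. The sketch below substitutes a tiny linear classifier for the bidirectional model and assumes softmax cross-entropy as the loss, since the claim names neither; the gradient update pattern is the same in either case.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def train_step(W, b, x, label, lr=0.5):
    """Build a cross-entropy loss from the model output and the text label,
    then update weights and bias by one gradient-descent step."""
    probs = softmax(W @ x + b)
    loss = -np.log(probs[label])
    grad_logits = probs.copy()
    grad_logits[label] -= 1.0            # d loss / d logits
    W = W - lr * np.outer(grad_logits, x)
    b = b - lr * grad_logits
    return W, b, loss

# The loss should fall across repeated updates on the same labelled sample:
rng = np.random.default_rng(1)
W, b = rng.normal(0, 0.1, (3, 4)), np.zeros(3)
x, label = rng.normal(0, 1, 4), 2
losses = []
for _ in range(20):
    W, b, loss = train_step(W, b, x, label)
    losses.append(loss)
```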
8. An intelligent robot answering device, characterized by comprising:
a raw speech preprocessing module, configured to obtain raw speech collected by a robot and perform speech preprocessing on the raw speech to obtain valid speech;
a speech-to-text module, configured to convert the valid speech into original text using speech-to-text technology;
an original text processing module, configured to perform text preprocessing on the original text to obtain valid text;
a model recognition module, configured to recognize the valid text using a target bidirectional recurrent neural network model generated with an attention mechanism, to obtain a target intention;
a text-to-speech module, configured to select a target script according to the target intention, convert the target script into target speech by text-to-speech technology, and control the robot to play the target speech.
9. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the intelligent robot answer method according to any one of claims 1 to 7.
10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the intelligent robot answer method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910305320.0A CN110162610A (en) | 2019-04-16 | 2019-04-16 | Intelligent robot answer method, device, computer equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110162610A true CN110162610A (en) | 2019-08-23 |
Family
ID=67639620
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910305320.0A Pending CN110162610A (en) | 2019-04-16 | 2019-04-16 | Intelligent robot answer method, device, computer equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110162610A (en) |
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2015062482A1 (en) * | 2013-11-01 | 2015-05-07 | Tencent Technology (Shenzhen) Company Limited | System and method for automatic question answering |
CN106095834A (en) * | 2016-06-01 | 2016-11-09 | 竹间智能科技(上海)有限公司 | Intelligent dialogue method and system based on topic |
CN107346340A (en) * | 2017-07-04 | 2017-11-14 | 北京奇艺世纪科技有限公司 | A kind of user view recognition methods and system |
CN107680586A (en) * | 2017-08-01 | 2018-02-09 | 百度在线网络技术(北京)有限公司 | Far field Speech acoustics model training method and system |
US20190043482A1 (en) * | 2017-08-01 | 2019-02-07 | Baidu Online Network Technology (Beijing) Co., Ltd. | Far field speech acoustic model training method and system |
CN108632137A (en) * | 2018-03-26 | 2018-10-09 | 平安科技(深圳)有限公司 | Answer model training method, intelligent chat method, device, equipment and medium |
CN109065027A (en) * | 2018-06-04 | 2018-12-21 | 平安科技(深圳)有限公司 | Speech differentiation model training method, device, computer equipment and storage medium |
CN109522393A (en) * | 2018-10-11 | 2019-03-26 | 平安科技(深圳)有限公司 | Intelligent answer method, apparatus, computer equipment and storage medium |
CN109389091A (en) * | 2018-10-22 | 2019-02-26 | 重庆邮电大学 | The character identification system and method combined based on neural network and attention mechanism |
Cited By (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2021051507A1 (en) * | 2019-09-18 | 2021-03-25 | 平安科技(深圳)有限公司 | Bot conversation generation method, device, readable storage medium, and bot |
WO2021103775A1 (en) * | 2019-11-27 | 2021-06-03 | 深圳追一科技有限公司 | Voice intent recognition method and device, computer device and storage medium |
CN110930989A (en) * | 2019-11-27 | 2020-03-27 | 深圳追一科技有限公司 | Speech intention recognition method and device, computer equipment and storage medium |
CN110930989B (en) * | 2019-11-27 | 2021-04-06 | 深圳追一科技有限公司 | Speech intention recognition method and device, computer equipment and storage medium |
CN110993124A (en) * | 2019-12-09 | 2020-04-10 | 上海光电医用电子仪器有限公司 | Monitoring system and method with voice response function |
CN111224863A (en) * | 2019-12-10 | 2020-06-02 | 平安国际智慧城市科技股份有限公司 | Session task generation method and device, computer equipment and storage medium |
CN111224863B (en) * | 2019-12-10 | 2021-06-22 | 平安国际智慧城市科技股份有限公司 | Session task generation method and device, computer equipment and storage medium |
CN111159346A (en) * | 2019-12-27 | 2020-05-15 | 深圳物控智联科技有限公司 | Intelligent answering method based on intention recognition, server and storage medium |
CN111462752A (en) * | 2020-04-01 | 2020-07-28 | 北京思特奇信息技术股份有限公司 | Client intention identification method based on attention mechanism, feature embedding and BI-LSTM |
CN111462752B (en) * | 2020-04-01 | 2023-10-13 | 北京思特奇信息技术股份有限公司 | Attention mechanism, feature embedding and BI-LSTM based customer intention recognition method |
CN113961698A (en) * | 2020-07-15 | 2022-01-21 | 上海乐言信息科技有限公司 | Intention classification method, system, terminal and medium based on neural network model |
CN111859911B (en) * | 2020-07-28 | 2023-07-25 | 中国平安人寿保险股份有限公司 | Image description text generation method, device, computer equipment and storage medium |
CN111859904A (en) * | 2020-07-31 | 2020-10-30 | 南京三百云信息科技有限公司 | NLP model optimization method and device and computer equipment |
CN112035643A (en) * | 2020-09-01 | 2020-12-04 | 中国平安财产保险股份有限公司 | Method and device for reusing capabilities of conversation robot |
CN112035643B (en) * | 2020-09-01 | 2023-10-24 | 中国平安财产保险股份有限公司 | Method and device for multiplexing capacity of conversation robot |
CN112347788A (en) * | 2020-11-06 | 2021-02-09 | 平安消费金融有限公司 | Corpus processing method, apparatus and storage medium |
CN112667787A (en) * | 2020-11-26 | 2021-04-16 | 平安普惠企业管理有限公司 | Intelligent response method, system and storage medium based on phonetics label |
CN112614514A (en) * | 2020-12-15 | 2021-04-06 | 科大讯飞股份有限公司 | Valid voice segment detection method, related device and readable storage medium |
CN112614514B (en) * | 2020-12-15 | 2024-02-13 | 中国科学技术大学 | Effective voice fragment detection method, related equipment and readable storage medium |
CN112749761A (en) * | 2021-01-22 | 2021-05-04 | 上海机电工程研究所 | Enemy combat intention identification method and system based on attention mechanism and recurrent neural network |
CN113254621A (en) * | 2021-06-21 | 2021-08-13 | 中国平安人寿保险股份有限公司 | Seat call prompting method and device, computer equipment and storage medium |
CN113254621B (en) * | 2021-06-21 | 2024-06-14 | 中国平安人寿保险股份有限公司 | Seat call prompting method and device, computer equipment and storage medium |
CN114360517A (en) * | 2021-12-17 | 2022-04-15 | 天翼爱音乐文化科技有限公司 | Audio processing method and device in complex environment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110162610A (en) | Intelligent robot answer method, device, computer equipment and storage medium | |
CN110287283B (en) | Intention model training method, intention recognition method, device, equipment and medium | |
CN112017644B (en) | Sound transformation system, method and application | |
US11222620B2 (en) | Speech recognition using unspoken text and speech synthesis | |
Chen et al. | End-to-end neural network based automated speech scoring | |
US20210295858A1 (en) | Synthesizing speech from text using neural networks | |
WO2020215666A1 (en) | Speech synthesis method and apparatus, computer device, and storage medium | |
CN107871496B (en) | Speech recognition method and device | |
CN110459225B (en) | Speaker recognition system based on CNN fusion characteristics | |
CN110827801A (en) | Automatic voice recognition method and system based on artificial intelligence | |
US20220059083A1 (en) | Neural modulation codes for multilingual and style dependent speech and language processing | |
Michelsanti et al. | Vocoder-based speech synthesis from silent videos | |
JP7393585B2 (en) | WaveNet self-training for text-to-speech | |
Paul et al. | Enhancing speech intelligibility in text-to-speech synthesis using speaking style conversion | |
An et al. | Disentangling style and speaker attributes for TTS style transfer | |
KR102363324B1 (en) | Method and tts system for determining the unvoice section of the mel-spectrogram | |
KR20200088263A (en) | Method and system of text to multiple speech | |
KR20220071960A (en) | A method and a TTS system for calculating an encoder score of an attention alignment corresponded to a spectrogram | |
CN117789771A (en) | Cross-language end-to-end emotion voice synthesis method and system | |
KR102532253B1 (en) | A method and a TTS system for calculating a decoder score of an attention alignment corresponded to a spectrogram | |
Yousfi et al. | Isolated Iqlab checking rules based on speech recognition system | |
KR20220071522A (en) | A method and a TTS system for generating synthetic speech | |
Karim et al. | Text to speech using Mel-Spectrogram with deep learning algorithms | |
CN115171700B (en) | Voiceprint recognition voice assistant method based on impulse neural network | |
KR102503066B1 (en) | A method and a TTS system for evaluating the quality of a spectrogram using scores of an attention alignment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190823 |