CN110162610A - Intelligent robot answer method, device, computer equipment and storage medium - Google Patents
- Publication number
- CN110162610A (application CN201910305320.0A)
- Authority
- CN
- China
- Prior art keywords
- text
- neural network
- voice
- target
- network model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B25—HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
- B25J—MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- B25J9/00—Programme-controlled manipulators
- B25J9/16—Programme controls
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B25—HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
- B25J—MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- B25J9/00—Programme-controlled manipulators
- B25J9/16—Programme controls
- B25J9/1602—Programme controls characterised by the control system, structure, architecture
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B25—HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
- B25J—MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- B25J9/00—Programme-controlled manipulators
- B25J9/16—Programme controls
- B25J9/1679—Programme controls characterised by the tasks executed
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation or dialogue systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
- G10L13/027—Concept to speech synthesisers; Generation of natural phrases from machine-based concepts
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
Abstract
The invention discloses an intelligent robot answering method, apparatus, computer device, and storage medium. The method comprises: obtaining raw speech collected by a robot and performing speech preprocessing on the raw speech to obtain effective speech; converting the effective speech into original text using speech-to-text technology; performing text preprocessing on the original text to obtain effective text; recognizing the effective text using a target bidirectional recurrent neural network model generated with an attention mechanism, to obtain a target intent; selecting a target speech script according to the target intent, converting the target speech script into target speech using text-to-speech technology, and controlling the robot to play the target speech. This improves the flexibility of the dialogue between an agent and the robot.
Description
Technical field
The present invention relates to the field of intelligent decision-making, and in particular to an intelligent robot answering method, apparatus, computer device, and storage medium.
Background technique
In existing intelligent training systems, the question-and-answer flow between the agent and the robot and the dialogue templates are all preset. No matter what answer the agent gives, the robot asks its questions according to the preset script. Such systems lack flexibility and cannot conduct an intelligent dialogue that adapts to the actual situation.
Summary of the invention
Embodiments of the present invention provide an intelligent robot answering method, apparatus, computer device, and storage medium, to solve the problem of inflexible dialogue between an agent and a robot.
An intelligent robot answering method comprises:
obtaining raw speech collected by a robot, and performing speech preprocessing on the raw speech to obtain effective speech;
converting the effective speech into original text using speech-to-text technology;
performing text preprocessing on the original text to obtain effective text;
recognizing the effective text using a target bidirectional recurrent neural network model generated with an attention mechanism, to obtain a target intent;
selecting a target speech script according to the target intent, converting the target speech script into target speech using text-to-speech technology, and controlling the robot to play the target speech.
An intelligent robot answering apparatus comprises:
a raw speech preprocessing module, configured to obtain the raw speech collected by a robot and perform speech preprocessing on the raw speech to obtain effective speech;
a speech-to-text module, configured to convert the effective speech into original text using speech-to-text technology;
an original text processing module, configured to perform text preprocessing on the original text to obtain effective text;
a model recognition module, configured to recognize the effective text using a target bidirectional recurrent neural network model generated with an attention mechanism, to obtain a target intent;
a text-to-speech module, configured to select a target speech script according to the target intent, convert the target speech script into target speech using text-to-speech technology, and control the robot to play the target speech.
A computer device comprises a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the above intelligent robot answering method when executing the computer program.
A computer-readable storage medium stores a computer program that, when executed by a processor, implements the above intelligent robot answering method.
In the above intelligent robot answering method, apparatus, computer device, and storage medium, the raw speech collected by the robot is obtained and preprocessed into effective speech, which makes the subsequent conversion of the effective speech into the original text to be recognized easier and improves conversion accuracy. The effective speech is converted into original text using speech-to-text technology, the original text is preprocessed into effective text, and the effective text is then recognized by the target bidirectional recurrent neural network model to obtain the target intent, improving the accuracy of intent recognition. After the target intent is obtained, a target speech script is selected according to it, improving the flexibility of the dialogue between the agent and the robot. The target speech script is converted into target speech by text-to-speech technology and the robot is controlled to play it, completing the dialogue between the robot and the agent.
Detailed description of the invention
To illustrate the technical solutions of the embodiments of the present invention more clearly, the accompanying drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the invention; those of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is an application scenario diagram of the intelligent robot answering method in an embodiment of the invention;
Fig. 2 is a flowchart of the intelligent robot answering method in an embodiment of the invention;
Fig. 3 is a detailed flowchart of step S10 in Fig. 2;
Fig. 4 is a detailed flowchart of step S30 in Fig. 2;
Fig. 5 is another flowchart of the intelligent robot answering method in an embodiment of the invention;
Fig. 6 is a detailed flowchart of step S05 in Fig. 5;
Fig. 7 is a detailed flowchart of step S052 in Fig. 6;
Fig. 8 is a detailed flowchart of step S053 in Fig. 6;
Fig. 9 is a schematic diagram of the intelligent robot answering apparatus in an embodiment of the invention;
Fig. 10 is a schematic diagram of the computer device in an embodiment of the invention.
Specific embodiment
The technical solutions in the embodiments of the present invention will now be described clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present invention.
The intelligent robot answering method provided by this application can be applied in the environment of Fig. 1, in which a terminal device communicates with a server over a network. The terminal device of the invention is specifically a robot. The server is a server that processes the raw speech collected by the robot and obtains the target intent from it. Raw speech refers to the agent's speech, collected by the robot's sound acquisition module, that needs to be recognized. The target intent is information, derived from the raw speech, that indicates the speaker's intention.
In one embodiment, as shown in Fig. 2, an intelligent robot answering method is provided. Taking the server in Fig. 1 as an example, the method comprises the following steps:
S10: Obtain the raw speech collected by the robot and perform speech preprocessing on it to obtain effective speech.
Raw speech refers to the agent's speech that the robot collects through its sound acquisition module and that needs to be recognized. Speech preprocessing refers to applying pre-emphasis, framing, windowing, endpoint detection, and similar processing to the raw speech, removing silent segments and noise segments, and retaining only the speech with clearly continuous voiceprint variation.
Specifically, after the raw speech is obtained, pre-emphasis, framing, windowing, and endpoint detection are applied to it; silent and noise segments are removed, and only the portion containing clearly continuous voiceprint variation is kept as the effective speech. Preprocessing the raw speech makes it easier for subsequent steps to convert the effective speech into the original text to be recognized, improving conversion accuracy.
S20: Convert the effective speech into original text using speech-to-text technology.
The speech-to-text technology in this embodiment is ASR (Automatic Speech Recognition), a technology that converts human speech into text.
Specifically, after the effective speech is obtained, the server corresponding to the robot uses ASR to convert it into original text, i.e., the corresponding written form of the effective speech. Because the effective speech is expressed in audio form, labeling it directly from the audio a developer listens to is inconvenient to operate and store, and processing is slow. Converting the effective speech into original text expresses it as text, so the content can be labeled by reading, which is convenient to operate and efficient to process.
S30: Perform text preprocessing on the original text to obtain effective text.
Effective text refers to the original text after preprocessing: digits, special characters, and stop words are removed, and the text conforms to a preset length (e.g., 8 characters). Digits here are the numbers that appear after the effective speech is converted into original text; special characters are the unrecognizable characters that appear after the conversion, such as $, *, &, #, +, ?.
Specifically, after obtaining the original text, the server corresponding to the robot preprocesses it to remove the digits and special characters. Further, to facilitate step S40, in which the effective text is recognized by the target bidirectional recurrent neural network model, the text with digits and special characters removed must also be cut according to the preset length so that it meets the length requirement, yielding cut text. Finally, the stop words in the cut text are removed and the words carrying real meaning are retained, forming the effective text.
Further, after the effective text is obtained, the server sends it to a client so that a developer can read its content and label it, giving the effective text a corresponding text label, which step S0531 uses to construct the loss function.
S40: Recognize the effective text using the target bidirectional recurrent neural network model generated with an attention mechanism, to obtain the target intent.
The target bidirectional recurrent neural network (BRNN, Bi-directional Recurrent Neural Networks) model is a model trained in advance to recognize effective text and obtain the target intent. The attention mechanism assigns different weights to data according to their importance: important data receive large weights and unimportant data receive small ones. For example, in the sentence "The weather is fine today", "today" is unimportant and receives a small weight, while "weather" and "fine" are both important and receive equally large weights.
Specifically, after the effective text is obtained, the server corresponding to the robot uses a word segmentation tool to cut it and removes the stop words (particles, prepositions, pronouns, etc.), obtaining the target words, i.e., the words remaining after stop-word removal. The target words are then converted into corresponding target word vectors using a word-vector conversion tool. Finally, the target word vectors are input into the target bidirectional recurrent neural network model generated with the attention mechanism for recognition, and the target intent is obtained. In this embodiment, the target intent refers to the information about the intention of the effective text that the target bidirectional recurrent neural network model obtains by recognizing it. Obtaining the target intent with this model effectively improves the accuracy of intent recognition.
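The weighting behaviour ascribed to the attention mechanism above (important words receive large weights, and the pooled representation is dominated by them) can be illustrated with dot-product attention over per-word hidden vectors. This is a generic plain-Python sketch, not the patent's BRNN implementation; the query vector stands in for learned attention parameters.

```python
import math

def softmax(xs):
    # Numerically stable softmax: subtract the max before exponentiating
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention_pool(hidden_states, query):
    # Score each time step's hidden state against the query vector,
    # normalize the scores with softmax, then form the weighted sum:
    # important time steps contribute more to the pooled vector
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    pooled = [sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)]
    return pooled, weights
```

In a real model the hidden states would come from the forward and backward RNN passes and the pooled vector would feed a classification layer over the intent labels.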
S50: Select a target speech script according to the target intent, convert the target speech script into target speech using text-to-speech technology, and control the robot to play the target speech.
Specifically, to meet customer needs more fully, each target intent in this embodiment is provided with multiple speech-script templates. After the target intent is obtained, the script templates corresponding to it are selected, and one of them is chosen at random as the target speech script. Finally, the target speech script is converted into target speech by TTS technology and the robot is controlled to play the target speech, completing the dialogue with the agent. TTS (text-to-speech) refers to the technology by which a computer converts self-generated or externally input text into spoken output. The target speech is the speech, converted from the target speech script by TTS, with which the robot converses orally with the agent.
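The selection logic of step S50 — several script templates per intent, one chosen at random — can be sketched as below. The intent names and template strings are hypothetical placeholders; the actual TTS conversion and playback are outside the sketch.

```python
import random

# Hypothetical script templates keyed by recognized intent;
# the real system would load these from its configured template store
SCRIPT_TEMPLATES = {
    "greeting": ["Hello, how can I help you?",
                 "Hi there, what can I do for you?"],
    "repay_query": ["Your current balance is due on the 15th.",
                    "The outstanding amount can be repaid at any time."],
}

def choose_script(intent, templates=SCRIPT_TEMPLATES, rng=random):
    # Pick one script at random among those configured for the intent;
    # return None when the intent has no configured templates
    candidates = templates.get(intent)
    if not candidates:
        return None
    return rng.choice(candidates)
```

The random choice among templates is what gives each intent several possible replies instead of one fixed answer.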
In steps S10 to S50, the raw speech collected by the robot is obtained and preprocessed into effective speech, which makes the subsequent conversion into the text to be recognized easier and improves conversion accuracy. The effective speech is converted into original text by speech-to-text technology, the original text is preprocessed into effective text, and the effective text is recognized by the target bidirectional recurrent neural network model to obtain the target intent, improving the accuracy of intent recognition. After the target intent is obtained, a target speech script is selected according to it, improving the flexibility of the dialogue between the agent and the robot. The target speech script is converted into target speech by text-to-speech technology and the robot is controlled to play it, completing the dialogue between the robot and the agent.
In one embodiment, because the raw speech has not undergone any processing, it includes noise segments and silent segments. A noise segment in this embodiment is a speech segment formed, while the speaker is talking, by sounds such as doors or windows switching or objects colliding. A silent segment is one in which the speaker does not speak, because of breathing or thinking, so that silence appears in the raw speech. Noise and silent segments would seriously affect the subsequent step of obtaining the target intent with the target bidirectional recurrent neural network model; therefore, after the raw speech is obtained, it must be processed to remove its noise and silent segments, providing an efficient and accurate data source for the subsequent steps. As shown in Fig. 3, performing speech preprocessing on the raw speech in step S10 to obtain effective speech specifically comprises the following steps:
S11: Apply pre-emphasis, framing, and windowing to the raw speech to obtain standard speech.
Standard speech is the speech obtained after the raw speech has been pre-emphasized, framed, and windowed.
Specifically, the process of obtaining standard speech is as follows. (1) Pre-emphasis is applied with the formula s'_n = s_n - a * s_{n-1}, to eliminate the influence of the speaker's vocal cords and lips on the speech and to improve its high-frequency resolution, where s'_n is the signal amplitude at time n after pre-emphasis, s_n is the signal amplitude at time n, s_{n-1} is the signal amplitude at time n-1, and a is the pre-emphasis coefficient. (2) The pre-emphasized speech is divided into frames; a discontinuity appears at the start and end of each frame, and the more frames there are, the larger the error relative to the raw speech. (3) To preserve the frequency characteristics of each frame, windowing is applied with the formulas w_n = 0.54 - 0.46 * cos(2πn / (N - 1)), 0 ≤ n ≤ N - 1, and s''_n = w_n * s'_n, where w_n is the Hamming window at time n, N is the Hamming window length, s'_n is the time-domain signal amplitude at time n, and s''_n is the time-domain signal amplitude after windowing. Preprocessing the raw speech into standard speech provides an effective data source for the subsequent endpoint detection.
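The three operations of step S11 can be sketched directly from the formulas above. This is a minimal pure-Python illustration, not the patent's implementation; the frame length, hop size, and pre-emphasis coefficient a = 0.97 are typical values assumed here, not taken from the source.

```python
import math

def pre_emphasis(signal, a=0.97):
    # s'_n = s_n - a * s_{n-1}; boosts the high-frequency content
    return [signal[0]] + [signal[n] - a * signal[n - 1]
                          for n in range(1, len(signal))]

def frame_signal(signal, frame_len=400, hop=160):
    # Split into overlapping frames (e.g. 25 ms frames, 10 ms hop at 16 kHz);
    # a trailing partial frame is dropped
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]

def hamming(N):
    # w_n = 0.54 - 0.46 * cos(2*pi*n / (N - 1)), 0 <= n <= N - 1
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]

def window_frames(frames):
    # s''_n = w_n * s'_n applied element-wise to every frame
    if not frames:
        return []
    w = hamming(len(frames[0]))
    return [[wn * sn for wn, sn in zip(w, f)] for f in frames]
```

In practice a vectorized library would be used, but the arithmetic is exactly the three formulas of S11.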
S12: Apply endpoint detection to the standard speech to obtain effective speech.
Endpoint detection is a processing means that determines the start point and end point of the effective speech within a segment of speech.
Specifically, a segment of standard speech inevitably contains speech corresponding to silent and noise segments. Therefore, after the raw speech has been obtained and preprocessed, the server corresponding to the robot applies endpoint detection to the standard speech, discards the speech corresponding to the silent and noise segments, and retains the speech with clearly continuous voiceprint variation as the effective speech. This reduces the amount of data to be processed when the effective speech is later converted into original text; in addition, discarding the silent and noise segments improves the accuracy of the original text.
In steps S11 and S12, after pre-emphasis, framing, and windowing are applied to the raw speech, endpoint detection is applied to the resulting standard speech, removing its silent and noise segments and retaining only the speech with clearly continuous voiceprint variation, i.e., the effective speech. This reduces the subsequent data volume when converting the effective speech into original text and improves the accuracy of the original text.
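Endpoint detection as described — finding the start and end of the voiced portion and discarding the rest — is commonly approximated by short-time energy thresholding. The sketch below assumes frames produced as in S11; the energy-ratio threshold is an illustrative assumption, a crude stand-in for whatever detector the patent's system actually uses (simple energy gating does not, by itself, reject loud noise segments).

```python
def detect_endpoints(frames, energy_ratio=0.1):
    # Short-time energy of each frame
    energies = [sum(s * s for s in f) for f in frames]
    if not energies:
        return []
    threshold = energy_ratio * max(energies)
    # Indices of frames whose energy exceeds the threshold
    voiced = [i for i, e in enumerate(energies) if e > threshold]
    if not voiced:
        return []
    # Keep everything between the first and last voiced frame:
    # these mark the detected start point and end point
    return frames[voiced[0]:voiced[-1] + 1]
```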
In one embodiment, as shown in Fig. 4, step S30, performing text preprocessing on the original text to obtain effective text, specifically comprises the following steps:
S31: Apply a first pretreatment to the original text using a regular expression, and cut the pretreated original text into corresponding cut text according to a preset length.
A regular expression (Regular Expression, often abbreviated in code as regex, regexp, or RE) is, in this embodiment, a logical formula used to filter the original text; it is specifically used to filter out the digits and special characters in the original text. The preset length is a value, set in advance according to actual needs, to which the original text is cut.
Specifically, the digits and symbols in the original text contribute nothing to obtaining the target intent and would increase the data-processing load of the target bidirectional recurrent neural network model. Therefore, after the original text is obtained, a pre-written regular expression is used to apply the first pretreatment, removing the digits and special characters. After they are removed, the pretreated original text is cut according to the preset length, yielding the cut text, i.e., the text formed after the original text is cut to the preset length.
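The first pretreatment of step S31 — a regular expression filtering out digits and special characters, followed by cutting to a preset length — might look like this in Python. The character class keeping only CJK characters and Latin letters is an assumption for illustration; the patent does not give the concrete expression.

```python
import re

def first_pretreat(text, preset_length=8):
    # Filter digits and special characters ($, *, &, #, +, ? ...) with a
    # regular expression, keeping only CJK characters and letters
    cleaned = re.sub(r"[^\u4e00-\u9fa5A-Za-z]", "", text)
    # Cut the cleaned text into chunks of the preset length
    return [cleaned[i:i + preset_length]
            for i in range(0, len(cleaned), preset_length)]
```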
S32: Apply a second pretreatment to the cut text using a word segmentation tool to obtain the effective text.
Specifically, the cut text is segmented with a word segmentation tool and the stop words (particles, prepositions, pronouns, etc.) are removed; the remaining words form the effective text. The word segmentation tools in this embodiment include, but are not limited to, the Jieba segmentation tool. Stop words are certain words that, in information retrieval, are automatically filtered out before or after processing natural-language data (or text) in order to save storage space and improve search efficiency; the stop-word list may follow the Baidu or Harbin Institute of Technology stop-word dictionaries, or be defined by the developer.
In steps S31 and S32, a regular expression applies the first pretreatment to the original text, removing digits and special characters; the pretreated original text is then cut according to the preset length to obtain the cut text; finally, a word segmentation tool applies the second pretreatment to the cut text, removing stop words and obtaining the effective text, which provides an effective data source for the subsequent acquisition of the target intent.
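After segmentation, the second pretreatment of step S32 reduces to filtering tokens against a stop-word list. A minimal sketch, with a tiny hypothetical English stop-word set standing in for the Baidu or Harbin Institute of Technology dictionaries (which would be loaded from a file in practice):

```python
# Hypothetical mini stop-word list for illustration only
STOP_WORDS = {"the", "a", "of", "is", "in"}

def second_pretreat(tokens, stop_words=STOP_WORDS):
    # Drop stop words (particles, prepositions, pronouns, ...) and keep
    # only the words that carry real meaning
    return [t for t in tokens if t not in stop_words]
```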
In one embodiment, as shown in Fig. 5, before the raw speech is obtained in step S10, the intelligent robot answering method further comprises training an original bidirectional recurrent neural network model to obtain the target bidirectional recurrent neural network model capable of recognizing the target intent, which specifically comprises the following steps:
S01: Obtain training speech and perform speech preprocessing on it to obtain preprocessed speech.
Training speech refers to the speech used to adjust the weights and biases of the original bidirectional recurrent neural network model. Specifically, the server corresponding to the robot obtains the training speech and preprocesses it, obtaining preprocessed speech, i.e., the speech obtained from the training speech after preprocessing. The preprocessing process is as in steps S11 and S12 and, to avoid repetition, is not described again here.
S02: Convert the preprocessed speech into preprocessed text using speech-to-text technology.
Specifically, after the preprocessed speech is obtained, it is converted into preprocessed text using speech-to-text technology. Preprocessed text refers to the corresponding written form into which the preprocessed speech is converted. The speech-to-text technology in this embodiment is ASR.
S03: Perform text preprocessing on the preprocessed text to obtain training samples.
Specifically, after the preprocessed text is obtained, the server corresponding to the robot performs text preprocessing on it: digits and special symbols are removed, the text with digits and special symbols removed is cut according to the preset length, and finally stop words are removed, yielding the training samples. A training sample is thus text obtained by preprocessing the preprocessed text, with digits, special symbols and stop words removed, that satisfies the preset length. The training samples are used to train the target bidirectional recurrent neural network model, so that the target intention can subsequently be obtained with that model. The specific implementation is the same as step S31 and step S32 and, to avoid repetition, is not described again.
S04: Divide the training samples into a training set and a test set.
Specifically, after the training samples are obtained, they are divided into a training set and a test set. Generally, the ratio of the training set to the test set is 9:1. The training set is the text used to adjust the parameters of the original bidirectional recurrent neural network model; the test set is the text used to test the recognition accuracy of the trained original bidirectional recurrent neural network model.
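The 9:1 split of training samples into a training set and a test set can be sketched as below; the seeded shuffle is an assumption added for reproducibility, not something the patent specifies.

```python
import random

def split_samples(samples, train_ratio=0.9, seed=42):
    # Shuffle a copy so the split is random but reproducible
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    # First 90% becomes the training set, the remaining 10% the test set
    return shuffled[:cut], shuffled[cut:]
```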
S05: Input the training set into the original bidirectional recurrent neural network model for training to obtain an effective bidirectional recurrent neural network model.
Here, the original bidirectional recurrent neural network model is composed of two recurrent neural networks (RNNs). For ease of description, one of them is referred to in this embodiment as the forward recurrent neural network (forward RNN), and the other as the backward recurrent neural network (backward RNN). The forward RNN and backward RNN in the original bidirectional recurrent neural network model (original BRNN) each have their own hidden layer, and share one input layer and one output layer; that is, the original BRNN is a neural network model composed of one input layer, two hidden layers and one output layer. The original BRNN includes the parameters (weights and biases) of the neuron connections between these layers (the input layer, the two hidden layers and the output layer), and these weights and biases determine the behavior and recognition performance of the original BRNN.
Specifically, the training set is obtained and input into the original bidirectional recurrent neural network model for training; the weights and biases in the original bidirectional recurrent neural network model are adjusted, yielding the effective bidirectional recurrent neural network model, i.e. the bidirectional recurrent neural network model obtained from the training set.
S06: Input the test set into the effective bidirectional recurrent neural network model for testing to obtain the accuracy rate corresponding to the test set; if the accuracy rate reaches a preset threshold, determine the effective bidirectional recurrent neural network model as the target bidirectional recurrent neural network model.
Specifically, after the effective bidirectional recurrent neural network model is obtained, in order to verify its accuracy, the test set is input into the effective bidirectional recurrent neural network model for testing to obtain the corresponding accuracy rate; if the accuracy rate reaches a preset threshold (for example, 90%), the effective bidirectional recurrent neural network model is determined as the target bidirectional recurrent neural network model.
In steps S01 to S06, voice preprocessing is performed on the training voice, and the preprocessed voice is converted into preprocessed text using speech-to-text technology, so that the training set contains only content usable for model training. Text preprocessing is performed on the preprocessed text to obtain the training samples, which improves the training efficiency and training accuracy of the original bidirectional recurrent neural network model. To avoid overfitting, the trained effective bidirectional recurrent neural network model also needs to be tested with the test set to determine whether it is a satisfactory model; if the accuracy rate corresponding to the test set reaches the preset threshold, the recognition accuracy of the effective bidirectional recurrent neural network model meets the requirement, and it can be determined as the target bidirectional recurrent neural network model used to obtain the target intention.
In one embodiment, as shown in Figure 6, step S05, in which the training set is input into the original bidirectional recurrent neural network model for training to obtain the effective bidirectional recurrent neural network model, specifically includes the following steps:
S051: Initialize the weights and biases in the original bidirectional recurrent neural network model.
In this embodiment, the weights and biases are initialized with preset values, which are values set in advance by the developer based on experience. Initializing the weights and biases of the original bidirectional recurrent neural network model with preset values can, during subsequent training on the training set, shorten the training time of the model and improve its recognition accuracy. If the initialization of the weights and biases of the original bidirectional recurrent neural network model is not appropriate, the model's ability to adjust in the initial stage will be poor, which affects the model's subsequent recognition accuracy with respect to the target intention.
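One plausible way to perform the preset-value initialization is shown below. The small-Gaussian scale and zero biases are illustrative assumptions, since the patent leaves the preset values to the developer's experience; the shapes reflect the BRNN structure described above (shared input and output layers, one hidden layer per direction).

```python
import numpy as np

def init_brnn_params(input_dim, hidden_dim, output_dim, scale=0.01, seed=0):
    # Small random preset values for every weight matrix, zero biases
    rng = np.random.default_rng(seed)

    def w(rows, cols):
        return rng.normal(0.0, scale, size=(rows, cols))

    params = {}
    for d in ("fwd", "bwd"):  # one parameter set per direction
        params[d] = {
            "U": w(hidden_dim, input_dim),   # input layer -> hidden layer
            "W": w(hidden_dim, hidden_dim),  # hidden state -> hidden state
            "b": np.zeros(hidden_dim),       # hidden-layer bias
            "V": w(output_dim, hidden_dim),  # hidden layer -> output layer
        }
    return params
```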
S052: Convert the training set into word vectors, and input the word vectors into the original bidirectional recurrent neural network model for training to obtain the model output.
Specifically, the words in the training set are converted into word vectors by a word-vector conversion tool; understandably, the training set contains at least one word vector. The word-vector conversion tool used in this embodiment is word2vec (word to vector), a tool that converts words into vectors, in which each word is mapped to a corresponding vector.
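The word-to-vector lookup can be illustrated as follows. Note that this is a deterministic stand-in for a trained word2vec model (which in gensim would be queried as `model.wv[word]`): the hash-derived vectors carry no learned semantics and exist only to show the lookup shape a trained embedding would provide.

```python
import hashlib

import numpy as np

def pseudo_word_vector(word, dim=8):
    # Stand-in for a trained word2vec lookup: derive a deterministic
    # vector from the word's hash so the same word always maps to
    # the same vector, as a real embedding table would.
    digest = hashlib.md5(word.encode("utf-8")).digest()
    seed = int.from_bytes(digest[:8], "little")
    rng = np.random.default_rng(seed)
    return rng.standard_normal(dim)

def to_word_vectors(segmented_sentence, dim=8):
    # Convert a segmented sentence (list of words) into a (T, dim) matrix,
    # one row per time step, ready to feed the BRNN input layer
    return np.stack([pseudo_word_vector(w, dim) for w in segmented_sentence])
```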
After the training set is converted into word vectors, first, the input layer inputs the word vectors into the forward hidden layer and the backward hidden layer respectively for calculation, obtaining the outputs corresponding to the forward hidden layer and the backward hidden layer. Here, the forward hidden layer refers to the hidden layer of the forward recurrent neural network, and the backward hidden layer refers to the hidden layer of the backward recurrent neural network.
Then, the attention mechanism corresponding to the forward hidden layer and the backward hidden layer is used to assign attention weights to the outputs of the forward hidden layer and the backward hidden layer.
Finally, the two outputs processed by the attention mechanism are fused to obtain the value finally input to the output layer of the original bidirectional recurrent neural network model, and the model output is obtained through the calculation of the output layer. The model output is the output obtained from the training set through the training of the original bidirectional recurrent neural network model. The fusion in this embodiment includes, but is not limited to, the arithmetic mean method and the weighted mean method; for ease of description, the subsequent steps use the arithmetic mean method to fuse the two attention-processed outputs.
S053: Update the weights and biases in the original bidirectional recurrent neural network model based on the model output to obtain the effective bidirectional recurrent neural network model.
Specifically, after the model output is obtained, a loss function is constructed based on the model output; then, according to the loss function, a back-propagation algorithm is used to adjust the weights and biases of the original bidirectional recurrent neural network model, obtaining the effective bidirectional recurrent neural network model. The back-propagation (Back Propagation) algorithm adjusts, in the reverse order of the time steps, the weights and biases between the hidden layers and the output layer of the original bidirectional recurrent neural network model, and the weights and biases between the input layer and the hidden layers.
In steps S051 to S053, initializing the weights and biases in the original bidirectional recurrent neural network model shortens the training time of the model and improves its recognition accuracy. The original bidirectional recurrent neural network model is then trained with the training set, and the weights and biases in the model are adjusted so that they better meet the requirements.
In one embodiment, the original bidirectional recurrent neural network includes a forward recurrent neural network and a backward recurrent neural network. As shown in Figure 7, step S052, in which the word vectors are input into the original bidirectional recurrent neural network model for training to obtain the model output, specifically includes the following steps:
S0521: Input the word vectors into the input layer of the original bidirectional recurrent neural network model, input the word vectors processed by the input layer into the forward hidden layer of the forward recurrent neural network, and process them with the attention mechanism to obtain the forward output.
Specifically, the word vectors are input into the input layer of the original bidirectional recurrent neural network model; the input layer inputs the obtained word vectors into the forward hidden layer, where the output of the forward hidden layer is calculated by the formula h_t1 = σ(U·x_t + W·h_(t-1) + b). Here, σ denotes the activation function of the forward hidden layer, U denotes the weights between the input layer and the forward hidden layer, W denotes the weights between the hidden states of the forward recurrent neural network, b denotes the bias between the input layer and the forward hidden layer, x_t denotes the word vector input to the input layer at time t, h_t1 denotes the output of the forward hidden layer at time t, and h_(t-1) denotes the output of the forward hidden layer at time t-1.
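The forward hidden-layer recurrence h_t1 = σ(U·x_t + W·h_(t-1) + b) can be sketched numerically as below, with tanh standing in for the unspecified activation σ and a zero initial hidden state assumed.

```python
import numpy as np

def forward_hidden_states(X, U, W, b):
    # h_t = tanh(U @ x_t + W @ h_{t-1} + b), scanned over the time axis.
    # X has shape (T, input_dim); the result has shape (T, hidden_dim).
    T = X.shape[0]
    H = np.zeros((T, b.shape[0]))
    h_prev = np.zeros(b.shape[0])  # assumed zero initial state
    for t in range(T):
        h_prev = np.tanh(U @ X[t] + W @ h_prev + b)
        H[t] = h_prev
    return H
```

The backward hidden layer of step S0522 is the same recurrence run over the word vectors in reverse time order, with its own U, W and b.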
The output of the forward hidden layer is processed with the attention mechanism to obtain the forward output, i.e. the value obtained after the attention mechanism processes the output of the forward hidden layer. Specifically, the importance of the semantic vector is calculated according to the formula c_t1 = Σ_j α_tj·h_j, where c_t1 is the attention-weighted semantic vector of the forward hidden layer at time t, α_tj is the correlation between the j-th input word vector and the word vector corresponding to time t, and h_j is the output obtained after the j-th input word vector is calculated by the forward hidden layer. Further, the normalization process is α_tj = exp(e_tj) / Σ_k exp(e_tk), with e_tj = V^T·tanh(U·h_j + W·S_(t-1) + b), where k indexes the k-th input word vector, V denotes the weights between the hidden layer and the output layer, V^T is the transpose of V, and S_(t-1) is the output of the output layer at time t-1.
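The attention scoring and softmax normalization described above can be sketched as follows: the score e_tj = V^T·tanh(U·h_j + W·S_(t-1) + b) is computed per hidden-layer output, normalized with a softmax, and used to form the weighted context c_t1. All parameter matrices are treated as given; the max-subtraction inside the softmax is a standard numerical-stability detail, not part of the patent's formula.

```python
import numpy as np

def attention_context(H, s_prev, U_a, W_a, b_a, v):
    # Alignment scores: e_tj = v^T tanh(U_a h_j + W_a s_{t-1} + b_a)
    scores = np.array([v @ np.tanh(U_a @ h + W_a @ s_prev + b_a) for h in H])
    # Softmax normalization: alpha_tj = exp(e_tj) / sum_k exp(e_tk)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # Attention-weighted semantic vector: c_t = sum_j alpha_tj * h_j
    return weights @ H, weights
```

The same routine applies unchanged to the backward hidden layer's outputs in step S0522, producing c_t2.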
S0522: Input the word vectors processed by the input layer into the backward hidden layer of the backward recurrent neural network, and process them with the attention mechanism to obtain the backward output.
Specifically, the word vectors are input into the input layer, and the input layer inputs the obtained word vectors into the backward hidden layer, where the output of the backward hidden layer is calculated by the formula h_t2 = σ(U·x_t + W·h_(t-1) + b). Here, σ denotes the activation function of the backward hidden layer, U denotes the weights between the input layer and the backward hidden layer, W denotes the weights between the hidden states of the backward recurrent neural network, b denotes the bias between the input layer and the backward hidden layer, x_t denotes the word vector input to the input layer at time t, h_t2 denotes the output of the backward hidden layer at time t, and h_(t-1) denotes the output of the backward hidden layer at time t-1.
The output of the backward hidden layer is processed with the attention mechanism to obtain the backward output, i.e. the value obtained after the attention mechanism processes the output of the backward hidden layer. Specifically, the importance of the semantic vector is calculated according to the formula c_t2 = Σ_j α_tj·h_j, where c_t2 is the attention-weighted semantic vector of the backward hidden layer at time t, α_tj is the correlation between the j-th input word vector and the word vector corresponding to time t, and h_j is the output obtained after the j-th input word vector is calculated by the backward hidden layer. Further, the normalization process is α_tj = exp(e_tj) / Σ_k exp(e_tk), with e_tj = V^T·tanh(U·h_j + W·S_(t-1) + b), where k indexes the k-th input word vector, V denotes the weights between the hidden layer and the output layer, V^T is the transpose of V, and S_(t-1) is the output of the output layer at time t-1.
S0523: Fuse the forward output and the backward output to obtain the model output.
Specifically, after the forward output and the backward output are obtained, they are fused using the arithmetic-mean formula c_t = (c_t1 + c_t2) / 2 to obtain the target output, i.e. the output finally input to the output layer. After the target output is obtained, it is input into the output layer and calculated according to the formula S_t = f(S_(t-1), y_(t-1), c_t) to obtain the model output. Here, S_t denotes the output of the output layer at time t, S_(t-1) denotes the output of the output layer at time t-1, y_(t-1) is the text label carried by the word vector input at time t-1, and f is generally the softmax function. Obtaining the model output facilitates the construction of the loss function in the subsequent steps, so as to adjust the weights and biases of the forward recurrent neural network and the backward recurrent neural network in the bidirectional recurrent neural network model.
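The arithmetic-mean fusion and the output-layer calculation can be sketched as below. The softmax over a linear projection of the fused context is a simplification: the patent's output S_t = f(S_(t-1), y_(t-1), c_t) also feeds back the previous output and label, which is omitted here for brevity.

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

def output_step(c_fwd, c_bwd, V_out):
    # Arithmetic-mean fusion of the two attention contexts:
    # c_t = (c_t1 + c_t2) / 2
    c = (c_fwd + c_bwd) / 2.0
    # Simplified output layer: softmax over a linear projection of c_t
    return softmax(V_out @ c)
```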
In steps S0521 to S0523, the forward output and the backward output are obtained so as to obtain the model output, which facilitates the construction of the loss function in subsequent steps and the adjustment of the weights and biases of the forward recurrent neural network and the backward recurrent neural network in the bidirectional recurrent neural network model.
In one embodiment, the training set carries text labels, where a text label is a label annotated by the developer based on an understanding of the training sample. As shown in Figure 8, step S053, in which the weights and biases in the original bidirectional recurrent neural network model are updated based on the model output to obtain the effective bidirectional recurrent neural network model, specifically includes the following steps:
S0531: Construct a loss function based on the model output and the text labels.
Specifically, after the model output is obtained, the loss function is constructed based on the model output S_t and the text label y_t. The loss function in this embodiment is the cross-entropy loss L(θ) = -Σ_(t=1..T) y_t·log(S_t), where T denotes the number of time steps carried by the word vectors in the training set, t denotes the t-th time step, θ denotes the set of weights and biases (U, V, W, b, c), and y_t denotes the text label corresponding to the word vector at time t.
S0532: Update the weights and biases in the original bidirectional recurrent neural network model based on the loss function to obtain the effective bidirectional recurrent neural network model.
Specifically, after the loss function is obtained, the gradients ∂L(θ)/∂θ of the loss with respect to the parameters are computed, and the back-propagation algorithm is used to update the weights and biases corresponding to the forward recurrent neural network and the backward recurrent neural network respectively, adjusting the weights and biases of both networks. When the loss calculated from the model output by the loss function meets the requirement (for example, the loss does not exceed 10%), the original recurrent neural network with these weights and biases can be determined as the effective bidirectional recurrent neural network.
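The cross-entropy loss and the parameter update that back-propagation produces can be sketched as follows. The learning-rate-based gradient-descent rule θ ← θ − η·∂L/∂θ is a standard assumption, since the patent does not name a specific optimizer, and the gradients are taken as already computed by back-propagation.

```python
import numpy as np

def cross_entropy_loss(probs_seq, label_seq):
    # L(theta) = -sum_t log S_t[y_t]: the negative log-probability the
    # softmax output assigns to the correct label at each time step
    return -sum(np.log(p[y]) for p, y in zip(probs_seq, label_seq))

def sgd_update(params, grads, lr=0.1):
    # One gradient-descent step per parameter: theta <- theta - lr * dL/dtheta
    return {k: params[k] - lr * grads[k] for k in params}
```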
In steps S0531 and S0532, the loss function is constructed and the weights and biases in the original bidirectional recurrent neural network model are updated, obtaining the effective bidirectional recurrent neural network model.
In the intelligent robot answering method provided by the present invention, the original voice collected by the robot is obtained and voice preprocessing is performed on it to obtain the effective voice, which facilitates the subsequent conversion of the effective voice into the original text to be recognized and improves the conversion accuracy. The effective voice is converted into the original text using speech-to-text technology, and text preprocessing is performed on the original text to obtain the effective text. The original bidirectional recurrent neural network model is trained and tested with the training set and the test set to obtain the target bidirectional recurrent neural network model, and the effective text is recognized with the target bidirectional recurrent neural network model to obtain the target intention, improving the accuracy of identifying the target intention. After the target intention is obtained, a target dialogue script is selected according to the target intention, which improves the flexibility of the dialogue between the agent and the robot. The target dialogue script is converted into the target voice by text-to-speech technology, and the robot is controlled to play the target voice, thereby completing the dialogue between the robot and the agent.
It should be understood that the sequence numbers of the steps in the above embodiments do not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and does not constitute any limitation on the implementation process of the embodiments of the present invention.
In one embodiment, an intelligent robot answering device is provided, which corresponds to the intelligent robot answering method in the above embodiments. As shown in Figure 9, the intelligent robot answering device includes an original voice preprocessing module 10, an effective speech-to-text module 20, an original text processing module 30, a model recognition module 40 and a text-to-speech module 50. The functional modules are described in detail as follows:
The original voice preprocessing module 10 is configured to obtain the original voice collected by the robot and perform voice preprocessing on the original voice to obtain the effective voice.
The effective speech-to-text module 20 is configured to convert the effective voice into the original text using speech-to-text technology.
The original text processing module 30 is configured to perform text preprocessing on the original text to obtain the effective text.
The model recognition module 40 is configured to recognize the effective text using the target bidirectional recurrent neural network model generated with the attention mechanism to obtain the target intention.
The text-to-speech module 50 is configured to select a target dialogue script according to the target intention, convert the target dialogue script into the target voice by text-to-speech technology, and control the robot to play the target voice.
Further, the original voice preprocessing module 10 includes a first voice preprocessing unit and a second voice preprocessing unit.
The first voice preprocessing unit is configured to perform pre-emphasis, framing and windowing on the original voice to obtain a standard voice.
The second voice preprocessing unit is configured to perform endpoint detection on the standard voice to obtain the effective voice.
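The pre-emphasis, framing and windowing performed by the first voice preprocessing unit can be sketched as below. The frame length, hop size, pre-emphasis coefficient and Hamming window are common signal-processing defaults rather than values fixed by the patent, and the endpoint-detection stage of the second unit is omitted.

```python
import numpy as np

def preprocess_waveform(signal, frame_len=256, hop=128, alpha=0.97):
    # Pre-emphasis: y[n] = x[n] - alpha * x[n-1] boosts high frequencies
    emphasized = np.append(signal[0], signal[1:] - alpha * signal[:-1])
    # Framing: split into overlapping frames, then apply a Hamming window
    window = np.hamming(frame_len)
    frames = []
    for start in range(0, len(emphasized) - frame_len + 1, hop):
        frames.append(emphasized[start:start + frame_len] * window)
    return np.array(frames)
```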
Further, the original text processing module 30 includes a first text preprocessing unit and a second text preprocessing unit.
The first text preprocessing unit is configured to perform the first preprocessing on the original text using regular expressions, and cut the first-preprocessed original text into corresponding cut text according to the preset length.
The second text preprocessing unit is configured to perform the second preprocessing on the cut text using the word segmentation tool to obtain the effective text.
Further, the intelligent robot answering device further includes a training voice preprocessing module 01, a training speech-to-text module 02, a training sample obtaining module 03, a training sample processing module 04, a model training module 05 and a model testing module 06.
The training voice preprocessing module 01 is configured to obtain the training voice and perform voice preprocessing on the training voice to obtain the preprocessed voice.
The training speech-to-text module 02 is configured to convert the preprocessed voice into the preprocessed text using speech-to-text technology.
The training sample obtaining module 03 is configured to perform text preprocessing on the preprocessed text to obtain the training samples.
The training sample processing module 04 is configured to divide the training samples into a training set and a test set.
The model training module 05 is configured to input the training set into the original bidirectional recurrent neural network model for training to obtain the effective bidirectional recurrent neural network model.
The model testing module 06 is configured to input the test set into the effective bidirectional recurrent neural network model for testing to obtain the accuracy rate corresponding to the test set; if the accuracy rate reaches the preset threshold, the effective bidirectional recurrent neural network model is determined as the target bidirectional recurrent neural network model.
Further, the model training module includes a parameter initialization unit, a model output obtaining unit and a model parameter updating unit.
The parameter initialization unit is configured to initialize the weights and biases in the original bidirectional recurrent neural network model.
The model output obtaining unit is configured to convert the training set into word vectors and input the word vectors into the original bidirectional recurrent neural network model for training to obtain the model output.
The model parameter updating unit is configured to update the weights and biases in the original bidirectional recurrent neural network model based on the model output to obtain the effective bidirectional recurrent neural network model.
Further, the original bidirectional recurrent neural network includes a forward recurrent neural network and a backward recurrent neural network.
Further, the model output obtaining unit includes a forward output obtaining unit, a backward output obtaining unit and a fusion calculation unit.
The forward output obtaining unit is configured to input the word vectors into the input layer of the original bidirectional recurrent neural network model, input the word vectors processed by the input layer into the forward hidden layer of the forward recurrent neural network, and process them with the attention mechanism to obtain the forward output.
The backward output obtaining unit is configured to input the word vectors processed by the input layer into the backward hidden layer of the backward recurrent neural network and process them with the attention mechanism to obtain the backward output.
The fusion calculation unit is configured to fuse the forward output and the backward output to obtain the model output.
Further, the training set carries text labels.
Further, the model parameter updating unit includes a loss function construction unit and a weight and bias updating unit.
The loss function construction unit is configured to construct the loss function based on the model output and the text labels.
The weight and bias updating unit is configured to update the weights and biases in the original bidirectional recurrent neural network model based on the loss function to obtain the effective bidirectional recurrent neural network model.
For the specific limitations of the intelligent robot answering device, reference may be made to the limitations of the intelligent robot answering method above, which are not repeated here. Each module in the above intelligent robot answering device may be implemented in whole or in part by software, hardware or a combination thereof. The above modules may be embedded in or independent of a processor in a computer device in the form of hardware, or stored in a memory in the computer device in the form of software, so that the processor can call and execute the operations corresponding to the above modules.
In one embodiment, a computer device is provided, which may be a server; its internal structure may be as shown in Figure 10. The computer device includes a processor, a memory, a network interface and a database connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program and a database. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The database of the computer device stores the data involved in the intelligent robot answering method. The network interface of the computer device is used to communicate with an external terminal through a network connection. When executed by the processor, the computer program implements the intelligent robot answering method.
In one embodiment, a computer device is provided, including a memory, a processor and a computer program stored in the memory and executable on the processor. When executing the computer program, the processor implements the intelligent robot answering method of the above embodiments, such as steps S10 to S50 shown in Figure 2 or the steps shown in Figures 3 to 8, which are not repeated here to avoid repetition. Alternatively, when executing the computer program, the processor implements the functions of the modules/units in the above embodiments of the intelligent robot answering device, such as the functions of modules 10 to 50 shown in Figure 9, or the functions of modules 01 to 06, which are not repeated here to avoid repetition.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored. When executed by a processor, the computer program implements the intelligent robot answering method of the above embodiments, such as steps S10 to S50 shown in Figure 2 or the steps shown in Figures 3 to 8, which are not repeated here to avoid repetition. Alternatively, when executed by a processor, the computer program implements the functions of the modules/units in the above embodiments of the intelligent robot answering device, such as the functions of modules 10 to 50 shown in Figure 9, or the functions of modules 01 to 06, which are not repeated here to avoid repetition.
Those of ordinary skill in the art will appreciate that all or part of the processes in the methods of the above embodiments may be completed by instructing the relevant hardware through a computer program; the computer program may be stored in a non-volatile computer-readable storage medium, and when executed, may include the processes of the embodiments of the above methods. Any reference to the memory, storage, database or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable ROM (EEPROM) or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), memory-bus (Rambus) direct RAM (RDRAM), direct memory-bus dynamic RAM (DRDRAM) and memory-bus dynamic RAM (RDRAM).
It is apparent to those skilled in the art that, for convenience and brevity of description, only the division of the above functional units and modules is illustrated by example; in practical applications, the above functions may be allocated to different functional units and modules as needed, that is, the internal structure of the device may be divided into different functional units or modules to complete all or part of the functions described above.
The above embodiments are merely illustrative of the technical solutions of the present invention, not limiting; although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of the technical features therein may be equivalently replaced; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and shall all be included within the protection scope of the present invention.
Claims (10)
1. An intelligent robot answering method, characterized by comprising:
obtaining an original voice collected by a robot, and performing voice preprocessing on the original voice to obtain an effective voice;
converting the effective voice into an original text using speech-to-text technology;
performing text preprocessing on the original text to obtain an effective text;
recognizing the effective text using a target bidirectional recurrent neural network model generated with an attention mechanism to obtain a target intention;
selecting a target dialogue script according to the target intention, converting the target dialogue script into a target voice by text-to-speech technology, and controlling the robot to play the target voice.
2. The intelligent robot answering method according to claim 1, characterized in that the performing voice preprocessing on the original voice to obtain the effective voice comprises:
performing pre-emphasis, framing and windowing on the original voice to obtain a standard voice;
performing endpoint detection on the standard voice to obtain the effective voice.
3. The intelligent robot answer method according to claim 1, characterized in that performing text preprocessing on the original text to obtain valid text comprises:
performing a first preprocessing on the original text using regular expressions, and cutting the original text that has undergone the first preprocessing into corresponding cut texts according to a preset length;
performing a second preprocessing on the cut texts using a word segmentation tool, to obtain valid text.
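Claim 3's two-pass text preprocessing might look like the following sketch. The specific regular expression, the preset length of 10, and the whitespace-split tokenizer are illustrative assumptions; in practice the second pass would typically call a segmentation tool such as jieba for Chinese text.

```python
import re

def first_preprocess(text):
    """Regex cleanup: drop everything except word characters and spaces
    (an illustrative rule; the claim does not fix a specific expression)."""
    return re.sub(r"[^\w\s]", "", text).strip()

def cut_by_length(text, preset_length=10):
    """Cut the cleaned text into chunks of at most preset_length characters."""
    return [text[i:i + preset_length] for i in range(0, len(text), preset_length)]

def second_preprocess(chunks):
    """Tokenize each chunk; whitespace split stands in for a real
    word segmentation tool here."""
    return [tok for chunk in chunks for tok in chunk.split()]

cleaned = first_preprocess("Hello, world!")
chunks = cut_by_length("abcdefghijkl", preset_length=5)
tokens = second_preprocess(["hi there", "ok"])
```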
4. The intelligent robot answer method according to claim 1, characterized in that, before obtaining the raw speech, the intelligent robot answer method further comprises:
obtaining training speech, and performing speech preprocessing on the training speech to obtain preprocessed speech;
converting the preprocessed speech into preprocessed text using speech-to-text technology;
performing text preprocessing on the preprocessed text to obtain training samples;
dividing the training samples into a training set and a test set;
inputting the training set into an original bidirectional recurrent neural network model for training, to obtain a valid bidirectional recurrent neural network model;
inputting the test set into the valid bidirectional recurrent neural network model for testing, to obtain an accuracy rate corresponding to the test set; and if the accuracy rate reaches a preset threshold, determining the valid bidirectional recurrent neural network model as the target bidirectional recurrent neural network model.
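The split-train-test-gate workflow in claim 4 can be sketched as follows. The 80/20 split ratio and the 0.9 threshold are illustrative assumptions; the claim only requires that some preset threshold gate the promotion of the trained model to "target" status.

```python
import random

def split_samples(samples, train_ratio=0.8, seed=0):
    """Shuffle labelled samples and divide them into training and test sets."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

def accept_model(model, test_set, preset_threshold=0.9):
    """Promote the trained model only if its test accuracy reaches the
    preset threshold (claim 4's gating step)."""
    correct = sum(1 for x, y in test_set if model(x) == y)
    accuracy = correct / len(test_set)
    return accuracy >= preset_threshold, accuracy

samples = [(i, i % 2) for i in range(10)]
train_set, test_set = split_samples(samples)
ok, acc = accept_model(lambda x: x % 2, test_set)  # a perfect toy "model"
```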
5. The intelligent robot answer method according to claim 4, characterized in that inputting the training set into the original bidirectional recurrent neural network model for training, to obtain a valid bidirectional recurrent neural network model, comprises:
initializing the weights and biases in the original bidirectional recurrent neural network model;
converting the training set into word vectors, and inputting the word vectors into the original bidirectional recurrent neural network model for training, to obtain a model output;
updating the weights and biases in the original bidirectional recurrent neural network model based on the model output, to obtain a valid bidirectional recurrent neural network model.
6. The intelligent robot answer method according to claim 5, characterized in that the original bidirectional recurrent neural network comprises a forward recurrent neural network and a backward recurrent neural network;
inputting the word vectors into the original bidirectional recurrent neural network model for training, to obtain a model output, comprises:
inputting the word vectors into the input layer of the original bidirectional recurrent neural network model, inputting the word vectors processed by the input layer into the forward hidden layer of the forward recurrent neural network, and processing them with an attention mechanism to obtain a forward output;
inputting the word vectors processed by the input layer into the backward hidden layer of the backward recurrent neural network, and processing them with an attention mechanism to obtain a backward output;
performing fusion processing on the forward output and the backward output to obtain the model output.
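A minimal numeric sketch of claim 6's structure: two independent tanh RNNs read the word-vector sequence in opposite directions, an attention layer pools each direction's hidden states, and the two pooled vectors are fused. The dot-product attention and fusion-by-concatenation are common choices assumed here for illustration; the claim does not fix either.

```python
import numpy as np

rng = np.random.default_rng(0)
H, D = 8, 4                       # hidden size, word-vector (embedding) size

def init_cell():
    """Weights and bias for one direction of the recurrent network."""
    return {"Wx": rng.normal(0, 0.1, (H, D)),
            "Wh": rng.normal(0, 0.1, (H, H)),
            "b":  np.zeros(H)}

def run_direction(cell, xs):
    """Plain tanh RNN over a word-vector sequence; returns all hidden states."""
    h, states = np.zeros(H), []
    for x in xs:
        h = np.tanh(cell["Wx"] @ x + cell["Wh"] @ h + cell["b"])
        states.append(h)
    return np.stack(states)

def attend(states, w):
    """Dot-product attention: score each state, softmax, weighted sum."""
    scores = states @ w
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()
    return alpha @ states

fwd, bwd = init_cell(), init_cell()
w_att = rng.normal(0, 0.1, H)

xs = [rng.normal(0, 1, D) for _ in range(5)]          # toy word vectors
forward_out = attend(run_direction(fwd, xs), w_att)   # forward pass + attention
backward_out = attend(run_direction(bwd, xs[::-1]), w_att)  # reversed sequence
fused = np.concatenate([forward_out, backward_out])   # fusion by concatenation
print(fused.shape)  # (16,)
```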
7. The intelligent robot answer method according to claim 5, characterized in that the training set carries text labels;
updating the weights and biases in the original bidirectional recurrent neural network model based on the model output, to obtain a valid bidirectional recurrent neural network model, comprises:
constructing a loss function based on the model output and the text labels;
updating the weights and biases in the original bidirectional recurrent neural network model based on the loss function, to obtain a valid bidirectional recurrent neural network model.
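Claim 7's loss-then-update step is ordinary supervised training. The sketch below substitutes a tiny linear classifier for the bidirectional model and assumes softmax cross-entropy as the loss, since the claim names neither; the gradient update pattern is the same in either case.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def train_step(W, b, x, label, lr=0.5):
    """Build a cross-entropy loss from the model output and the text label,
    then update weights and bias by one gradient-descent step."""
    probs = softmax(W @ x + b)
    loss = -np.log(probs[label])
    grad_logits = probs.copy()
    grad_logits[label] -= 1.0            # d loss / d logits
    W = W - lr * np.outer(grad_logits, x)
    b = b - lr * grad_logits
    return W, b, loss

# The loss should fall across repeated updates on the same labelled sample:
rng = np.random.default_rng(1)
W, b = rng.normal(0, 0.1, (3, 4)), np.zeros(3)
x, label = rng.normal(0, 1, 4), 2
losses = []
for _ in range(20):
    W, b, loss = train_step(W, b, x, label)
    losses.append(loss)
```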
8. An intelligent robot answering device, characterized by comprising:
a raw speech preprocessing module, configured to obtain raw speech collected by a robot and perform speech preprocessing on the raw speech to obtain valid speech;
a speech-to-text module, configured to convert the valid speech into original text using speech-to-text technology;
an original text processing module, configured to perform text preprocessing on the original text to obtain valid text;
a model recognition module, configured to recognize the valid text using a target bidirectional recurrent neural network model generated with an attention mechanism, to obtain a target intention;
a text-to-speech module, configured to select a target script according to the target intention, convert the target script into target speech by text-to-speech technology, and control the robot to play the target speech.
9. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the intelligent robot answer method according to any one of claims 1 to 7.
10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the intelligent robot answer method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910305320.0A CN110162610A (en) | 2019-04-16 | 2019-04-16 | Intelligent robot answer method, device, computer equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110162610A true CN110162610A (en) | 2019-08-23 |
Family
ID=67639620
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910305320.0A Pending CN110162610A (en) | 2019-04-16 | 2019-04-16 | Intelligent robot answer method, device, computer equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110162610A (en) |
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2015062482A1 (en) * | 2013-11-01 | 2015-05-07 | Tencent Technology (Shenzhen) Company Limited | System and method for automatic question answering |
CN106095834A (en) * | 2016-06-01 | 2016-11-09 | 竹间智能科技(上海)有限公司 | Intelligent dialogue method and system based on topic |
CN107346340A (en) * | 2017-07-04 | 2017-11-14 | 北京奇艺世纪科技有限公司 | A kind of user view recognition methods and system |
CN107680586A (en) * | 2017-08-01 | 2018-02-09 | 百度在线网络技术(北京)有限公司 | Far field Speech acoustics model training method and system |
US20190043482A1 (en) * | 2017-08-01 | 2019-02-07 | Baidu Online Network Technology (Beijing) Co., Ltd. | Far field speech acoustic model training method and system |
CN108632137A (en) * | 2018-03-26 | 2018-10-09 | 平安科技(深圳)有限公司 | Answer model training method, intelligent chat method, device, equipment and medium |
CN109065027A (en) * | 2018-06-04 | 2018-12-21 | 平安科技(深圳)有限公司 | Speech differentiation model training method, device, computer equipment and storage medium |
CN109522393A (en) * | 2018-10-11 | 2019-03-26 | 平安科技(深圳)有限公司 | Intelligent answer method, apparatus, computer equipment and storage medium |
CN109389091A (en) * | 2018-10-22 | 2019-02-26 | 重庆邮电大学 | The character identification system and method combined based on neural network and attention mechanism |
Cited By (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2021051507A1 (en) * | 2019-09-18 | 2021-03-25 | 平安科技(深圳)有限公司 | Bot conversation generation method, device, readable storage medium, and bot |
WO2021103775A1 (en) * | 2019-11-27 | 2021-06-03 | 深圳追一科技有限公司 | Voice intent recognition method and device, computer device and storage medium |
CN110930989A (en) * | 2019-11-27 | 2020-03-27 | 深圳追一科技有限公司 | Speech intention recognition method and device, computer equipment and storage medium |
CN110930989B (en) * | 2019-11-27 | 2021-04-06 | 深圳追一科技有限公司 | Speech intention recognition method and device, computer equipment and storage medium |
CN110993124A (en) * | 2019-12-09 | 2020-04-10 | 上海光电医用电子仪器有限公司 | Monitoring system and method with voice response function |
CN111224863A (en) * | 2019-12-10 | 2020-06-02 | 平安国际智慧城市科技股份有限公司 | Session task generation method and device, computer equipment and storage medium |
CN111224863B (en) * | 2019-12-10 | 2021-06-22 | 平安国际智慧城市科技股份有限公司 | Session task generation method and device, computer equipment and storage medium |
CN111159346A (en) * | 2019-12-27 | 2020-05-15 | 深圳物控智联科技有限公司 | Intelligent answering method based on intention recognition, server and storage medium |
CN111462752A (en) * | 2020-04-01 | 2020-07-28 | 北京思特奇信息技术股份有限公司 | Client intention identification method based on attention mechanism, feature embedding and BI-LSTM |
CN111462752B (en) * | 2020-04-01 | 2023-10-13 | 北京思特奇信息技术股份有限公司 | Attention mechanism, feature embedding and BI-LSTM based customer intention recognition method |
CN113961698A (en) * | 2020-07-15 | 2022-01-21 | 上海乐言信息科技有限公司 | Intention classification method, system, terminal and medium based on neural network model |
CN111859911B (en) * | 2020-07-28 | 2023-07-25 | 中国平安人寿保险股份有限公司 | Image description text generation method, device, computer equipment and storage medium |
CN111859904A (en) * | 2020-07-31 | 2020-10-30 | 南京三百云信息科技有限公司 | NLP model optimization method and device and computer equipment |
CN112035643A (en) * | 2020-09-01 | 2020-12-04 | 中国平安财产保险股份有限公司 | Method and device for reusing capabilities of conversation robot |
CN112035643B (en) * | 2020-09-01 | 2023-10-24 | 中国平安财产保险股份有限公司 | Method and device for multiplexing capacity of conversation robot |
CN112347788A (en) * | 2020-11-06 | 2021-02-09 | 平安消费金融有限公司 | Corpus processing method, apparatus and storage medium |
CN112667787A (en) * | 2020-11-26 | 2021-04-16 | 平安普惠企业管理有限公司 | Intelligent response method, system and storage medium based on phonetics label |
CN112614514A (en) * | 2020-12-15 | 2021-04-06 | 科大讯飞股份有限公司 | Valid voice segment detection method, related device and readable storage medium |
CN112614514B (en) * | 2020-12-15 | 2024-02-13 | 中国科学技术大学 | Effective voice fragment detection method, related equipment and readable storage medium |
CN112749761A (en) * | 2021-01-22 | 2021-05-04 | 上海机电工程研究所 | Enemy combat intention identification method and system based on attention mechanism and recurrent neural network |
CN113254621A (en) * | 2021-06-21 | 2021-08-13 | 中国平安人寿保险股份有限公司 | Seat call prompting method and device, computer equipment and storage medium |
CN113254621B (en) * | 2021-06-21 | 2024-06-14 | 中国平安人寿保险股份有限公司 | Seat call prompting method and device, computer equipment and storage medium |
CN114360517A (en) * | 2021-12-17 | 2022-04-15 | 天翼爱音乐文化科技有限公司 | Audio processing method and device in complex environment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110162610A (en) | Intelligent robot answer method, device, computer equipment and storage medium | |
CN110287283B (en) | Intention model training method, intention recognition method, device, equipment and medium | |
CN112017644B (en) | Sound transformation system, method and application | |
US11222620B2 (en) | Speech recognition using unspoken text and speech synthesis | |
Chen et al. | End-to-end neural network based automated speech scoring | |
US20210295858A1 (en) | Synthesizing speech from text using neural networks | |
WO2020215666A1 (en) | Speech synthesis method and apparatus, computer device, and storage medium | |
CN107871496B (en) | Speech recognition method and device | |
CN110459225B (en) | Speaker recognition system based on CNN fusion characteristics | |
CN110827801A (en) | Automatic voice recognition method and system based on artificial intelligence | |
US20220059083A1 (en) | Neural modulation codes for multilingual and style dependent speech and language processing | |
Michelsanti et al. | Vocoder-based speech synthesis from silent videos | |
JP7393585B2 (en) | WaveNet self-training for text-to-speech | |
Paul et al. | Enhancing speech intelligibility in text-to-speech synthesis using speaking style conversion | |
An et al. | Disentangling style and speaker attributes for TTS style transfer | |
KR102363324B1 (en) | Method and tts system for determining the unvoice section of the mel-spectrogram | |
KR20200088263A (en) | Method and system of text to multiple speech | |
KR20220071960A (en) | A method and a TTS system for calculating an encoder score of an attention alignment corresponded to a spectrogram | |
CN117789771A (en) | Cross-language end-to-end emotion voice synthesis method and system | |
KR102532253B1 (en) | A method and a TTS system for calculating a decoder score of an attention alignment corresponded to a spectrogram | |
Yousfi et al. | Isolated Iqlab checking rules based on speech recognition system | |
KR20220071522A (en) | A method and a TTS system for generating synthetic speech | |
Karim et al. | Text to speech using Mel-Spectrogram with deep learning algorithms | |
CN115171700B (en) | Voiceprint recognition voice assistant method based on impulse neural network | |
KR102503066B1 (en) | A method and a TTS system for evaluating the quality of a spectrogram using scores of an attention alignment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190823 |