CN106056207B - Natural-language-based robot deep interaction and inference method and device - Google Patents

Natural-language-based robot deep interaction and inference method and device

Info

Publication number
CN106056207B
CN106056207B (application CN201610302605.5A)
Authority
CN
China
Prior art keywords
case
user
attribute
text
robot
Prior art date
Legal status
Active
Application number
CN201610302605.5A
Other languages
Chinese (zh)
Other versions
CN106056207A (en)
Inventor
闵华松
李潇
齐诗萌
林云汉
周昊天
Current Assignee
Wuhan University of Science and Engineering WUSE
Original Assignee
Wuhan University of Science and Engineering WUSE
Priority date
Filing date
Publication date
Application filed by Wuhan University of Science and Engineering WUSE
Priority to CN201610302605.5A
Publication of CN106056207A
Application granted
Publication of CN106056207B
Legal status: Active (current)
Anticipated expiration


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/004 - Artificial life, i.e. computing arrangements simulating life
    • G06N 3/008 - Artificial life, i.e. computing arrangements simulating life based on physical entities controlled by simulated intelligence so as to replicate intelligent life forms, e.g. based on robots replicating pets or humans in their appearance or behaviour
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 - Handling natural language data
    • G06F 40/20 - Natural language analysis
    • G06F 40/279 - Recognition of textual entities
    • G06F 40/289 - Phrasal analysis, e.g. finite state techniques or chunking
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 - Handling natural language data
    • G06F 40/30 - Semantic analysis
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 5/00 - Computing arrangements using knowledge-based models
    • G06N 5/04 - Inference or reasoning models
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 - Scenes; Scene-specific elements
    • G06V 20/10 - Terrestrial scenes
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 13/00 - Speech synthesis; Text to speech systems
    • G10L 13/08 - Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/02 - Feature extraction for speech recognition; Selection of recognition unit
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/08 - Speech classification or search
    • G10L 15/14 - Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
    • G10L 15/142 - Hidden Markov Models [HMMs]
    • G10L 15/144 - Training of HMMs
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/08 - Speech classification or search
    • G10L 15/14 - Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
    • G10L 15/142 - Hidden Markov Models [HMMs]
    • G10L 15/148 - Duration modelling in HMMs, e.g. semi HMM, segmental models or transition probabilities
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/08 - Speech classification or search
    • G10L 15/18 - Speech classification or search using natural language modelling
    • G10L 15/1822 - Parsing for meaning understanding
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/26 - Speech to text systems
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 21/00 - Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L 21/02 - Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L 21/0208 - Noise filtering
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/48 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L 25/51 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L 25/54 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for retrieval

Abstract

Disclosed are a natural-language-based robot deep interaction and inference method and device. The method comprises the following steps: 1) speech recognition: receive the user's speech input, process the input signal, and obtain text information; 2) case attribute acquisition: perform word segmentation on the text obtained in step 1), then match the segmented text against the cases in a case library by similarity to extract the case attributes; 3) deep dialogue and three-dimensional scene interaction: if the user intention obtained from the extracted case attributes is incomplete, repeatedly guide the user with the real-time map file obtained by a Kinect sensor until a complete intention is obtained, then generate a solution for the job task expressed by the complete intention; 4) speech synthesis: present the obtained solution in text form, synthesize speech, and feed it back to the user through an audio device. Throughout the interaction process of the present invention, both the user and the robot use natural language.

Description

Natural-language-based robot deep interaction and inference method and device
Technical field
The present invention relates to artificial intelligence technology, and more particularly to a natural-language-based robot deep interaction and inference method and device.
Background art
In recent years, with the rapid development of intelligent robots, it is desirable to let robots complete various job tasks in complex environments by way of dialogue. Communicating with machines in natural language has long been pursued: people could operate robots in the language they are most accustomed to, without spending time and effort learning various complex computer languages.
To this end, an intelligent robot system must understand natural language, understand what the user expects, and possess an inference mechanism for reasoning about, solving and learning from problems in real time. Among current research results, representative inference mechanisms include rule-based reasoning (RBR), procedural reasoning (Procedural Reasoning System, PRS) and case-based reasoning (CBR). Rule-based reasoning is not widely used because inference rules are difficult to obtain in certain fields; procedural reasoning shortens inference time, but suffers from shortcomings such as the limitations of the plan library and the inability to learn and store newly generated plans; case-based reasoning obtains the solution of the current case by retrieving source cases from a case library, and thus has a certain learning ability as well as high practicability.
However, case-based reasoning lacks analysis capability: it cannot analyze an unclear user purpose or provide guiding feedback, and it lacks autonomy. Against this background, the method introduces the BDI (belief-desire-intention) model. BDI is a behavioral cognition framework whose essence is to determine the goals of an agent and how the agent achieves them. Combining case-based reasoning with the BDI model both increases the autonomy of the inference system and remedies the BDI model's lack of learning ability. Meanwhile, deep dialogue and three-dimensional scene reasoning are also introduced, combining reasoning with the actual scene and improving the intelligence of the robot.
Summary of the invention
The technical problem to be solved by the present invention is, in view of the defects in the prior art, to provide a natural-language-based robot deep interaction and inference method and device, which realize deep interaction and reasoning between the user and the robot through natural language and improve the intelligence and autonomy of the robot.
The technical solution adopted by the present invention to solve the technical problem is a natural-language-based robot deep interaction and inference method, comprising the following steps:
1) Speech recognition: receive the user's speech input, process the input signal, and obtain text information;
2) Case attribute acquisition: perform word segmentation on the text obtained in step 1), then match the segmented text against the cases in a case library by similarity to extract the attributes of the current case;
the case library is used to store cases designed in advance according to the actual scene; each case has the following essential attribute values: the attribute set of the case and the solution of the case;
3) Deep dialogue and three-dimensional scene interaction: if the user intention obtained from the case attributes extracted in step 2) is incomplete, repeatedly guide the user with the real-time map information obtained by a Kinect sensor until a complete intention is obtained, then generate a solution for the job task expressed by the complete intention;
4) Speech synthesis: the inference engine presents the obtained solution in text form, synthesizes speech with TTS technology, and feeds it back to the user through the audio device.
According to the above scheme, the speech recognition process of step 1) specifically comprises the following steps:
1.1) Preprocessing: collect the user's speech information through a microphone array, process the raw input speech signal, filter out irrelevant information and background noise, and perform endpoint detection, framing and pre-emphasis of the speech signal;
1.2) Feature extraction: extract the key characteristic parameters reflecting the features of the speech signal to form a feature vector sequence;
1.3) Acoustic modeling: model the acoustics with hidden Markov models (HMMs); during recognition, match the speech to be recognized against the acoustic model to obtain the recognition result;
1.4) Language modeling: perform grammatical and semantic analysis on a training text database, and train an N-gram language model based on a statistical model, narrowing the search space and thereby improving the recognition rate (a minimal sketch of such a model follows this list);
1.5) Decoding: for the input speech signal, build a recognition network from the trained HMM acoustic model, the language model and the dictionary, and find the best path through this network with a search algorithm; this path outputs the word string of the speech signal with maximum probability, thereby determining the words contained in the speech sample.
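As referenced in step 1.4), the following is a minimal sketch of how such an N-gram (here bigram) language model can be estimated from a segmented training corpus; it is illustrative only and not part of the original disclosure (real recognizers add smoothing, e.g. Kneser-Ney):

```python
from collections import Counter

def train_bigram_lm(sentences):
    """Maximum-likelihood bigram model P(w2 | w1) from tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for words in sentences:
        padded = ["<s>"] + words + ["</s>"]       # sentence boundary markers
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return {(w1, w2): c / unigrams[w1] for (w1, w2), c in bigrams.items()}

corpus = [["grab", "one", "apple"], ["grab", "one", "red", "apple"]]
lm = train_bigram_lm(corpus)
print(lm[("one", "apple")])   # 0.5: "one" is followed by "apple" in 1 of 2 cases
```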
According to the above scheme, in step 2) the attributes of the current case are extracted by matching the segmented text against the cases in the case library with text similarity based on the vector space model.
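A minimal sketch of the vector-space-model text similarity described above, using bag-of-words term frequencies and cosine similarity (identifier names are illustrative; the patent does not disclose its exact weighting):

```python
import math
from collections import Counter

def cosine_similarity(tokens_a, tokens_b):
    """Cosine of the angle between two term-frequency vectors."""
    va, vb = Counter(tokens_a), Counter(tokens_b)
    dot = sum(va[t] * vb[t] for t in va)
    def norm(v):
        return math.sqrt(sum(c * c for c in v.values()))
    na, nb = norm(va), norm(vb)
    return dot / (na * nb) if na and nb else 0.0

query = ["grab", "one", "apple"]                       # segmented user input
cases = {"sorting": ["grab", "one", "red", "apple"],
         "query": ["where", "is", "the", "apple"]}
best = max(cases, key=lambda name: cosine_similarity(query, cases[name]))
print(best)   # "sorting": its attributes are then extracted as the case attributes
```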
According to the above scheme, the case library in step 2) is built with the following steps:
Conversation topics are designed according to demand, and a subject tree is designed according to the conversation topics. The subject tree is divided into theme nodes, required-attribute nodes and leaf nodes; leaf nodes are subordinate to required-attribute nodes, and required-attribute nodes are subordinate to theme nodes. Each node carries a binary validity flag; leaf nodes stand in an OR relationship to one another, while required-attribute nodes stand in an AND relationship;
Dialogue generating functions are written for the nodes of the subject tree, and the set of these functions constitutes the guiding library. Under different system states, calling these functions produces different response outputs; each dialogue generating function is responsible only for the response of its corresponding node, so the functions do not interfere with one another in design or modification.
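A sketch of one possible representation of such a subject tree and its guiding library, under the stated node relations (leaves OR-related, required attributes AND-related); all names are hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """Subject-tree node with the binary validity flag described above."""
    name: str
    kind: str                               # "theme" | "attribute" | "leaf"
    children: list = field(default_factory=list)
    valid: bool = False

def attribute_satisfied(attr: Node) -> bool:
    return any(leaf.valid for leaf in attr.children)            # OR over leaf nodes

def theme_complete(theme: Node) -> bool:
    return all(attribute_satisfied(a) for a in theme.children)  # AND over attributes

# Guiding library: one dialogue-generating function per node, kept independent
# so each function can be designed and modified without affecting the others.
guiding_library = {
    "destination": lambda state: "Which basket should it be put into?",
    "name":        lambda state: "Which object do you want me to grab?",
}
```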
According to the above scheme, the case attribute acquisition process in step 2) specifically comprises the following steps:
2.1) Perform word segmentation on the text obtained in step 1), i.e. split the text into individual words;
2.2) Match the segmented text against the cases in the case library; since each case contains the attribute set of a task, the task attributes of the most similar case are extracted when it is retrieved.
According to the above scheme, the deep dialogue and three-dimensional scene interaction process of step 3) specifically comprises the following steps:
3.1) When the inference engine receives a speech input, the robot judges the input speech against the map information obtained by the Kinect sensor. If the input is unrelated to the current map information, the robot guides the user; if the user input is related to the current map information, the robot matches the user input against the cases in the case library, and if a similar case exists, matches the user input information against the map information obtained by the Kinect sensor, judges whether the user's expectation can be satisfied, and feeds the result back to the user;
3.2) After case retrieval and map matching, the inference engine has obtained the corresponding task attributes and matching degree, and analyzes this information to obtain the user's expectation. If the computed expectation is complete, no further guiding is needed and the process moves to step 3.4); if the expectation is incomplete, further user guiding is needed and the process moves to step 3.3);
3.3) A guiding case library is built with an XML file; this guiding library contains the guiding schemes addressed to the user for each missing attribute when the user's expectation is incomplete. Each attribute of the user's expectation is compared one by one with the attributes of the guiding cases, scoring 1 where they are identical and 0 where they differ; the scores are summed, the case with the maximum total is the best case, and its guiding scheme is used to guide the user, until a complete user expectation is obtained (a scoring sketch follows this list);
3.4) The solution corresponding to this complete expectation is called from the case library and reused after matching with the real-time three-dimensional environment information, generating a sequence of executable actions (the Intention), thereby accomplishing the specified job task.
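The attribute scoring of step 3.3) can be sketched as follows, as a literal rendering of the 1/0 comparison-and-sum rule (dictionary keys and case names are illustrative):

```python
def select_guiding_case(user_attrs: dict, guiding_cases: dict) -> str:
    """Return the name of the best guiding case: score +1 for each attribute
    identical to the user's, 0 otherwise, and take the maximum total."""
    def score(case_attrs: dict) -> int:
        return sum(1 if user_attrs.get(k) == v else 0 for k, v in case_attrs.items())
    return max(guiding_cases, key=lambda name: score(guiding_cases[name]))

user = {"name": "apple", "quantity": "one", "destination": None}
guides = {"ask_destination": {"name": "apple", "quantity": "one"},
          "ask_quantity": {"name": "apple"}}
print(select_guiding_case(user, guides))   # "ask_destination" (score 2 vs 1)
```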
A natural-language-based robot deep interaction and inference device comprises:
a point cloud acquisition module, which fuses the map depth information and color information collected by the Kinect to generate three-dimensional point cloud data (PCD) and, after preprocessing, keypoint extraction and descriptor extraction, performs feature matching against an object feature database to obtain a three-dimensional scene semantic map description file;
a speech recognition module, which denoises the speech signal input by the user and collected by the microphone array, extracts features with the MFCC algorithm, and then converts the speech signal into text with a speech decoding search algorithm combining the HMM acoustic model and the N-gram language model;
a deep dialogue and three-dimensional scene interaction module, which retrieves the most similar case in the case library for the received text, performs map matching, expectation analysis and guiding with the map file obtained by the object recognition node so as to complete the user's expectation and generate a solution, and sends the answers and guidance information for the user in text form to the speech synthesis node;
a speech synthesis module, which uses TTS technology to turn the text obtained from the human-computer interaction into the corresponding speech signal through the three steps of text analysis, prosody modeling and speech synthesis, and feeds it back to the user;
a case library, a knowledge base of real-world experience built with XML files that borrows from the human pattern of remembering experience; cases are designed according to the actual scene, and each case contains the following essential attribute values: the attribute set and the solution of the case.
The beneficial effects produced by the present invention are:
1. In the interaction process of the present invention, both the user and the robot use natural language; the robot can autonomously guide the user to obtain the user's complete expectation, and obtains a solution by matching against the case library so as to execute the task.
2. The present invention adopts a deep interaction and inference mechanism oriented to Chinese speech, adding a deep dialogue and three-dimensional scene interaction module on the basis of the traditional CBR-BDI approach, and realizes interaction and reasoning both when the intention expressed by the user does not match the actual scene and when the intention expressed by the user is incomplete. Since the method supplements the unknown attributes of an intention through human-computer interaction, it is more accurate, flexible and practical than reasoning based on common sense. At the same time, with CBR-BDI as the basis of the inference mechanism, existing problems can be solved with past experience, feedback can be given on problems, and goals can be pursued autonomously, so the method has good market application prospects and development potential.
Brief description of the drawings
The present invention will be further explained below with reference to the accompanying drawings and embodiments, in which:
Fig. 1 is the hardware system architecture diagram of the robot deep interaction and inference device in an embodiment of the present invention;
Fig. 2 is the program flow chart of the robot deep interaction and inference method in an embodiment of the present invention;
Fig. 3 is the flow chart of the deep interaction and inference mechanism in an embodiment of the present invention;
Fig. 4 is the reasoning flow chart of the deep dialogue and three-dimensional scene interaction module in an embodiment of the present invention.
Detailed description of the embodiments
In order to make the purpose, technical scheme and advantages of the present invention clearer, the present invention is further elaborated below with reference to the embodiments. It should be appreciated that the specific embodiments described here serve only to explain the present invention and do not limit it.
As shown in Fig. 1, Fig. 1 is the hardware system architecture of the natural-language-based robot deep interaction and inference method proposed by the present invention when used for a robot sorting system. Speech is input through the microphone array, and text information is obtained by the speech recognition module; the text information is fed into the human-machine deep interaction and inference module, and the map file obtained by Kinect camera recognition is also sent to the deep interaction and inference module. The complete user expectation is obtained through the improved CBR-BDI inference mechanism, the coordinate position of the target is obtained from the map file, and a solution is generated. The system platform used in the present invention is the Ubuntu (version 12.04) embedded platform.
Fig. 2 is the program flow chart of the natural-language-based robot deep interaction and inference method implemented by the present invention, which proceeds as follows. The speech recognition process converts the user's input into text; after word segmentation, case retrieval is performed in the case library and the case attributes obtained by the retrieval are analyzed. If the number of case attributes is greater than 0, map matching is performed; if the number of initial-state attributes in the case is 0, the user input is invalid, and guiding cases are extracted from the guiding library to guide the user until the number of state attributes exceeds 0, after which map matching is performed. If the quantity of objects desired by the user matches the quantity of objects in the map, expectation analysis is carried out; if not, guiding is needed until the desired quantity and the quantity in the map match. Finally, the attribute values of the case and of the map match are combined into the user's expectation, and expectation analysis checks whether the expectation is complete: the expectation is complete if none of the required attributes obtained for the current case is empty, and incomplete otherwise (that is, completeness is judged by whether the attributes obtained for the current case include all the attributes of the matched case). If the expectation is incomplete, further guiding is needed until it is complete; once complete, the required information is extracted and a solution, i.e. the user intention, is generated.
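The control flow of Fig. 2 can be summarized by the sketch below; the module callables bundled in `pipeline` are a hypothetical decomposition of the components described in this document, not the patent's actual API:

```python
def interact(pipeline, speech_input, kinect_map):
    """Top-level loop of Fig. 2: recognize, retrieve, guide until complete, reuse."""
    p = pipeline
    case = p.retrieve_case(p.segment(p.recognize(speech_input)))
    while p.count_attributes(case) == 0:              # invalid input: guide actively
        case = p.retrieve_case(p.segment(p.guide_user("invalid_input")))
    while not p.quantities_match(case, kinect_map):   # map matching on object counts
        p.update_case(case, p.guide_user("quantity_mismatch"))
    while not p.expectation_complete(case):           # all required attributes non-empty?
        p.update_case(case, p.guide_user("missing_attribute"))
    plan = p.reuse_solution(case, kinect_map)         # bind solution to map coordinates
    p.speak(p.render_text(plan))                      # feed the answer back via TTS
    return plan                                       # the executable action sequence
```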
Fig. 3 is the flow chart of the natural-language-based robot deep interaction and inference method, which mainly comprises five parts: speech recognition, case storage, case attribute acquisition, deep dialogue with three-dimensional scene interaction, and speech synthesis.
The specific implementation of the present invention is as follows:
S1: Speech recognition
S11: The user inputs speech information through the microphone array; the raw input speech signal is processed, irrelevant information and background noise are filtered out, and endpoint detection, framing and pre-emphasis of the speech signal are performed.
S12: Feature extraction is performed with the Mel-frequency cepstral coefficient (MFCC) algorithm. The speech waveform is split into frames of roughly 10 ms each, and from each frame 39 numbers representing that frame of speech are extracted; these 39 numbers are the MFCC features of the frame and are represented as a feature vector.
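A sketch of the 39-dimensional per-frame feature described above, assuming 13 MFCCs plus their first- and second-order deltas and the librosa library (the patent names no toolkit, and the exact composition of the 39 numbers is an assumption):

```python
import numpy as np
import librosa

y, sr = librosa.load("utterance.wav", sr=16000)    # hypothetical recorded input
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13,
                            hop_length=160)        # 160 samples = 10 ms at 16 kHz
delta1 = librosa.feature.delta(mfcc)               # first-order differences
delta2 = librosa.feature.delta(mfcc, order=2)      # second-order differences
features = np.vstack([mfcc, delta1, delta2])       # shape (39, n_frames)
```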
S13: Acoustic modeling is performed with hidden Markov models (HMMs). A statistical model is established for the time-series structure of the speech signal, and a Markov chain with a finite number of states simulates the statistical variation of the speech signal.
S14: Language modeling is performed with the N-gram model, which describes the relationships between words. This technical scheme obtains the N-gram language model with the CMUCLMTK training tool provided by CMU.
S15: Using the Viterbi algorithm based on dynamic programming, the posterior probability of each decoding state sequence with respect to the observation sequence is computed at every time point and every state; the maximum-probability path is retained, and the corresponding state information is recorded at each node so that the word decoding sequence can finally be read out in reverse.
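A self-contained sketch of the dynamic-programming Viterbi search described in S15, in the log domain over a generic HMM (the real decoder searches a word-level recognition network built from the acoustic model, language model and dictionary):

```python
import numpy as np

def viterbi(obs_loglik, log_trans, log_init):
    """Most probable state path. obs_loglik: (T, N) log P(o_t | state);
    log_trans: (N, N) log transition matrix; log_init: (N,) log initial probs."""
    T, N = obs_loglik.shape
    delta = log_init + obs_loglik[0]              # best log-prob ending in each state
    back = np.zeros((T, N), dtype=int)            # backpointers recorded per node
    for t in range(1, T):
        scores = delta[:, None] + log_trans       # scores[i, j]: from state i to j
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + obs_loglik[t]
    path = [int(delta.argmax())]                  # read the path out in reverse
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```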
S2: Case storage
In this technical scheme the case library is stored as XML files. The case library contains cases 1 to n designed according to the actual scene, and each case has the following essential attribute values: the attribute set of the case, the solution of the case (a sequence of robot actions), and the final attribute set generated after interaction with the environment and reasoning.
For each newly generated case, an initial attribute set is obtained by similarity matching; this initial attribute set changes continuously with the interactive reasoning process, and once a complete intention has finally been generated, the end state is stored in the final attribute set.
In the present design, cases are divided into a query theme and a sorting theme. The attribute set of a case includes: object quantity, object name, object position, object color, object size, and the name of the destination where the object is placed. For example, for the case "grab one red big apple and place it in the left basket", the attributes are assigned as: object quantity: "one"; object name: "apple"; object position: empty; object color: "red"; object size: "big"; destination name: "left basket".
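One possible XML layout for the example case above, parsed with the standard library; the element names are illustrative, since the patent does not disclose the exact schema:

```python
import xml.etree.ElementTree as ET

case_xml = """<case theme="sorting">
  <attributes>
    <quantity>one</quantity>
    <name>apple</name>
    <position/>
    <color>red</color>
    <size>big</size>
    <destination>left basket</destination>
  </attributes>
  <solution>locate; approach; grasp; move; place</solution>
</case>"""

case = ET.fromstring(case_xml)
# Attributes still empty (like <position/>) are filled later by guiding.
missing = [e.tag for e in case.find("attributes")
           if not (e.text and e.text.strip())]
print(missing)   # ['position']
```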
S3: Case attribute acquisition
S31: The text obtained in S1 is segmented with a segmenter. Example 1: the user's speech is converted to the text "grab one apple" (抓取一个苹果); after segmentation the result is: "grab / one / apple" (抓取/一个/苹果).
S32: Each word after segmentation is matched against the case library. If no similar case is retrieved, a new case is established; if a similar case is retrieved, it is returned and the number of initial case attributes is counted. Example 2: for the case "grab one apple", the initial case attributes are object quantity and object name, so the initial attribute count is 2. When the initial attribute count is greater than 0, map matching is performed; when it equals 0, the input is invalid and the robot actively guides the user.
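The patent does not name the segmenter used in S31; as an illustration, the open-source jieba segmenter reproduces the segmentation of Example 1 (the exact output depends on jieba's dictionary):

```python
import jieba

text = "抓取一个苹果"              # Example 1: "grab one apple", from speech recognition
tokens = list(jieba.cut(text))     # word segmentation
print("/".join(tokens))            # expected: 抓取/一个/苹果
```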
S4: Deep dialogue and three-dimensional scene interaction; the detailed process is shown in Fig. 4:
S41: Map matching
S411: The system needs to obtain a high-quality semantic map of the operating environment through 3D visual environment perception. In this design scheme, 3D point cloud images are extracted by the Kinect and CSHOT object models are established for feature matching in the scene. The Point Cloud Library (PCL) is called, and real-time recognition and understanding of common everyday rigid objects is realized with a method based on local surface feature descriptors. Object detection is realized by a region-growing segmentation algorithm and the ISS feature points of the scene are extracted; CSHOT feature description vectors are computed at the keypoints; candidate models are generated by 3D feature matching based on a distance threshold; transformation hypotheses are generated by the random sample consensus algorithm and verified by the iterative closest point algorithm, producing a solution that remains globally consistent with the scene; and the coordinate information of the objects is converted into the robot coordinate system by coordinate transformation. The identifiers and geometric information of the objects are written into the XML semantic map file.
The object attributes in the XML map file include: the number of objects in the scene map; the name of the object, such as apple or orange; the color of the object; the shape of the object, such as cylinder or cube; and the size of the object, i.e. length * width * height, (π * base radius²) * height, and so on.
S412: The XML map is searched for the objects the user desires and their quantity is counted, and the match between the user's desired object quantity and the map is computed. Four match cases can occur here: (1) there is no qualifying object in the scene; (2) the quantity of objects in the scene is less than the quantity the user desires; (3) the two quantities are exactly equal; (4) the quantity of objects in the scene is greater than the quantity the user desires.
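The four match cases of S412 reduce to a comparison of the desired and observed object counts, as in this sketch (the return codes are illustrative):

```python
def map_match(desired_count: int, scene_count: int) -> int:
    """Classify the match between the user's desired quantity and the scene."""
    if scene_count == 0:
        return 1    # (1) no qualifying object in the scene
    if scene_count < desired_count:
        return 2    # (2) fewer objects in the scene than desired
    if scene_count == desired_count:
        return 3    # (3) exact match: proceed to expectation analysis (S42)
    return 4        # (4) more objects than desired: guiding needed to disambiguate
```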
S42: Expectation analysis
Case (3) of S412 indicates that the user's expectation is valid and determinate, and the next step, expectation analysis, can proceed; cases (1), (2) and (4) of S412 require user guiding with the method of S43.
In expectation analysis, the attribute values of the case and of the map match are combined into the user's expectation and analyzed. If the expectation is complete, case reuse can proceed and an intention, i.e. a robot action sequence, is generated; otherwise the expectation is incomplete and the guiding case library is called to guide the user.
Example 3: there is one red big apple in the map file, and the user says to the robot: "grab one apple". Map matching yields case (3): the quantity of objects in the scene equals the quantity requested by the user. During expectation analysis, however, the destination-name attribute value is empty, so the expectation is incomplete and the user must be guided with the method of S43 (a completeness-check sketch follows).
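The completeness check referenced above can be sketched as follows; the required-attribute names are illustrative (Example 3 fails because the destination is still empty):

```python
def expectation_complete(attributes: dict) -> bool:
    """Complete iff every required attribute has a non-empty value."""
    required = ("quantity", "name", "color", "size", "destination")
    return all(attributes.get(k) for k in required)

attrs = {"quantity": "one", "name": "apple", "color": "red",
         "size": "big", "destination": ""}     # destination still unknown
assert not expectation_complete(attrs)         # incomplete: guide the user (S43)
```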
S43: User guiding
When the expectation obtained by expectation analysis is incomplete, user guiding is carried out. Every attribute node in the case storage has a corresponding dialogue generating function, and the set of these functions constitutes the guiding library. In Example 3 the expectation is incomplete, so the guiding library is searched and the robot asks the user about the missing attribute: "Which basket should it be put into?"; the case attributes are then completed according to the information fed back by the user.
S44: Case reuse and complete intention generation
When the intention is incomplete, a complete expectation is generated through one or more rounds of guiding (all required attributes are non-empty, though not every attribute needs a value). The solution corresponding to this complete expectation is called from the case library and reused after matching with the real-time three-dimensional environment information, generating a sequence of executable actions and thereby accomplishing the specified job task.
S5: Speech synthesis
S51: Text analysis
The input text is normalized and possible user splicing errors are handled; nonstandard or unpronounceable characters are filtered out. The boundaries of the words and phrases in the text are analyzed and the pronunciation of each word is determined, including the pronunciation of the numbers, surnames, special characters and various polyphonic characters that appear in the text, as well as the stress pattern to use when characters share a pronunciation. Finally, the input text is converted into internal parameters that the computer can process, for subsequent modules to process further and generate the corresponding information.
S52: Prosody modeling
Suprasegmental features are planned for the synthesized speech; the prosodic parameters include fundamental frequency, duration and intensity, so that the synthesized speech expresses the meaning correctly and sounds more natural.
S53: Speech synthesis
According to the result of prosody modeling, the text is converted into speech output with the pitch-synchronous overlap-add (PSOLA) method. The speech units corresponding to the individual characters or phrases of the processed text are extracted from the speech synthesis library, their prosodic characteristics are adjusted and modified with the specific speech synthesis technique, and the satisfactory speech is finally synthesized and fed back to the user through the audio device.
It should be understood that those of ordinary skill in the art can make improvements or changes based on the above description, and all such improvements and changes shall fall within the protection scope of the appended claims of the present invention.

Claims (5)

1. A natural-language-based robot deep interaction and inference method, characterized by comprising the following steps:
1) speech recognition: receiving the user's speech input, processing the input signal, and obtaining text information;
2) case attribute acquisition: performing word segmentation on the text obtained in step 1), then matching the segmented text against the cases in a case library with text similarity based on the vector space model to extract the attributes of the case;
the case library is used to store cases designed in advance according to the actual scene; each case has three essential attribute values, comprising: the initial attribute set of the case, the solution of the case, and the final attribute set generated after interaction with the environment and reasoning;
3) deep dialogue and three-dimensional scene interaction: if the user intention obtained from the case attributes extracted in step 2) is incomplete, repeatedly guiding the user with the real-time map file obtained by a Kinect sensor until a complete intention is obtained, then generating a solution for the job task expressed by the complete intention;
the deep dialogue and three-dimensional scene interaction process of step 3) specifically comprises the following steps:
3.1) when the inference engine receives a speech input, the robot judges the input speech against the map information obtained by the Kinect sensor; if it is unrelated to the current map information, the robot guides the user; if the user input is related to the current map information, the robot matches the user input against the cases in the case library, and if a similar case exists, matches the user input information against the map information obtained by the Kinect sensor, judges whether the user's expectation can be satisfied, and feeds the result back to the user;
3.2) after case retrieval and map matching, the inference engine has obtained the corresponding task attributes and matching degree, and analyzes this information to obtain the user's expectation; if the computed expectation is complete, no further guiding is needed and the process moves to step 3.4); if the expectation is incomplete, further user guiding is needed and the process moves to step 3.3);
3.3) a guiding case library is built with an XML file, the guiding library containing the guiding schemes addressed to the user for the missing attributes when the user's expectation is incomplete; each attribute of the user's expectation is compared one by one with the case attributes of the guiding case library, scoring 1 where they are identical and 0 where they differ; the scores are summed, the case with the maximum total is the best case, and its guiding scheme is used to guide the user, until a complete user expectation is obtained;
3.4) the solution corresponding to this complete expectation is called from the case library and reused after matching with the real-time three-dimensional environment information, generating a sequence of executable actions, thereby accomplishing the specified job task;
4) speech synthesis: the inference engine presents the obtained solution in text form and sends it to the user as speech.
2. The natural-language-based robot deep interaction and inference method according to claim 1, characterized in that the speech recognition process of step 1) specifically comprises the following steps:
1.1) preprocessing: collecting the user's speech information through a microphone array, processing the raw input speech signal, filtering out irrelevant information and background noise, and performing endpoint detection, framing and pre-emphasis of the speech signal;
1.2) feature extraction: extracting the key characteristic parameters reflecting the features of the speech signal to form a feature vector sequence;
1.3) performing acoustic modeling with hidden Markov models, and during recognition matching the speech to be recognized against the acoustic model to obtain the recognition result;
1.4) performing grammatical and semantic analysis on a training text database, and training an N-gram language model based on a statistical model, narrowing the search space and thereby improving the recognition rate;
1.5) for the input speech signal, building a recognition network from the trained HMM acoustic model, the language model and the dictionary, and finding the best path through this network with a search algorithm; this path outputs the word string of the speech signal with maximum probability, thereby determining the words contained in the speech sample.
3. The natural-language-based robot deep interaction and inference method according to claim 1, characterized in that the case library in step 2) is built with the following steps:
conversation topics are designed according to demand, and a subject tree is designed according to the conversation topics; the subject tree is divided into theme nodes, required-attribute nodes and leaf nodes, and each node carries a binary validity flag;
dialogue generating functions are written for the nodes of the subject tree, and the set of these functions constitutes the guiding library; under different system states, calling these functions produces different response outputs; each dialogue generating function is responsible only for the response of its corresponding node, so the functions do not interfere with one another in design or modification.
4. The natural-language-based robot deep interaction and inference method according to claim 1, characterized in that the case attribute acquisition process in step 2) specifically comprises the following steps:
2.1) performing word segmentation on the text obtained in step 1), i.e. splitting the text into individual words;
2.2) matching the segmented text against the cases in the case library; since each case contains the features of a problem and the corresponding task attributes, the task attributes of the most similar case are extracted when it is retrieved.
5. A natural-language-based robot deep interaction and inference device, characterized by comprising:
a point cloud acquisition module, for fusing the map depth information and color information collected by the Kinect to generate three-dimensional point cloud data, and performing feature matching against an object feature database after preprocessing, keypoint extraction and descriptor extraction, to obtain a three-dimensional scene semantic map description file;
a speech recognition module, for denoising the speech signal input by the user and collected by the microphone array, extracting features with the MFCC algorithm, and then converting the speech signal into text with a speech decoding search algorithm combining the HMM acoustic model and the N-gram language model;
a deep dialogue and three-dimensional scene interaction module, for retrieving the most similar case in the case library for the received text, performing map matching, expectation analysis and guiding with the map file obtained by the object recognition node so as to complete the user's expectation and generate a solution, and sending the answers and guidance information for the user in text form to the speech synthesis node;
a speech synthesis module, for using TTS technology to turn the text obtained from the human-computer interaction into the corresponding speech signal through the three steps of text analysis, prosody modeling and speech synthesis, and feeding it back to the user;
a case library, for storing the cases designed in advance according to the actual scene, each case containing the following essential attribute values: the attribute set and the solution of the case.
CN201610302605.5A 2016-05-09 2016-05-09 Natural-language-based robot deep interaction and inference method and device Active CN106056207B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610302605.5A CN106056207B (en) Natural-language-based robot deep interaction and inference method and device


Publications (2)

Publication Number Publication Date
CN106056207A (en) 2016-10-26
CN106056207B (en) 2018-10-23

Family

ID=57176186

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610302605.5A Active CN106056207B (en) Natural-language-based robot deep interaction and inference method and device

Country Status (1)

Country Link
CN (1) CN106056207B (en)

Families Citing this family (37)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6439806B2 (en) * 2017-01-11 2018-12-19 富士ゼロックス株式会社 Robot apparatus and program
CN106847271A (en) * 2016-12-12 2017-06-13 北京光年无限科技有限公司 A kind of data processing method and device for talking with interactive system
CN107066444B (en) * 2017-03-27 2020-11-03 上海奔影网络科技有限公司 Corpus generation method and apparatus based on multi-round interaction
CN106997243B (en) * 2017-03-28 2019-11-08 北京光年无限科技有限公司 Speech scene monitoring method and device based on intelligent robot
CN107423398B (en) * 2017-07-26 2023-04-18 腾讯科技(上海)有限公司 Interaction method, interaction device, storage medium and computer equipment
CN109522531B (en) * 2017-09-18 2023-04-07 腾讯科技(北京)有限公司 Document generation method and device, storage medium and electronic device
CN107622523B (en) * 2017-09-21 2018-08-21 石器时代(内蒙古)智能机器人科技有限公司 A kind of intelligent robot
CN107919126A (en) * 2017-11-24 2018-04-17 合肥博焱智能科技有限公司 A kind of intelligent speech interactive system
CN108009285B (en) * 2017-12-22 2019-04-26 重庆邮电大学 Forest Ecology man-machine interaction method based on natural language processing
CN107993651B (en) * 2017-12-29 2021-01-19 深圳和而泰数据资源与云技术有限公司 Voice recognition method and device, electronic equipment and storage medium
CN108114469A (en) * 2018-01-29 2018-06-05 北京神州泰岳软件股份有限公司 Game interaction method, apparatus, terminal and game interaction model based on dialogue
CN110399471A (en) * 2018-04-25 2019-11-01 北京快乐智慧科技有限责任公司 A kind of guiding situational dialogues method and system
CN111755009A (en) * 2018-06-26 2020-10-09 苏州思必驰信息科技有限公司 Voice service method, system, electronic device and storage medium
CN110750626B (en) * 2018-07-06 2022-05-06 中国移动通信有限公司研究院 Scene-based task-driven multi-turn dialogue method and system
CN109063840A (en) * 2018-07-10 2018-12-21 广州极天信息技术股份有限公司 A kind of Interactive Dynamic inference method and device
CN110853674A (en) * 2018-07-24 2020-02-28 中兴通讯股份有限公司 Text collation method, apparatus, and computer-readable storage medium
CN109119064A (en) * 2018-09-05 2019-01-01 东南大学 A kind of implementation method suitable for overturning the Oral English Teaching system in classroom
CN109243451A (en) * 2018-10-22 2019-01-18 武汉科技大学 A kind of network marketing method and system based on robot voice interaction
CN109766072B (en) * 2018-12-17 2022-02-01 深圳壹账通智能科技有限公司 Information verification input method and device, computer equipment and storage medium
CN109724603A (en) * 2019-01-08 2019-05-07 北京航空航天大学 A kind of Indoor Robot air navigation aid based on environmental characteristic detection
CN110047480A (en) * 2019-04-22 2019-07-23 哈尔滨理工大学 Added Management robot head device and control for the inquiry of department, community hospital
CN110096707B (en) * 2019-04-29 2020-09-29 北京三快在线科技有限公司 Method, device and equipment for generating natural language and readable storage medium
CN111935348A (en) * 2019-05-13 2020-11-13 阿里巴巴集团控股有限公司 Method and device for providing call processing service
WO2020258082A1 (en) * 2019-06-26 2020-12-30 深圳市欢太科技有限公司 Information recommendation method and apparatus, electronic device and storage medium
CN110310620B (en) * 2019-07-23 2021-07-13 苏州派维斯信息科技有限公司 Speech fusion method based on native pronunciation reinforcement learning
CN110784603A (en) * 2019-10-18 2020-02-11 深圳供电局有限公司 Intelligent voice analysis method and system for offline quality inspection
CN110955675B (en) * 2019-10-30 2023-12-19 中国银联股份有限公司 Robot dialogue method, apparatus, device and computer readable storage medium
CN110928302A (en) * 2019-11-29 2020-03-27 华中科技大学 Man-machine cooperative natural language space navigation method and system
CN110956958A (en) * 2019-12-04 2020-04-03 深圳追一科技有限公司 Searching method, searching device, terminal equipment and storage medium
CN112233666A (en) * 2020-10-22 2021-01-15 中国科学院信息工程研究所 Method and system for storing and retrieving Chinese voice ciphertext in cloud storage environment
CN112100338B (en) * 2020-11-02 2022-02-25 北京淇瑀信息科技有限公司 Dialog theme extension method, device and system for intelligent robot
CN112435658A (en) * 2020-12-18 2021-03-02 中国南方电网有限责任公司 Human-computer interaction system for natural language processing dialogue exchange based on corpus
CN112732743B (en) * 2021-01-12 2023-09-22 北京久其软件股份有限公司 Data analysis method and device based on Chinese natural language
CN113034592B (en) * 2021-03-08 2021-08-31 西安电子科技大学 Three-dimensional scene target detection modeling and detection method based on natural language description
CN114265920B (en) * 2021-12-27 2022-07-01 北京易聊科技有限公司 Intelligent robot conversation method and system based on signals and scenes
CN115527538B (en) * 2022-11-30 2023-04-07 广汽埃安新能源汽车股份有限公司 Dialogue voice generation method and device
CN116804691B (en) * 2023-06-28 2024-02-13 国网安徽省电力有限公司青阳县供电公司 Fault monitoring method for dispatching automation equipment of power system


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101551947A (en) * 2008-06-11 2009-10-07 俞凯 Computer system for assisting spoken language learning
CN101510222A (en) * 2009-02-20 2009-08-19 北京大学 Multilayer index voice document searching method and system thereof
CN101604204A (en) * 2009-07-09 2009-12-16 北京科技大学 Distributed cognitive technology for intelligent emotional robot

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
* 程志强: Design and Implementation of a Shopping-Guide Robot Based on Speech Recognition and Text Understanding (基于语音识别与文字理解的导购机器人设计与实现); China Master's Theses Full-text Database, Information Science and Technology; 2015-03-31; pp. I140-462 *

Also Published As

Publication number Publication date
CN106056207A (en) 2016-10-26

Similar Documents

Publication Publication Date Title
CN106056207B (en) Natural-language-based robot deep interaction and inference method and device
Pandey et al. Deep learning techniques for speech emotion recognition: A review
US20180203946A1 (en) Computer generated emulation of a subject
CN112037754B (en) Method for generating speech synthesis training data and related equipment
CN107851434A (en) Use the speech recognition system and method for auto-adaptive increment learning method
CN106971709A (en) Statistic parameter model method for building up and device, phoneme synthesizing method and device
CN115329779B (en) Multi-person dialogue emotion recognition method
Bhosale et al. End-to-End Spoken Language Understanding: Bootstrapping in Low Resource Scenarios.
CN111862952B (en) Dereverberation model training method and device
CN116863038A (en) Method for generating digital human voice and facial animation by text
KR20200084443A (en) System and method for voice conversion
CN111653270B (en) Voice processing method and device, computer readable storage medium and electronic equipment
CN104538025A (en) Method and device for converting gestures to Chinese and Tibetan bilingual voices
Basak et al. Challenges and Limitations in Speech Recognition Technology: A Critical Review of Speech Signal Processing Algorithms, Tools and Systems.
Ling An acoustic model for English speech recognition based on deep learning
Asadiabadi et al. Multimodal speech driven facial shape animation using deep neural networks
CN116758451A (en) Audio-visual emotion recognition method and system based on multi-scale and global cross attention
CN107123420A (en) Voice recognition system and interaction method thereof
Vlasenko et al. Fusion of acoustic and linguistic information using supervised autoencoder for improved emotion recognition
CN108629024A (en) A kind of teaching Work attendance method based on voice recognition
CN113257225A (en) Emotional voice synthesis method and system fusing vocabulary and phoneme pronunciation characteristics
Mahavidyalaya Phoneme and viseme based approach for lip synchronization
CN113538645A (en) Method and device for matching body movement and language factor of virtual image
CN115424616A (en) Audio data screening method, device, equipment and computer readable medium
CN110910904A (en) Method for establishing voice emotion recognition model and voice emotion recognition method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant