CN105785813A - Intelligent robot system multi-modal output method and device - Google Patents


Info

Publication number
CN105785813A
CN105785813A (application CN201610158062.4A)
Authority
CN
China
Prior art keywords
modal
configuration file
data
resource data
modal output
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610158062.4A
Other languages
Chinese (zh)
Inventor
王合心 (Wang Hexin)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Guangnian Wuxian Technology Co Ltd
Original Assignee
Beijing Guangnian Wuxian Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Guangnian Wuxian Technology Co Ltd filed Critical Beijing Guangnian Wuxian Technology Co Ltd
Priority to CN201610158062.4A priority Critical patent/CN105785813A/en
Publication of CN105785813A publication Critical patent/CN105785813A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05BCONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B19/00Programme-control systems
    • G05B19/02Programme-control systems electric
    • G05B19/04Programme control other than numerical control, i.e. in sequence controllers or logic controllers
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/28Constructional details of speech recognition systems
    • G10L15/30Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/63Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Psychiatry (AREA)
  • Hospice & Palliative Care (AREA)
  • Signal Processing (AREA)
  • General Health & Medical Sciences (AREA)
  • Child & Adolescent Psychology (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Manipulator (AREA)

Abstract

The invention discloses a multi-modal output method and device for an intelligent robot system. The method comprises the following steps: generating a multi-modal output request according to received multi-modal input information, wherein the multi-modal input information comprises voice commands, action commands and visual information, and the multi-modal output request carries a multi-modal output label; responding to the multi-modal output request and searching the configuration files of the current intelligent robot, wherein each configuration file has an association identifier; and, when the multi-modal output label is successfully matched against an association identifier, calling the configuration file corresponding to that identifier and executing the multi-modal resource data associated with the configuration file. The method effectively relieves the computational load on the server of the intelligent robot system, and prevents the situation in which a multi-modal output instruction cannot be acted upon because it does not match the multi-modal resource data actually carried by the robot, thereby improving system efficiency.

Description

Method and device for the multi-modal output of an intelligent robot system
Technical field
The present invention relates to the field of intelligent robotics, and in particular to a method and device for the multi-modal output of an intelligent robot system.
Background art
With the development of robotics, intelligent robot products have penetrated ever more aspects of everyday life. Robots are not only used to help users complete assigned tasks efficiently; they are increasingly designed to be companions capable of multi-modal interaction with users through language, action and emotion.
In prior-art intelligent robot systems, interaction with the user is typically completed through a remote server over a network. The server has to undertake a substantial amount of analysis and computation. Because the server is usually located far from the robot product at the client side, the server and the robot must exchange data over a communication network, so the real-time performance of the system suffers; in particular, when a large number of users request service from the server simultaneously, the response speed of the system drops sharply.
Summary of the invention
One of the technical problems to be solved by the invention is the need to provide a new multi-modal interaction mechanism that reduces cost and improves real-time performance.
To solve the above technical problem, embodiments of the present application first provide a method for the multi-modal output of an intelligent robot system, comprising: generating a multi-modal output request according to received multi-modal input information, wherein the multi-modal input information includes voice commands, action commands and visual information, and the multi-modal output request carries a multi-modal output label; responding to the multi-modal output request and searching the configuration files of the current intelligent robot, wherein each configuration file has an association identifier; and, when the multi-modal output label is successfully matched against an association identifier, calling the configuration file corresponding to that identifier and executing the multi-modal resource data associated with the configuration file.
Preferably, the method further comprises, in the robot operating system: configuring scene type data, together with the multi-modal resource data corresponding to the scene type data, according to a configuration template; and encapsulating the multi-modal resource data in a preset format as a configuration file stored under a specified path.
Preferably, calling and executing the multi-modal resource data corresponding to the association identifier comprises reading the multi-modal resource data encapsulated in the configuration file, and instructing the current intelligent robot to perform voice, action and multimedia output according to that resource data.
Preferably, generating a multi-modal output request according to the received multi-modal input information comprises receiving a multi-modal output request that was generated by a server through parsing the multi-modal input information and issued back to the robot, the multi-modal output request containing emotion data and action data.
Preferably, the method also marks and stores the multi-modal resource data carried by the current intelligent robot, the multi-modal resource data including voice data, action instruction data and multimedia data.
Embodiments of the present application also provide a device for the multi-modal output of an intelligent robot system, comprising: an instruction receiver that generates a multi-modal output request according to received multi-modal input information, wherein the multi-modal input information includes voice commands, action commands and visual information, and the multi-modal output request carries a multi-modal output label; a configuration file searcher that responds to the multi-modal output request and searches the configuration files of the current intelligent robot, each configuration file having an association identifier; and a configuration file executor that, when the multi-modal output label is successfully matched against an association identifier, calls the configuration file corresponding to that identifier and executes the multi-modal resource data associated with the configuration file.
Preferably, the device also includes a configuration file parser that configures scene type data, together with the corresponding multi-modal resource data, according to a configuration template, encapsulates the multi-modal resource data in a preset format as a configuration file, and stores it under a specified path.
Preferably, when performing multi-modal output, the configuration file executor reads the multi-modal resource data encapsulated in the configuration file and instructs the current intelligent robot to perform voice, action and multimedia output according to that resource data.
Preferably, the instruction receiver receives a multi-modal output request generated by a server through parsing the multi-modal input information and issued back to the robot, the multi-modal output request containing emotion data and action data.
Preferably, the configuration file parser also marks and stores the multi-modal resource data carried by the current intelligent robot, the multi-modal resource data including voice data, action instruction data and multimedia data.
Compared with the prior art, one or more embodiments of the above scheme can have the following advantages or beneficial effects:
By separating the analysis of the multi-modal input information from the generation of the corresponding multi-modal output instructions, the computational load on the server of the intelligent robot system is effectively relieved, and the situation is avoided in which a multi-modal output instruction does not match the multi-modal resource data actually carried by the robot, so that the robot cannot respond as the instruction specifies. System efficiency is thus improved.
Other advantages, objectives and features of the invention will be set forth to some extent in the following description, will to some extent be apparent to those skilled in the art from a study of what follows, or may be learned from practice of the invention. The objectives and other advantages of the invention can be realized and attained through the structure particularly pointed out in the description, the claims and the accompanying drawings.
Description of the drawings
The accompanying drawings provide a further understanding of the technical solutions of the application or of the prior art, and constitute part of the description. The drawings expressing embodiments of the application serve, together with those embodiments, to explain the technical solutions of the application, but do not constitute a limitation of them.
Fig. 1 is a flow diagram of a method for the multi-modal output of an intelligent robot system according to an embodiment of the invention;
Fig. 2 is a flow diagram of configuring multi-modal resource data for a multi-modal output request on the basis of configuration files, according to an embodiment of the invention;
Fig. 3 is a flow diagram of calling and executing multi-modal resource data according to an embodiment of the invention;
Fig. 4 is a structural diagram of a device for the multi-modal output of an intelligent robot system according to another embodiment of the invention.
Detailed description of the invention
Embodiments of the present invention are described in detail below with reference to the drawings and examples, so that the way the invention applies technical means to solve technical problems, and the process of achieving the relevant technical effects, can be fully understood and put into practice. The features of the embodiments of the application can be combined with one another provided they do not conflict, and the resulting technical solutions all fall within the protection scope of the invention.
To relieve the computational load on the server in existing intelligent robot systems, the invention proposes a method of performing multi-modal output based on the configuration files of the robot operating system. Specifically, the server only needs to analyze the user's multi-modal input information and obtain the corresponding analysis result; the work of determining the robot's multi-modal output instructions from that result is handed over to the robot locally. This is described in detail below with reference to the embodiments.
The configuration files of the robot operating system are essential elements for its proper functioning: they ensure that the robot operating system stays in a normal operating state, and by parsing the configuration files the robot operating system is made to work in an orderly manner.
The configuration files of the robot operating system fall broadly into two classes: system configuration files and resource configuration files. The system configuration files help developers complete system-level settings;
the resource configuration files allow developers to declare the multi-modal resources the robot possesses. Multi-modal resources here generally refer to the resources the robot system requires in order to realize multi-modal output, including audio data resources, video data resources, image data resources, and program instruction resources that enable the robot to output particular actions.
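As an illustration of the second class of files, a resource configuration file that declares the robot's available multi-modal resources might be sketched as follows. The field names, resource names and structure are assumptions for illustration only; the patent does not fix a concrete format.

```python
# Hypothetical resource configuration: declares which multi-modal
# resources this robot actually carries, so that developers can see
# what the robot can and cannot do.
ROBOT_RESOURCES = {
    "audio": ["comfort_voice.wav", "wake_music.mp3"],
    "video": ["slowly_open_eyes.mp4"],
    "image": ["smile_face.png"],
    "action_programs": ["walk_forward", "reach_and_pat"],
}

def robot_can(resource_type: str, name: str) -> bool:
    """Check whether a resource is declared as available on this robot."""
    return name in ROBOT_RESOURCES.get(resource_type, [])
```

Checking such a declaration before allocating output resources is what lets the robot avoid instructions it cannot execute.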
Fig. 1 is a flow diagram of a method for the multi-modal output of an intelligent robot system according to an embodiment of the invention. As shown in the figure, the method includes:
Step S110: generating a multi-modal output request according to the received multi-modal input information, wherein the multi-modal input information includes voice commands, action commands and visual information, and the multi-modal output request carries a multi-modal output label.
Step S120: responding to the multi-modal output request and searching the configuration files of the current intelligent robot, wherein each configuration file has an association identifier.
Step S130: when the multi-modal output label is successfully matched against an association identifier, calling the configuration file corresponding to that identifier and executing the multi-modal resource data associated with the configuration file.
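The three steps above can be sketched as a simple matching loop. The request structure, identifier strings and resource entries below are assumptions for illustration; the patent leaves the concrete representation to the robot operating system.

```python
# Minimal sketch of steps S110-S130: match the request's output label
# against the association identifiers of the robot's configuration
# files, then return the resource data of the matching file.
CONFIG_FILES = {
    # association identifier -> multi-modal resource data
    "emo=sad": {"voice": "It's all right, don't be sad",
                "action": "reach_and_pat"},
    "scene=wake": {"video": "slowly_open_eyes.mp4"},
}

def handle_output_request(request: dict):
    label = request["output_label"]       # S110: request carries a label
    config = CONFIG_FILES.get(label)      # S120: search configuration files
    if config is None:
        return None                       # no match: nothing executable
    return config                         # S130: call & execute resources

result = handle_output_request({"output_label": "emo=sad"})
```

An unmatched label simply yields no executable configuration, which is how a mismatch between instruction and carried resources is surfaced locally instead of failing at output time.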
The multi-modal input information received by the intelligent robot system can include voice commands, action commands, visual information and so on.
Because combinations of multi-modal input information are relatively complex, the robot's local processor and instruction database may be unable to derive a reliable or meaningful analysis result; that is, they cannot clearly determine the true intent of the sender of the multi-modal information, and so cannot produce an output instruction that would direct the robot to generate multi-modal output. In that case the multi-modal input information must be sent to a remote server over the network; the remote server parses the multi-modal input information to generate a multi-modal output instruction, which is then sent back to the local robot over the network; and the local robot executes the instruction by calling the robot's multi-modal resource data to perform multi-modal output.
As the above process shows, because both the analysis of the multi-modal input information and the generation of the corresponding multi-modal output instruction are completed at the server, the computational burden the server must carry is very large. In addition, the execution of a multi-modal output instruction depends on the resources actually carried by the local robot, and a server that serves many client robots usually has no clear picture of which multimedia resources or program instructions a particular robot can use for voice output, action output, expression output and so on. The server may therefore issue a multi-modal output instruction that cannot be executed because the local robot lacks some resource.
In the present embodiment, the analysis of the multi-modal input information and the generation of the corresponding multi-modal output instructions are separated: the server only completes the analysis of the multi-modal input information, while the generation of the multi-modal output instructions corresponding to the analysis result is handed over to the client robot. This separation of tasks relieves the computational load on the server.
Specifically, after completing the analysis of the multi-modal input information, the server can infer the true intent of the user who issued it, and sends back to the client robot a multi-modal output request expressing the user's true intent or emotional state. What the client robot receives from the server is communication data representing the user's true intent or emotional state, rather than a multi-modal output instruction that directly tells the robot what multi-modal output to perform.
For example, suppose the user has accidentally smashed a favorite vase. From the image data gathered by its video sensor, the robot identifies that the user has produced the following inputs: the action input "cleaning up a smashed vase" and the facial expression input "a sad, regretful look"; at the same time it identifies the voice input "so stupid".
The robot pre-processes the above action, expression and voice input information and sends it to the remote server. By jointly analyzing the action input, the facial expression input and the voice input, the remote server infers that the user has accidentally smashed a favorite vase and feels some self-reproach and sadness.
Under a prior-art multi-modal output mechanism, the server would go on to determine, from the inferred intent or emotional state of the user, the multi-modal output the robot needs to perform. In the example above, because the user is in a state of self-reproach and sadness, the server would make the robot comfort the user by executing multi-modal output instructions such as the action output instruction "move in front of the user; reach out and pat them" and the voice output instruction "It's all right, don't be sad". That is, for the robot to realize this multi-modal output, the multi-modal output instruction issued by the server would have to contain the complete information needed to make the robot perform the action output "move in front of the user; reach out and pat them" and the voice output "It's all right, don't be sad".
By contrast, under the mechanism of this embodiment, which separates the analysis of the multi-modal input information from the generation of the corresponding multi-modal output instructions, once the server has inferred that the user's emotional state is self-reproach and sadness over the accidentally smashed vase, it does not need to go on to determine the multi-modal output instructions that would make the robot perform multi-modal output. In other words, the server need not decide whether the robot should respond and comfort the user through action, language or other means; it only has to indicate the inferred emotional state, e.g. "emo=sad", and send it back to the robot in the form of a multi-modal output request.
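Under this division of labor, the data sent back by the server can be as small as a labeled emotional state. A hypothetical request payload and its parsing might look like this; the JSON wire format and field names are assumptions, since the patent leaves the representation to the server/robot communication protocol.

```python
import json

# Hypothetical server reply: only the inferred state, no output commands.
server_reply = json.dumps({"type": "multimodal_output_request",
                           "output_label": "emo=sad"})

def parse_request(raw: str) -> str:
    """Extract the multi-modal output label from the server's reply."""
    msg = json.loads(raw)
    if msg.get("type") != "multimodal_output_request":
        raise ValueError("unexpected message type")
    return msg["output_label"]
```

The robot then resolves this label against its local configuration files rather than receiving concrete output commands from the server.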
It should further be understood that this embodiment places no restriction on how communication data is used to form a multi-modal output request conveying the user's intent or emotional state. In practice, the concrete representation of the multi-modal output request is determined by the implementation model of the remote server and the communication protocol between the server and the robot.
In this embodiment, the server does not need to generate multi-modal output instructions from the analysis result; it only needs to output a multi-modal output request representing the user's intent or emotional state. Because the step of generating multi-modal output instructions at the server is eliminated, the computational load on the server is effectively relieved and the construction cost of the server is reduced, while processing speed and real-time performance are improved.
The process of generating multi-modal output instructions from the analysis result of the multi-modal input information is completed at the robot, mainly by means of the configuration files of the robot operating system. As shown in Fig. 2, the following steps are implemented at the robot:
Step S210: for each different multi-modal output request, allocating the multi-modal resource data that enables the robot to perform multi-modal output.
Step S220: using a configuration template, writing the multi-modal output request and the multi-modal resource data allocated to it into a configuration file.
Step S230: storing the written configuration file in a designated directory, and encapsulating the data in the configuration file into the data format preset by the robot system.
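Steps S210-S230 can be sketched as follows. The directory layout, file naming and the choice of JSON as the "preset format" are illustrative assumptions, not the patent's concrete implementation.

```python
import json
import os
import tempfile

def write_config(directory: str, label: str, resources: dict) -> str:
    """S210/S220: pair an output request label with its allocated
    resource data and write both into a configuration file.
    S230: store it under the designated directory in a preset format."""
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, f"{label}.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"association_id": label, "resources": resources}, f)
    return path

# Example: configure the comfort response in a temporary directory.
cfg_dir = tempfile.mkdtemp()
cfg_path = write_config(cfg_dir, "emo=sad",
                        {"voice": "It's all right, don't be sad"})
```

Because each file records its association identifier, the matching step of Fig. 1 can later locate it by label alone.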
Specifically, the process by which the robot performs multi-modal output is in fact the process of calling the multi-modal resource data the robot system possesses and outputting it in different ways. For example, to make the robot output a facial expression, a robot equipped with a display screen outputs the expression by playing a video or displaying an image on the screen.
Therefore, before multi-modal resource data is allocated to the different multi-modal output requests, the multi-modal resource data the current robot possesses must be known, and all the usable multi-modal resource data carried by the intelligent robot must be marked in the system's resource configuration file. As mentioned above, by declaring in the resource configuration file which resources of the current robot are available, developers can know exactly what the robot can do and what it cannot.
The multi-modal resource data involved in the robot system generally comprises audio data, video data, image data or other multimedia data, and program instructions for controlling the motors that drive the robot's actions.
After the multi-modal resource data has been allocated to the multi-modal output requests, each multi-modal output request and the multi-modal resource data allocated to it are written into the configuration file using the configuration template, following the syntax format provided by the robot operating system. This is illustrated below with different embodiments.
To make the robot more intelligent, it is desirable that the robot not only react to the user's multi-modal input during interaction, but also spontaneously and actively express its own emotions; that is, the robot can actively exhibit certain multi-modal behaviors according to the different scenes it finds itself in. At the robot, scene types can be set directly using the system configuration files of the operating system.
For example, consider how the robot presents its state when it is in the wake-up scene. The wake-up scene here may be triggered by a start signal inside the robot, or by a timer inside the robot with a certain simulation cycle. If the facial expression output "slowly opening the eyes" is to be set for the wake-up scene, the following multi-modal resource data may be needed: for a robot system that outputs facial expressions on a display screen, video resource data containing the display footage of "slowly opening the eyes" to be played back on the screen.
The system configuration file can provide a formatted configuration template; when the wake-up scene is configured with this template, the parameter values at the corresponding positions are modified accordingly. For example, the scene type parameter is set to the wake-up scene, and the video resource data of the multi-modal resource configuration is set to "slowly opening the eyes" (for a robot that outputs facial expressions on a display screen), or the program instruction data of the multi-modal resource configuration is set to a driver program.
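A scene entry filled in from such a template might be sketched like this; the slot names and values are assumptions for illustration, since the patent does not fix the template's syntax.

```python
# Hypothetical formatted configuration template with placeholder slots,
# and a filled-in entry for the wake-up scene on a screen-equipped robot.
TEMPLATE = {
    "scene_type": None,           # which scene this entry configures
    "video_resource": None,       # expression video, if screen-equipped
    "program_instruction": None,  # driver program, if motor-driven face
}

# Fill in only the slots this robot needs; unused slots stay empty.
wake_config = dict(TEMPLATE,
                   scene_type="wake",
                   video_resource="slowly_open_eyes.mp4")
```

A robot with a bionic face would instead leave `video_resource` empty and fill `program_instruction` with its motor driver, matching the alternative described above.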
It should be noted that the above embodiment merely illustrates the method of configuring scene type data, and the multi-modal resource data corresponding to the scene type data, according to a configuration template; it does not constitute a limitation of the invention.
In addition, the concrete form in which scene type data and its corresponding multi-modal resource data appear in the configuration file is related to the format of the configuration files of the robot operating system. For example, the multi-modal resource data written into the configuration file for a scene type may also be information representing the name of the corresponding multi-modal resource data, information representing its storage location, or the information about that resource data marked in the resource configuration file; this embodiment does not limit it.
In other embodiments, the configuration template provided by the system configuration files of the robot operating system is used to set the multi-modal output corresponding to a multi-modal output request. For example, after the robot receives the multi-modal output request sent back by the server, it learns by parsing the request that the user's current emotional state is "some self-reproach and sadness over the user's own mistake". If the following multi-modal output is to be set for the robot: the action output "move in front of the user, reach out and pat them", the voice output "It's all right, don't be sad", and so on, then the following multi-modal resource data may be needed: control routine data for the drive motors that make the robot walk, sent to the corresponding motor drive module for execution; and, for outputting the voice content, either text data sent to the speech generation module to generate and output speech, or audio data that can be output directly, sent to the voice output module.
Likewise, the system configuration file provides a formatted configuration template for the configuration of multi-modal output requests; when configuration is performed with this template, the parameter values at the corresponding positions are modified accordingly. For example, the output request is set to "self-reproach and sadness"; the audio resource data of the multi-modal resource configuration (for robots that output directly from audio resources) or the text resource data (for robots that generate voice output with a speech generation module) is set to "It's all right, don't be sad"; and the program instruction data of the multi-modal resource configuration is set to a driver program.
It should be noted that the above embodiment likewise merely illustrates the method of configuring a multi-modal output request and its corresponding multi-modal resource data according to a configuration template; it does not constitute a limitation of the invention. Moreover, the concrete form in which the multi-modal output request and its corresponding multi-modal resource data appear in the configuration file is related to the format of the configuration files of the robot operating system, and is likewise not limited by this embodiment.
In the above embodiments, the configuration files of the intelligent robot operating system are used to realize multi-modal output, which facilitates development and the integration of functional modules. When new usable multi-modal resource data is added to the robot, extensions of system functionality can easily be realized on the basis of the above method.
In the above embodiments, the multi-modal output of the system is configured at the robot, so it can be set up on the basis of the multi-modal resource data the robot actually carries. This avoids the situation in which a multi-modal output instruction does not match the resource data actually carried by the robot, so that the robot cannot respond as the instruction specifies, and thus improves system efficiency.
In the above embodiments, the configuration files are also used to configure the robot's own emotional states and their expression, making the interaction between robot and user more real-time and flexible, and the design and configuration of the robot more personalized.
It is understood that, in another embodiment, the robot's multi-modal output can also be set on the basis of both the robot's own current emotional state and the multi-modal output request received from the server.
For example, when robot is in and wakes scene up, robot is said " playing games " by user, and robotic end knows being intended to of user " providing game " by the parsing of multi-modal output request after receiving the multi-modal output request sent back by server end.nullAnd be now in due to robot and wake up in scene and cannot ask to respond to the output of user,If multi-modal output request is arranged following multi-modal output for being in the robot waking up in scene: export for the facial expression waking scene setting " slowly opening eyes " up,Multi-modal output request for user arranges the voice output of " please first listening one section of music " and the audio frequency output of " playing a section audio ",Then need the multi-modal resource data that configuration is following: the video resource data (for utilizing the robot of display screen output facial expression) with the display picture of " slowly opening eyes " maybe can drive the control routine data (for utilizing the robot of bionic human face output facial expression) of the motor of bionic human face action、For exporting the textual resources data (for utilizing speech production module to generate the robot of voice output) of voice content or audio resource data (robot exported for directly utilizing audio resource data to carry out) and the audio resource data that utilize audio player to broadcast.
The method of configuring the above resource data by means of a configuration template is the same as in the previous embodiment and is not repeated here. If, after a while, the robot's wake-up scene ends, the other multi-modal resource data set for the robot through the configuration file can continue to be used to respond to the user's multi-modal output request.
In addition, the output ports of the multi-modal resource data can also be specified in the configuration file. In this example, there are both audio resource data for voice output and audio resource data for playing music. When writing the configuration file, based on the configuration file template, it can be indicated at an appropriate position, or in the form of an instruction, that the audio resource data for voice output are sent to the voice output module, while the audio resource data for playing music are sent to the audio player.
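As an illustrative sketch only, the wake-up-scene configuration described above, including per-resource output ports, might be expressed as follows. The patent does not specify a concrete file format, and every key name and port identifier here is a hypothetical stand-in:

```python
# Hypothetical sketch of a configuration file for the wake-up scene.
# Key names and port identifiers are illustrative assumptions; the patent
# does not define a concrete format.
wake_scene_config = {
    "scene": "wake_up",
    "association_id": "wake_up_response",  # compared with the multi-modal output label
    "resources": [
        # Facial expression: video for display-screen robots (a bionic-face
        # robot would instead carry motor control routine data here).
        {"type": "video", "data": "slowly_opening_eyes.mp4", "port": "display"},
        # Two kinds of audio resource data, routed to different output ports.
        {"type": "audio", "data": "please_listen_to_music.wav", "port": "voice_output"},
        {"type": "audio", "data": "background_music.mp3", "port": "audio_player"},
    ],
}

def resources_for_port(config, port):
    """Select the resource entries routed to a given output port."""
    return [r for r in config["resources"] if r["port"] == port]
```

With such a structure, the robot end can distinguish the voice-output audio from the music audio purely by the port field, as the paragraph above describes.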
In this embodiment, since the multi-modal output request is responded to at the robot end and the multi-modal output instruction is configured based on the robot's own emotional state, the response to the multi-modal output request can be more expressive and provide richer interactive information, improving the human-computer interaction experience; the interaction between the robot and the user is more real-time and flexible, and the design and configuration of the robot are more personalized.
After the configuration file is written, it is stored in a directory specified by the system, and the data in the configuration file are encapsulated into a default data format operable by the robot operating system for subsequent calls.
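A minimal sketch of this storage step, assuming JSON as the operating system's default data format (the patent does not name one) and a hypothetical directory name:

```python
import json
import os

def store_configuration(config: dict, directory: str, name: str) -> str:
    """Encapsulate configuration data in a default format (JSON, as an
    assumption) and store it under the system-specified directory."""
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, name + ".json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(config, f, ensure_ascii=False, indent=2)
    return path

def load_configuration(path: str) -> dict:
    """Load a previously stored configuration file for a subsequent call."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

The round trip (store, then load) models the "encapsulate for subsequent calls" behavior described above.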
It can be understood that the above process of configuring the multi-modal output instruction using the configuration file template can be carried out in advance by a developer. Therefore, after the robot system receives the multi-modal output request sent back by the server, it can directly call the relevant multi-modal resource data already written into the configuration file to perform the multi-modal output.
Specifically, the intelligent robot responds to the multi-modal output request and searches the configuration files of the current system; when the multi-modal output request matches the configuration information in a configuration file, the multi-modal resource data corresponding to that configuration information are called and executed.
The multi-modal output request can be matched against the configuration information in the configuration file by comparing a multi-modal output label with an association identification.
The multi-modal output label is a group of data used to represent the theme of the multi-modal output request. It can be generated by the server at the same time as the multi-modal output request, or obtained by the client robot by analyzing the received multi-modal output request. Once generated, the multi-modal output request carries the multi-modal output label.
The association identification is a group of data corresponding to the multi-modal output label. It can be a copy of the content of the multi-modal output label, or can be obtained by applying a certain operation to the multi-modal output label. The association identification corresponding to a configuration file is set when the configuration file is written and is used for comparison with the multi-modal output label.
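The "certain operation" mentioned above is left unspecified in the text. As one hedged illustration, the association identification could be either a plain copy of the label's content or, say, a hash digest derived from it:

```python
import hashlib

def make_association_id(label: str, use_hash: bool = False) -> str:
    """Derive an association identification from a multi-modal output label:
    either a direct copy of the label's content, or the result of an
    operation on it (a SHA-256 digest, chosen here purely as an example)."""
    if use_hash:
        return hashlib.sha256(label.encode("utf-8")).hexdigest()
    return label

def label_matches(label: str, association_id: str, use_hash: bool = False) -> bool:
    """Compare the label carried by the output request with the association
    identification stored in a configuration file."""
    return make_association_id(label, use_hash) == association_id
```

Whichever operation is chosen, the same operation must of course be applied both when writing the configuration file and when matching an incoming request.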
When the multi-modal output label carried by the multi-modal output request successfully matches an association identification in a configuration file, the configuration file corresponding to that association identification is called, and the multi-modal resource data corresponding to the configuration file are executed. As shown in Fig. 3, the process of calling and executing the multi-modal resource data includes:
Step S310: reading the multi-modal resource data encapsulated in the configuration file.
Step S320: instructing the current intelligent robot to perform voice, action and multimedia output according to the multi-modal resource data.
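The two steps above can be sketched as follows. The handler table is a hypothetical stand-in for the robot's actual voice, action and multimedia output modules, which the patent does not specify:

```python
def read_resource_data(config: dict) -> list:
    """Step S310: read the multi-modal resource data encapsulated
    in the configuration file."""
    return config.get("resources", [])

def perform_output(config: dict, handlers: dict) -> list:
    """Step S320: instruct the robot to perform voice, action and
    multimedia output by dispatching each resource to its handler."""
    performed = []
    for res in read_resource_data(config):
        handler = handlers.get(res["type"])
        if handler is not None:
            performed.append(handler(res["data"]))
    return performed

# Hypothetical handlers standing in for real output modules.
handlers = {
    "voice":  lambda data: f"speak:{data}",
    "action": lambda data: f"move:{data}",
    "video":  lambda data: f"show:{data}",
}
```

Resources whose type has no registered handler are simply skipped in this sketch; a real system might instead log or reject them.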
The following example illustrates the complete process in which the intelligent robot system responds to the user's multi-modal input information and performs multi-modal output.
As in the previous example, the user's multi-modal input information includes an action instruction input "cleaning up the smashed vase", a facial expression input "a sad, sorrowful look in the eyes", and a voice instruction input "so stupid". The multi-modal output request obtained by the server from analyzing the above multi-modal input information, which represents the user's current emotional state, includes descriptions with content such as "some self-reproach and sadness over one's own mistake".
The robot end receives the above multi-modal output request, analyzes it, and extracts the multi-modal output label carried by the request, namely "self-reproach and sadness".
Further, the robot end searches the configuration files according to the multi-modal output label; when an association identification matching the multi-modal output label is found, the configuration file corresponding to that association identification is called from the specified storage path. Output is then performed through the default output ports of the multi-modal resource data, or through the output ports specified in the configuration file. For example, routine data for controlling motor movement are sent, according to the output ports specified in the configuration file, to different motor drive modules for execution.
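A hedged sketch of this lookup-and-routing flow, with all structure and port names invented for illustration:

```python
def find_configuration(label: str, configs: list):
    """Search the stored configuration files for one whose association
    identification matches the multi-modal output label."""
    for config in configs:
        if config.get("association_id") == label:
            return config
    return None

def route_motor_routines(config: dict, default_port: str = "motor_0") -> dict:
    """Group motor-control routine data by the drive module named in each
    resource's output port, falling back to a default port when the
    configuration file specifies none."""
    dispatch = {}
    for res in config.get("resources", []):
        if res.get("type") == "motor_routine":
            port = res.get("port", default_port)
            dispatch.setdefault(port, []).append(res["data"])
    return dispatch
```

Each key of the returned dictionary corresponds to one motor drive module; its value is the list of routine data that module should execute.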
Fig. 4 is a structural diagram of a device for multi-modal output of an intelligent robot system according to another embodiment of the present invention. The device 40 includes:
a command receiver 41, which generates a multi-modal output request according to received multi-modal input information, the multi-modal input information including voice instructions, action instructions and visual information, and the multi-modal output request carrying a multi-modal output label;
a configuration file searcher 42, which responds to the multi-modal output request and searches the configuration files of the current intelligent robot, each configuration file having an association identification; and
a configuration file executor 43, which, when the multi-modal output label successfully matches an association identification, calls the configuration file corresponding to that association identification and executes the multi-modal resource data corresponding to the configuration file.
In addition, the intelligent robot system is further provided internally with a configuration file parser, which configures scene category data and the multi-modal resource data corresponding to the scene category data according to a setting template, encapsulates the multi-modal resource data in a preset format into a configuration file, and stores it in a specified path.
Each of the above functional modules performs its corresponding function according to the method of the previous embodiments, which is not repeated here.
Although the embodiments disclosed herein are described above, the content described is only an implementation adopted to facilitate understanding of the present invention and does not limit the present invention. Any person skilled in the technical field of the present invention may make modifications and changes in the form and details of implementation without departing from the spirit and scope disclosed herein, but the scope of patent protection of the present invention shall still be as defined by the appended claims.

Claims (10)

1. A method for multi-modal output of an intelligent robot system, comprising:
generating a multi-modal output request according to received multi-modal input information, the multi-modal input information including voice instructions, action instructions and visual information, and the multi-modal output request carrying a multi-modal output label;
responding to the multi-modal output request and searching the configuration files of the current intelligent robot, each configuration file having an association identification; and
when the multi-modal output label successfully matches an association identification, calling the configuration file corresponding to that association identification and executing the multi-modal resource data corresponding to the configuration file.
2. The method according to claim 1, characterized by further comprising, in the robot operating system:
configuring scene category data and the multi-modal resource data corresponding to the scene category data according to a setting template; and
encapsulating the multi-modal resource data in a preset format into a configuration file, and storing it in a specified path.
3. The method according to claim 1 or 2, characterized in that calling and executing the multi-modal resource data corresponding to the association identification comprises:
reading the multi-modal resource data encapsulated in the configuration file; and
instructing the current intelligent robot to perform voice, action and multimedia output according to the multi-modal resource data.
4. The method according to claim 1, characterized in that generating a multi-modal output request according to received multi-modal input information comprises:
receiving a multi-modal output request that is generated by a server through parsing the multi-modal input information and is issued by the server, the multi-modal output request containing emotion data and action data.
5. The method according to claim 2, characterized by further comprising: marking and storing the multi-modal resource data carried by the current intelligent robot, the multi-modal resource data including voice data, action instruction data and multimedia data.
6. A device for multi-modal output of an intelligent robot system, comprising:
a command receiver, which generates a multi-modal output request according to received multi-modal input information, the multi-modal input information including voice instructions, action instructions and visual information, and the multi-modal output request carrying a multi-modal output label;
a configuration file searcher, which responds to the multi-modal output request and searches the configuration files of the current intelligent robot, each configuration file having an association identification; and
a configuration file executor, which, when the multi-modal output label successfully matches an association identification, calls the configuration file corresponding to that association identification and executes the multi-modal resource data corresponding to the configuration file.
7. The device according to claim 6, characterized by further comprising a configuration file parser, which configures scene category data and the multi-modal resource data corresponding to the scene category data according to a setting template, encapsulates the multi-modal resource data in a preset format into a configuration file, and stores it in a specified path.
8. The device according to claim 6 or 7, characterized in that, when performing multi-modal output, the configuration file executor reads the multi-modal resource data encapsulated in the configuration file and instructs the current intelligent robot to perform voice, action and multimedia output according to the multi-modal resource data.
9. The device according to claim 6, characterized in that the command receiver receives a multi-modal output request that is generated by a server through parsing the multi-modal input information and is issued by the server, the multi-modal output request containing emotion data and action data.
10. The device according to claim 7, characterized in that the configuration file parser also marks and stores the multi-modal resource data carried by the current intelligent robot, the multi-modal resource data including voice data, action instruction data and multimedia data.
CN201610158062.4A 2016-03-18 2016-03-18 Intelligent robot system multi-modal output method and device Pending CN105785813A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610158062.4A CN105785813A (en) 2016-03-18 2016-03-18 Intelligent robot system multi-modal output method and device


Publications (1)

Publication Number Publication Date
CN105785813A true CN105785813A (en) 2016-07-20

Family

ID=56394047


Country Status (1)

Country Link
CN (1) CN105785813A (en)


Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040104702A1 (en) * 2001-03-09 2004-06-03 Kazuhiro Nakadai Robot audiovisual system
CN102752729A (en) * 2012-06-25 2012-10-24 华为终端有限公司 Reminding method, terminal, cloud server and system
CN103078915A (en) * 2012-12-28 2013-05-01 深圳职业技术学院 Vehicle-mounted voice song request system based on cloud computing vehicle networking and method thereof
CN103279189A (en) * 2013-06-05 2013-09-04 合肥华恒电子科技有限责任公司 Interacting device and interacting method for portable electronic equipment
CN104834691A (en) * 2015-04-22 2015-08-12 中国建设银行股份有限公司 Voice robot
CN105072143A (en) * 2015-07-02 2015-11-18 百度在线网络技术(北京)有限公司 Interaction system for intelligent robot and client based on artificial intelligence
CN105093986A (en) * 2015-07-23 2015-11-25 百度在线网络技术(北京)有限公司 Humanoid robot control method based on artificial intelligence, system and the humanoid robot


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YI, Lian (易炼): "Research and Implementation of a Multi-modal Dialogue Management System", China Master's Theses Full-text Database, Information Science and Technology *

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106097793A (en) * 2016-07-21 2016-11-09 北京光年无限科技有限公司 A kind of child teaching method and apparatus towards intelligent robot
CN106251235A (en) * 2016-07-29 2016-12-21 北京小米移动软件有限公司 Robot functional configuration system, method and device
CN107696028A (en) * 2016-08-08 2018-02-16 深圳光启合众科技有限公司 Control method and device and robot for intelligent robot
CN106528692A (en) * 2016-10-31 2017-03-22 北京百度网讯科技有限公司 Dialogue control method and device based on artificial intelligence
CN106951274A (en) * 2016-11-15 2017-07-14 北京光年无限科技有限公司 Using startup method, operating system and intelligent robot
CN106799733A (en) * 2016-12-27 2017-06-06 深圳前海勇艺达机器人有限公司 Robot motion method and system
CN106874030B (en) * 2016-12-30 2020-10-20 北京光年无限科技有限公司 Method and device for analyzing and optimizing instructions in application under robot operating system environment
CN106874030A (en) * 2016-12-30 2017-06-20 北京光年无限科技有限公司 The interior instruction analytical optimization method and apparatus of application under robot operating system environment
CN106847279A (en) * 2017-01-10 2017-06-13 西安电子科技大学 Man-machine interaction method based on robot operating system ROS
CN106934651A (en) * 2017-01-18 2017-07-07 北京光年无限科技有限公司 A kind of advertisement information output intent and system for robot
CN107433591A (en) * 2017-08-01 2017-12-05 上海未来伙伴机器人有限公司 Various dimensions interact robot application control system and method
CN109271216A (en) * 2018-10-19 2019-01-25 北京北信源信息安全技术有限公司 Configuration file management method and configuration file management device
CN109271216B (en) * 2018-10-19 2022-07-26 北京北信源信息安全技术有限公司 Configuration file management method and configuration file management device
CN109903767A (en) * 2019-04-02 2019-06-18 广州视源电子科技股份有限公司 A kind of method of speech processing, device, equipment and system
CN109903767B (en) * 2019-04-02 2021-10-22 广州视源电子科技股份有限公司 Voice processing method, device, equipment and system
CN111383346A (en) * 2020-03-03 2020-07-07 深圳创维-Rgb电子有限公司 Interaction method and system based on intelligent voice, intelligent terminal and storage medium
CN111383346B (en) * 2020-03-03 2024-03-12 深圳创维-Rgb电子有限公司 Interactive method and system based on intelligent voice, intelligent terminal and storage medium
CN112559834A (en) * 2020-12-21 2021-03-26 浙江合张量科技有限公司 Multi-mode output method for intelligent robot system
CN113204302A (en) * 2021-04-14 2021-08-03 北京达佳互联信息技术有限公司 Operation method, device, equipment and storage medium based on virtual robot
CN113204302B (en) * 2021-04-14 2023-05-12 北京达佳互联信息技术有限公司 Virtual robot-based operation method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN105785813A (en) Intelligent robot system multi-modal output method and device
JP6902683B2 (en) Virtual robot interaction methods, devices, storage media and electronic devices
CN112995706B (en) Live broadcast method, device, equipment and storage medium based on artificial intelligence
CN111010589B (en) Live broadcast method, device, equipment and storage medium based on artificial intelligence
CN107704169B (en) Virtual human state management method and system
US10191721B1 (en) Systems and methods for generating functional application designs
KR20210110620A (en) Interaction methods, devices, electronic devices and storage media
TW202016693A (en) Human-computer interaction processing system, method, storage medium and electronic device
CN107632706B (en) Application data processing method and system of multi-modal virtual human
CN107340865A (en) Multi-modal virtual robot exchange method and system
US20210392403A1 (en) Smart Television And Server
CN105690385A (en) Application calling method and device based on intelligent robot
CN105637887A (en) Method in support of video impression analysis including interactive collection of computer user data
CN107589828A (en) The man-machine interaction method and system of knowledge based collection of illustrative plates
CN107808191A (en) The output intent and system of the multi-modal interaction of visual human
CN104461446B (en) Software running method and system based on interactive voice
CN112885354B (en) Display device, server and display control method based on voice
JP2023552854A (en) Human-computer interaction methods, devices, systems, electronic devices, computer-readable media and programs
CN106239506A (en) The multi-modal input data processing method of intelligent robot and robot operating system
CN106502382A (en) Active exchange method and system for intelligent robot
CN111862280A (en) Virtual role control method, system, medium, and electronic device
WO2023246163A9 (en) Virtual digital human driving method, apparatus, device, and medium
CN109324515A (en) A kind of method and controlling terminal controlling intelligent electric appliance
CN110000777A (en) Multihead display robot, multi-display method and device, readable storage medium storing program for executing
López-Cózar et al. Multimodal dialogue for ambient intelligence and smart environments

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
