CN106239506A

CN106239506A - Multi-modal input data processing method for an intelligent robot, and robot operating system

Info

Publication number: CN106239506A
Application number: CN201610657719.1A
Authority: CN
Inventors: 匡亚明
Original assignee: Beijing Guangnian Wuxian Technology Co Ltd
Current assignee: Beijing Guangnian Wuxian Technology Co Ltd
Priority date: 2016-08-11
Filing date: 2016-08-11
Publication date: 2016-12-21
Anticipated expiration: 2036-08-11
Also published as: CN106239506B

Abstract

The invention discloses a multi-modal input data processing method for an intelligent robot and a robot operating system. The intelligent robot is provided with a robot operating system, and the processing method includes: a user intention acquisition step of receiving and parsing the user's multi-modal input data to obtain a user intention; an application determination step of determining the application that matches the user intention; an application execution intention acquisition step of obtaining the application execution intention contained in the multi-modal input data; an execution instruction generation step of matching on the application execution intention and the current application state value of the robot operating system to generate an execution instruction; and a multi-modal output step of performing multi-modal output according to the execution instruction. After the robot receives the user's multi-modal input data, the invention can quickly call the corresponding module to perform the operation associated with the instruction, avoiding missed instruction execution and unnecessary traversal matching.

Description

Multi-modal input data processing method for an intelligent robot, and robot operating system

Technical field

The present invention relates to the field of intelligent robots, and in particular to a multi-modal input data processing method for an intelligent robot and a robot operating system.

Background art

As intelligent robot products become increasingly popular, more and more intelligent robots are entering the home, becoming playmates for children and housekeepers for adults.

In the prior art, after an intelligent robot receives the user's multi-modal input data, the operating system traverses and matches all application instructions in order to process the received data. This mode of operation has a low hit rate and takes a long time, which limits the speed at which the intelligent robot can process multi-modal input data.

Therefore, so that the robot can quickly call the corresponding module to perform the operation associated with an instruction after receiving the user's multi-modal input data, avoiding missed instruction execution and unnecessary traversal matching, a method for processing multi-modal data is urgently needed in order to improve the user experience.

Summary of the invention

One of the technical problems to be solved by the present invention is to provide a scheme that, after the robot receives the user's multi-modal input data, can quickly call the corresponding module to perform the operation associated with the instruction, avoiding missed instruction execution and unnecessary traversal matching.

In order to solve the above technical problem, an embodiment of the present application first provides a multi-modal input data processing method for an intelligent robot, the intelligent robot being provided with a robot operating system. The processing method includes: a user intention acquisition step of receiving and parsing the user's multi-modal input data to obtain a user intention; an application determination step of determining the application that matches the user intention; an application execution intention acquisition step of obtaining the application execution intention contained in the multi-modal input data; an execution instruction generation step of matching on the application execution intention and the current application state value of the robot operating system to generate an execution instruction; and a multi-modal output step of performing multi-modal output according to the execution instruction.

Preferably, the execution instruction generation step further includes: a first matching step of matching against system operations and action commands respectively, and judging whether the system operations and action commands can generate an execution command corresponding to the application execution intention, where an execution instruction is generated if the match succeeds and the second matching step is performed otherwise; and a second matching step of looking up, according to the current application state value, the application processing class to which the current application belongs, and judging whether that application processing class can generate an execution command corresponding to the application execution intention.

Preferably, if the second matching step fails, a third matching step is performed, in which each application processing class is traversed in order of weight and matched to judge whether it can generate an execution command corresponding to the application execution intention.

Preferably, if the third matching step fails, a dialogue application is started to process the multi-modal input data.

Preferably, the application state value includes an application key APP Key and an application execution state Operate State.

In another aspect, a robot operating system for processing multi-modal input data is also provided, including: a user intention acquisition unit configured to receive and parse the user's multi-modal input data and obtain a user intention; an application determination unit configured to determine the application that matches the user intention; an application execution intention acquisition unit configured to obtain the application execution intention contained in the multi-modal input data; an execution instruction generation unit configured to match on the application execution intention and the current application state value of the robot operating system to generate an execution instruction; and a multi-modal output unit configured to perform multi-modal output according to the execution instruction.

Preferably, the execution instruction generation unit further includes: a first matching subunit configured to match against system operations and action commands respectively and judge whether the system operations and action commands can generate an execution command corresponding to the application execution intention, generating an execution instruction if the match succeeds and starting the second matching subunit otherwise; and a second matching subunit configured to look up, according to the current application state value, the application processing class to which the current application belongs, and to judge whether that application processing class can generate an execution command corresponding to the application execution intention.

Preferably, the execution instruction generation unit also includes a third matching subunit configured to start when the second matching subunit fails to match, and to traverse each application processing class in order of weight, matching to judge whether it can generate an execution command corresponding to the application execution intention.

Preferably, the system also includes a dialogue application start unit configured to start a dialogue application to process the multi-modal input data when the third matching subunit fails to match.

Compared with the prior art, one or more embodiments of the above scheme may have the following advantages or beneficial effects:

In the embodiment of the present invention, the robot operating system receives and parses the user's multi-modal input data to obtain the user intention, determines the application that matches the user intention, obtains the application execution intention contained in the multi-modal input data, matches on the application execution intention and the current application state value of the robot operating system to generate an execution instruction, and finally performs multi-modal output according to that execution instruction. After the robot receives the user's multi-modal input data, it can therefore quickly call the corresponding module to perform the operation associated with the instruction, avoiding missed instruction execution and unnecessary traversal matching.

Other features and advantages of the present invention will be set forth in the description that follows and will in part become apparent from the description, or be understood by implementing the technical scheme of the present invention. The objectives and other advantages of the present invention can be realized and obtained through the structures and/or flows particularly pointed out in the description, the claims, and the accompanying drawings.

Brief description of the drawings

The accompanying drawings are provided for a further understanding of the technical scheme of the present application or of the prior art, and constitute a part of the description. The drawings illustrating embodiments of the present application are used together with those embodiments to explain the technical scheme of the present application, but do not limit that technical scheme.

Fig. 1 is a structural block diagram of the robot operating system 100 for processing multi-modal input data according to an embodiment of the present invention.

Fig. 2 is a structural block diagram of the execution instruction generation unit 140 according to an embodiment of the present invention.

Fig. 3 is a schematic flowchart of the multi-modal input data processing method for an intelligent robot according to an embodiment of the present invention.

Fig. 4 is a flowchart of a concrete example of the multi-modal input data processing method according to an embodiment of the present invention.

Detailed description of the invention

Embodiments of the present invention are described in detail below with reference to the drawings and examples, so that how the present invention applies technical means to solve the technical problem and achieve the corresponding technical effect can be fully understood and implemented. Provided they do not conflict, the embodiments of the present application and the features within them may be combined with one another, and the resulting technical schemes all fall within the scope of protection of the present invention.

In addition, the steps shown in the flowcharts of the drawings may be executed in a computer system, for example as a set of computer-executable instructions. Moreover, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be executed in an order different from the one given here.

(embodiment)

Fig. 1 is a structural block diagram of the robot operating system 100 for processing multi-modal input data according to an embodiment of the present invention. The structure and function of each component of the robot operating system 100 are described below with reference to Fig. 1.

As shown in Fig. 1, the robot operating system 100 for processing multi-modal input data of this embodiment mainly includes: a user intention acquisition unit 110, an application determination unit 120, an application execution intention acquisition unit 130, an execution instruction generation unit 140, and a multi-modal output unit 150.

The user intention acquisition unit 110 is configured to receive and parse the user's multi-modal input data and obtain the user intention.

It should be noted that the multi-modal input data mainly include audio data resources, video data resources, image data resources, and program instruction resources that enable the robot to output a given action or to run software or hardware. Combinations of multi-modal input data are relatively complex; the user intention acquisition unit 110 analyzes the multi-modal input data to obtain a reliable or meaningful result and determine the true intention of the person who issued the multi-modal information.

For example, when the user says "please dance" to the robot, the robot recognizes the voice instruction "please dance" from the captured audio data. The user intention acquisition unit 110 preprocesses the multi-modal input data of this voice instruction, for example by converting the speech into text. After preprocessing, it analyzes the input data of the voice instruction as a whole and infers the user intention, namely that the user wants the robot to dance.

The application determination unit 120 is connected to the user intention acquisition unit 110 and is configured to determine the application that matches the user intention.

After the user intention acquisition unit 110 obtains the user intention, the application determination unit 120 queries the application corresponding to that user intention. In one example, an application configuration database is set up in advance, in which user intentions and applications are stored in association with each other, usually in the form of a list, in a one-to-one or many-to-one relationship. For example, when the user intention includes "dance" content, the corresponding application is the dance application, so the application determination unit 120 can look up the list information in the application configuration database and find the application that matches the user intention.
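The lookup described above can be sketched as a small in-memory table. This is only a minimal illustration in Python, not the implementation disclosed by the patent; the names `APP_CONFIG` and `find_application` and the sample intentions are assumptions made for the example.

```python
from typing import Dict, Optional

# Hypothetical application configuration table: user intention -> application.
# Several intentions may map to the same application (many-to-one).
APP_CONFIG: Dict[str, str] = {
    "dance": "dance application",
    "perform a dance": "dance application",
    "play music": "music application",
}


def find_application(user_intention: str) -> Optional[str]:
    """Return the application associated with a user intention, if any."""
    return APP_CONFIG.get(user_intention)


print(find_application("dance"))        # dance application
print(find_application("tell a joke"))  # None
```

A plain dictionary is enough here because the text only requires one-to-one or many-to-one storage; a real system would presumably back this with persistent storage.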

The application execution intention acquisition unit 130 is connected to the user intention acquisition unit 110 and is configured to obtain the application execution intention contained in the multi-modal input data.

Specifically, the application execution intention acquisition unit 130 parses the acquired multi-modal input data and analyzes the application execution intention it contains. Continuing the example above, because the robot has recognized the voice instruction "please dance" from the captured audio data, the application execution intention acquisition unit 130 can obtain the application execution intention "open the dance application". Besides "open an application", application execution intentions also include "close an application", "open a function of an application", and "close a function of an application".
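One way to represent the four kinds of application execution intention listed above is an enumeration plus a small record type. The sketch below is an assumption for illustration; `IntentKind` and `AppExecutionIntent` are hypothetical names that do not appear in the patent.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class IntentKind(Enum):
    """The four kinds of application execution intention named in the text."""
    OPEN_APP = "open application"
    CLOSE_APP = "close application"
    OPEN_FUNCTION = "open function of application"
    CLOSE_FUNCTION = "close function of application"


@dataclass(frozen=True)
class AppExecutionIntent:
    kind: IntentKind
    app: str                        # target application, e.g. "dance application"
    function: Optional[str] = None  # only meaningful for the *_FUNCTION kinds


# The intention parsed from the voice instruction "please dance".
intent = AppExecutionIntent(IntentKind.OPEN_APP, "dance application")
print(intent.kind.value, "->", intent.app)
```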

The execution instruction generation unit 140 is connected to the application execution intention acquisition unit 130 and is configured to match on the application execution intention and the current application state value of the robot operating system to generate an execution instruction.

Specifically, as shown in Fig. 2, the execution instruction generation unit 140 further includes a first matching subunit 1402, a second matching subunit 1404, and a third matching subunit 1406. Between them, these three matching subunits match against system operations, action commands, and non-dialogue applications to generate the execution instruction.

The first matching subunit 1402 is configured to match against system operations and action commands respectively and judge whether the system operations and action commands can generate an execution command corresponding to the application execution intention. If the match succeeds, it generates an execution instruction; otherwise, it starts the second matching subunit 1404.

It should be noted that a system operation is an internal command issued by the operating system itself and may involve software or hardware. An action command is an instruction directly associated with hardware. In the example above, the application execution intention is "open the dance application", which is unrelated to system operations and action commands. Matching therefore finds that the system operations and action commands cannot generate an execution command corresponding to the application execution intention, and the second matching subunit 1404 is started.
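The first matching stage can be pictured as two membership checks. The sketch below is illustrative only: the patent does not list concrete system operations or action commands, so the entries in `SYSTEM_OPERATIONS` and `ACTION_COMMANDS` and the function name `first_match` are assumptions.

```python
from typing import Optional

# Hypothetical command tables; the patent does not list concrete entries.
SYSTEM_OPERATIONS = {"shut down", "volume up"}   # issued by the OS itself
ACTION_COMMANDS = {"raise arm", "move forward"}  # tied directly to hardware


def first_match(intent: str) -> Optional[str]:
    """Return an execution instruction if the intention is a system
    operation or an action command; otherwise return None, meaning the
    second matching stage should run."""
    if intent in SYSTEM_OPERATIONS:
        return f"system:{intent}"
    if intent in ACTION_COMMANDS:
        return f"action:{intent}"
    return None


print(first_match("raise arm"))               # action:raise arm
print(first_match("open dance application"))  # None
```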

The second matching subunit 1404 is configured to look up, according to the current application state value, the application processing class to which the current application belongs, and to judge whether that application processing class can generate an execution command corresponding to the application execution intention. Here, the application state value includes the application key APP Key and the application execution state Operate State.

It should be noted that in this embodiment every application processing class is tagged in advance with a uniquely identifying application key AppKey. The application execution state Operate State indicates the state the current application is in, such as "start/run" or "pause". The purpose of the second matching subunit 1404 is to take the state of the current application into account when deciding how to start the application to be executed. In this way, based on the application context and the user intentions most recently parsed and passed to the robot operating system, the state of the application the device is currently in can be judged and a reasonable instruction match made.

For example, if the current application state value is APP Key=11 ("11" corresponds to the "dance application") and Operate State=on ("on" corresponds to "running"), then the current application processing class is the "dance application" and that class is in the "running" state. Since the application execution intention parsed in the example above is "open the dance application", it can be determined that the current application processing class can generate an execution command corresponding to the application execution intention. The currently running dance application can then be used directly to complete the processing, so the multi-modal input data are processed quickly.
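The example with APP Key=11 and Operate State=on can be sketched as a direct lookup in a registry keyed by application key. Everything below is a hedged illustration: `AppState`, `DanceApp`, `REGISTRY`, and `second_match` are hypothetical names, and the Operate State is carried along but not inspected, because the text does not say how it affects the decision.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class AppState:
    app_key: int        # APP Key
    operate_state: str  # Operate State, e.g. "on" for running


class DanceApp:
    """Hypothetical application processing class registered under key 11."""
    handled = {"open dance application", "dance the next tune"}

    def resolve(self, intent: str) -> Optional[str]:
        return f"dance:{intent}" if intent in self.handled else None


REGISTRY: Dict[int, DanceApp] = {11: DanceApp()}


def second_match(intent: str, state: AppState) -> Optional[str]:
    """Ask only the application processing class identified by the
    current APP Key whether it can handle the intention."""
    handler = REGISTRY.get(state.app_key)
    if handler is None:
        return None
    return handler.resolve(intent)


state = AppState(app_key=11, operate_state="on")
print(second_match("open dance application", state))
```

Because only one class is consulted, a hit at this stage avoids any traversal of the other applications.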

The third matching subunit 1406 is configured to start when the second matching subunit 1404 fails to match, and to traverse each application processing class in order of weight, matching to judge whether it can generate an execution command corresponding to the application execution intention.

When the application execution intention is "open the dance application" but the current application processing class is not the "dance application" and is in the "running" state, the second matching subunit 1404 fails to match. The third matching subunit 1406 then traverses the application processing classes in order of their preset weights and matches each one to judge whether it can generate the corresponding execution instruction. In this example, a weight can be preset for each application processing class according to how frequently it is used; for instance, if the music application is used most frequently, its weight is set to 10.

In the example above, the third matching subunit 1406 first judges whether the "music application" can generate an execution command corresponding to the application execution intention "open the dance application". Since it cannot, the remaining application processing classes are judged one by one in order of weight until the "dance application" is reached, and the "dance application" is then used to generate the corresponding execution instruction.
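A weight-ordered traversal of this kind might look like the following sketch. Only the music application's weight of 10 comes from the text; the other applications, their weights, and the names `APP_CLASSES` and `third_match` are assumptions for illustration.

```python
from typing import List, Optional, Set, Tuple

# (name, weight, intentions the class can handle). Only the music
# application's weight of 10 comes from the text; the rest is illustrative.
APP_CLASSES: List[Tuple[str, int, Set[str]]] = [
    ("dance application", 6, {"open dance application"}),
    ("music application", 10, {"open music application"}),
    ("story application", 8, {"open story application"}),
]


def third_match(intent: str) -> Tuple[Optional[str], List[str]]:
    """Try application processing classes from highest to lowest weight.
    Returns the execution instruction (or None) and the order visited."""
    visited: List[str] = []
    for name, _weight, handled in sorted(
        APP_CLASSES, key=lambda entry: entry[1], reverse=True
    ):
        visited.append(name)
        if intent in handled:
            return f"{name}:{intent}", visited
    return None, visited


instruction, order = third_match("open dance application")
print(order)        # ['music application', 'story application', 'dance application']
print(instruction)  # dance application:open dance application
```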

It should be noted that, to keep access to the application processing classes secure, the present application sets user permission data for different users. Before traversing the application processing classes, the third matching subunit 1406 first judges whether the user has access permission for all applications; alternatively, after matching the corresponding application, it checks whether the user has access permission for that application.
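The two permission-check variants mentioned above (before traversal, or after a match) can be sketched as follows. The permission table, the user names, and the function names are all hypothetical; the patent does not specify how user permission data are stored.

```python
from typing import Dict, List, Set

# Hypothetical per-user permission data: user -> applications they may access.
PERMISSIONS: Dict[str, Set[str]] = {
    "parent": {"dance application", "music application", "settings application"},
    "child": {"dance application", "music application"},
}


def has_all_access(user: str, apps: List[str]) -> bool:
    """Variant 1: check before traversal that the user may access every app."""
    return set(apps) <= PERMISSIONS.get(user, set())


def may_access(user: str, app: str) -> bool:
    """Variant 2: check after matching that the user may access the matched app."""
    return app in PERMISSIONS.get(user, set())


all_apps = ["dance application", "music application", "settings application"]
print(has_all_access("child", all_apps))         # False
print(may_access("child", "dance application"))  # True
```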

The multi-modal output unit 150 is configured to perform multi-modal output according to the execution instruction.

The execution instruction may also be called a multi-modal output instruction. When an execution instruction such as "continue running the dance application" is obtained, the current dance application keeps sending audio data to the voice output module, which outputs the dance music directly, while the action output instruction "move in front of the user and perform dance movements" is issued. For the robot to achieve this multi-modal output, the multi-modal output instruction must contain the complete information needed for the robot to perform the action output "move in front of the user and perform dance movements", in particular the control program data for the drive motors that make the robot walk. This control program data is sent to the corresponding motor drive module for execution, and the robot's arm and leg movements are controlled according to the preset dance movement data.

Because during actual matching the first matching subunit 1402, the second matching subunit 1404, and the third matching subunit 1406 may all fail to match, in another embodiment the robot operating system 100 also includes a dialogue application start unit 160.

The dialogue application start unit 160 is connected to the third matching subunit 1406 and is configured to start the dialogue application to process the multi-modal input data when the third matching subunit 1406 fails to match. The dialogue application start unit 160 is only one example of the present invention: if the required output cannot be performed through matching, output by voice serves as a comparatively degraded processing mode.

How the intelligent robot operating system 100 processes multi-modal input data is explained step by step below with reference to the flow in Fig. 3. It should be noted that before this embodiment is applied, every application processing class must be given a uniquely identifying App Key. This embodiment is an example of parsing the intention of a user request. It can handle the command intentions of all robot operating systems, and performs matching across four broad categories: system operations, action commands, non-dialogue applications, and dialogue applications.

(Step S310)

The user intention acquisition unit 110 receives and parses the user's multi-modal input data to obtain the user intention.

(Step S320)

The application determination unit 120 determines the application that matches the user intention.

(Step S330)

The application execution intention acquisition unit 130 obtains the application execution intention contained in the multi-modal input data.

(Step S340)

The execution instruction generation unit 140 matches on the application execution intention and the current application state value of the robot operating system to generate an execution instruction.

(Step S350)

The multi-modal output unit 150 performs multi-modal output according to the execution instruction.
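Steps S310 to S350 form a simple pipeline, which can be sketched with one stand-in function per step. The function names, the keyword-based parser, and the string format of the instruction are all assumptions made to keep the example self-contained; a real system would use speech recognition, intent parsing, and hardware drivers in their place.

```python
from typing import Dict, Tuple


def acquire_user_intention(text: str) -> str:                 # S310
    """Stand-in parser: any utterance containing 'dance' means 'dance'."""
    return "dance" if "dance" in text.lower() else "unknown"


def determine_application(intention: str) -> str:             # S320
    return {"dance": "dance application"}.get(intention, "dialogue application")


def acquire_execution_intention(text: str, app: str) -> str:  # S330
    """Stand-in: always produce an 'open' intention for the matched app."""
    return f"open {app}"


def generate_instruction(exec_intent: str, state: Dict[str, object]) -> str:  # S340
    """Combine the execution intention with the current application state value."""
    return (
        f"{exec_intent} "
        f"[APP Key={state['app_key']}, Operate State={state['operate_state']}]"
    )


def multimodal_output(instruction: str) -> Tuple[str, str]:   # S350
    """Return (speech, action) channels as a stand-in for real output."""
    return (f"say: executing {instruction}", f"act: {instruction}")


def process(text: str, state: Dict[str, object]) -> Tuple[str, str]:
    intention = acquire_user_intention(text)
    app = determine_application(intention)
    exec_intent = acquire_execution_intention(text, app)
    instruction = generate_instruction(exec_intent, state)
    return multimodal_output(instruction)


speech, action = process("Please dance", {"app_key": 11, "operate_state": "on"})
print(action)
```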

Taking the "dance application" as an example, the following explains with reference to Fig. 4 how an execution instruction is generated by matching and how multi-modal output is performed according to that instruction.

After the user issues a voice instruction such as "dance", the robot operating system obtains the application execution intention corresponding to the voice instruction through steps S310 to S330. It then takes this application execution intention and the current application state value of the robot operating system (including App Key and Operate State) as input and proceeds as follows.

First, the first matching subunit 1402 of the execution instruction generation unit 140 matches against system operations and action commands. If there is no corresponding match, the second matching subunit 1404 uses the App Key to look up the application processing class to which the current application belongs, and that class is given priority in resolving the application execution intention. If it cannot resolve the application execution intention, the third matching subunit 1406 traverses the various application processing classes in order of weight, matching against their start commands to judge whether the application execution intention can be resolved. If no match is found, the dialogue application start unit 160 uses the "dialogue application" to process the application execution intention.
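The order of attempts just described, with the dialogue application as the last resort, can be sketched as a generic cascade. The toy matchers below merely stand in for subunits 1402, 1404, and 1406; their behaviour and the name `run_cascade` are assumptions for illustration.

```python
from typing import Callable, List, Optional, Tuple

Matcher = Callable[[str], Optional[str]]


def run_cascade(intent: str, stages: List[Tuple[str, Matcher]]) -> Tuple[str, str]:
    """Run matchers in order; the first non-None result wins.
    If every stage fails, hand the input to the dialogue application."""
    for stage_name, matcher in stages:
        instruction = matcher(intent)
        if instruction is not None:
            return stage_name, instruction
    return "dialogue", f"dialogue application handles: {intent}"


# Toy matchers standing in for subunits 1402, 1404 and 1406.
stages: List[Tuple[str, Matcher]] = [
    ("first", lambda i: "action:raise arm" if i == "raise arm" else None),
    ("second", lambda i: "dance:next tune" if i == "dance the next tune" else None),
    ("third", lambda i: "music:open" if i == "open music application" else None),
]

print(run_cascade("dance the next tune", stages))
print(run_cascade("what is the weather", stages))
```

Keeping the stages in a list makes the priority order explicit and lets the fallback sit outside any individual matcher.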

If a match succeeds in any of the steps above, the multi-modal output unit 150 uses the resulting execution instruction as the decision, processes the related business operation, then encapsulates the result according to the output protocol and outputs it.

In the prior art, the mode of operation that simply traverses and matches application instructions suffers from a low hit rate and a long traversal time. For example, if the robot operating system is currently in the dance application and the user issues the voice instruction "dance the next tune", the embodiment of the present invention can process this voice instruction directly without traversing all applications as the prior art does, which saves time.

In addition, the prior art does not consider the application context when judging the device's current state and making a reasonable instruction match. The embodiment of the present invention considers the application context and can therefore process multi-modal input data relatively quickly. For example, when a "dance" instruction is issued, it is processed by the dance application, which returns the App Key of the current dance application and the instruction to start the dance application (Operate State); the dance application is then started according to the App Key and Operate State. When a voice instruction such as "dance the next tune" is issued afterwards, it is resolved according to the method of the invention and goes straight to the second matching step in this example, without traversing all applications.
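The two-utterance scenario above ("dance", then "dance the next tune") can be sketched with a small router that remembers the application state value between requests. This is a simplified illustration under stated assumptions: the `Router` class and the `APPS` table are hypothetical, and the traversal here ignores weights and the first matching step for brevity.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical registry: APP Key -> (application name, intentions it handles).
APPS: Dict[int, Tuple[str, Set[str]]] = {
    11: ("dance application", {"dance", "dance the next tune"}),
    12: ("music application", {"play music"}),
}


class Router:
    """Tracks the current application state value between utterances."""

    def __init__(self) -> None:
        self.app_key: Optional[int] = None        # APP Key
        self.operate_state: Optional[str] = None  # Operate State

    def handle(self, intent: str) -> Tuple[str, str]:
        """Return (application name, stage that resolved the intention)."""
        # Second matching step: try the current application first.
        if self.app_key is not None:
            name, handled = APPS[self.app_key]
            if intent in handled:
                return name, "second step"
        # Otherwise traverse the applications and remember the one that matched.
        for key, (name, handled) in APPS.items():
            if intent in handled:
                self.app_key, self.operate_state = key, "on"
                return name, "traversal"
        return "dialogue application", "fallback"


router = Router()
print(router.handle("dance"))                # ('dance application', 'traversal')
print(router.handle("dance the next tune"))  # ('dance application', 'second step')
```

The second call is resolved without looking at any other application, which is the time saving the passage claims.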

Those skilled in the art should understand that the modules or steps of the present invention described above can be implemented with a general-purpose computing device. They can be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they can be implemented with program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; or they can each be made into a separate integrated circuit module, or multiple modules or steps among them can be made into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.

Although the embodiments disclosed by the present invention are as described above, the content is only an embodiment adopted to facilitate understanding of the present invention and is not intended to limit it. Any person skilled in the technical field of the present invention may make modifications and changes in the form and details of implementation without departing from the spirit and scope disclosed by the present invention, but the scope of patent protection of the present invention shall still be as defined by the appended claims.

Those of ordinary skill in the art will understand that all or part of the steps of the methods in the above embodiments can be completed by a program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, includes all or part of the steps of the above embodiments. The storage medium is, for example, ROM/RAM, a magnetic disk, or an optical disc.

Claims

1. A multi-modal input data processing method for an intelligent robot, characterized in that the intelligent robot is provided with a robot operating system, and the processing method includes:

a user intention acquisition step of receiving and parsing the user's multi-modal input data to obtain a user intention;

an application determination step of determining the application that matches the user intention;

an application execution intention acquisition step of obtaining the application execution intention contained in the multi-modal input data;

an execution instruction generation step of matching on the application execution intention and the current application state value of the robot operating system to generate an execution instruction; and

a multi-modal output step of performing multi-modal output according to the execution instruction.

2. The multi-modal input data processing method according to claim 1, characterized in that the execution instruction generation step further includes:

a first matching step of matching against system operations and action commands respectively, and judging whether the system operations and action commands can generate an execution command corresponding to the application execution intention, where an execution instruction is generated if the match succeeds and the second matching step is performed otherwise; and

a second matching step of looking up, according to the current application state value, the application processing class to which the current application belongs, and judging whether that application processing class can generate an execution command corresponding to the application execution intention.

3. The multi-modal input data processing method according to claim 2, characterized in that

if the second matching step fails, a third matching step is performed; and

in the third matching step, each application processing class is traversed in order of weight and matched to judge whether it can generate an execution command corresponding to the application execution intention.

4. The multi-modal input data processing method according to claim 3, characterized in that

if the third matching step fails, a dialogue application is started to process the multi-modal input data.

5. The multi-modal input data processing method according to any one of claims 1 to 4, characterized in that

the application state value includes an application key APP Key and an application execution state Operate State.

6. A robot operating system for processing multi-modal input data, characterized by including:

a user intention acquisition unit configured to receive and parse the user's multi-modal input data and obtain a user intention;

an application determination unit configured to determine the application that matches the user intention;

an application execution intention acquisition unit configured to obtain the application execution intention contained in the multi-modal input data;

an execution instruction generation unit configured to match on the application execution intention and the current application state value of the robot operating system to generate an execution instruction; and

a multi-modal output unit configured to perform multi-modal output according to the execution instruction.

7. The robot operating system according to claim 6, characterized in that the execution instruction generation unit further includes:

a first matching subunit configured to match against system operations and action commands respectively and judge whether the system operations and action commands can generate an execution command corresponding to the application execution intention, generating an execution instruction if the match succeeds and starting the second matching subunit otherwise; and

a second matching subunit configured to look up, according to the current application state value, the application processing class to which the current application belongs, and to judge whether that application processing class can generate an execution command corresponding to the application execution intention.

8. The robot operating system according to claim 7, characterized in that the execution instruction generation unit also includes:

a third matching subunit configured to start when the second matching subunit fails to match, and to traverse each application processing class in order of weight, matching to judge whether it can generate an execution command corresponding to the application execution intention.

9. The robot operating system according to claim 8, characterized by also including:

a dialogue application start unit configured to start a dialogue application to process the multi-modal input data when the third matching subunit fails to match.

10. The robot operating system according to any one of claims 6 to 9, characterized in that