CN106095109B - Method for robot online teaching based on gestures and voice - Google Patents

Method for robot online teaching based on gestures and voice

Info

Publication number
CN106095109B
CN106095109B (application CN201610459874.2A)
Authority
CN
China
Prior art keywords
robot
gesture
voice
instruction
control
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201610459874.2A
Other languages
Chinese (zh)
Other versions
CN106095109A (en)
Inventor
杜广龙
邵亨康
陈燕娇
林思洁
姜思君
黄凯鹏
叶玉琦
雷颖仪
张平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN201610459874.2A priority Critical patent/CN106095109B/en
Publication of CN106095109A publication Critical patent/CN106095109A/en
Application granted granted Critical
Publication of CN106095109B publication Critical patent/CN106095109B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017Gesture based interaction, e.g. based on a set of recognized hand gestures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16Sound input; Sound output
    • G06F3/167Audio in a user interface, e.g. using voice commands for navigating, audio feedback

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Manipulator (AREA)
  • Numerical Control (AREA)

Abstract

The present invention discloses a method for robot online teaching based on gestures and voice, comprising the following steps: S1, a coarse-tuning process based on gestures; S2, a voice-based fine-tuning process; S3, robot teaching combining gestures and voice. The invention proposes a robot online-teaching method that includes gesture teaching and voice teaching; the operator can guide the robot to complete the corresponding movements by combining coarse adjustment via gestures with fine adjustment via voice. Compared with existing technology it is more natural, flexible and convenient to operate: the operator need not concentrate on how to operate the robot and can control it wholeheartedly to complete tasks.

Description

Method for robot online teaching based on gestures and voice
Technical field
The invention belongs to the field of robot teaching and relates to a human-robot interaction method based on gestures and voice.
Background technique
With the continuous development of robotics, research on robots has deepened and the demands on intelligent technology have grown ever higher. Intelligent interaction and cooperation between humans and robots has become a hot research direction, and robot teaching-playback technology is the foundation of human-robot collaboration. Teaching playback means that the operator first demonstrates the operations and tasks the robot is to complete; the robot learns and memorizes them, and then reproduces the operations. Teaching-playback technology started early abroad and has produced many research results, such as teaching with wearable metal frameworks and other intelligent robot-teaching approaches. Although domestic teaching-playback technology started later, it has developed rapidly: besides traditional teaching through a joystick or teach pendant, there are also offline teaching and virtual teaching. However, these methods all rely on a specific environment and have certain limitations.
This invention proposes a robot teaching-playback technique based on three-dimensional gestures and speech recognition, which offers high flexibility and is untethered. Gesture teaching uses a Leap Motion sensor to obtain the position and attitude data of the hand; interval Kalman filtering and particle filtering are used to process and optimize the acquired gesture data. Through gesture teaching, the robot obtains the position the operator expects it to reach and moves there quickly. Voice teaching uses the Microsoft Speech SDK to recognize speech, converts the operator's natural language into instructions the robot can identify, and then executes those instructions against a pre-built control-instruction corpus. In this teaching-playback technique, the robot uses gesture recognition to coarsely position the operating point, then uses speech recognition to finely position it, and finally performs the relevant operation according to the recognized speech content. The technique has high flexibility, accuracy and efficiency: the operator only needs to issue natural speech and gesture instructions, and the robot can complete the online teaching task quickly and accurately. The inventors therefore believe that teaching playback based on three-dimensional gestures and speech recognition will be the inevitable choice for the development of future intelligent robots and will effectively push intelligent human-machine interaction to a higher level.
Summary of the invention
This invention proposes a method of robot online teaching that includes gesture teaching and voice teaching; the operator can guide the robot to complete the corresponding movements by combining coarse adjustment via gestures with fine adjustment via voice.
The method of the present invention for robot online teaching based on gestures and voice comprises the following steps:
S1, a coarse-tuning process based on gestures;
S2, a voice-based fine-tuning process;
S3, robot teaching combining gestures and voice.
Step S1 comprises the following steps:
Human gestures are natural, intuitive and flexible; an operator can easily express his or her intention through gestures, which gives them an obvious advantage in robot teaching, so gestures are used for the coarse-adjustment operation of the robot. The operator directly controls robot motion through hand movements: data such as hand position and orientation are acquired by the Leap Motion, and after this position and orientation data is processed it is used to control the robot's motion.
1) gesture coordinate system
The Leap Motion captures data such as hand position and orientation through a gesture tracking system in which three coordinate systems are defined:
1: the world coordinate system X_W Y_W Z_W
2: the Leap Motion coordinate system X_L Y_L Z_L
3: the palm coordinate system X_H Y_H Z_H
The transformation from the palm coordinate system X_H Y_H Z_H to the Leap Motion coordinate system X_L Y_L Z_L represents the hand position. Assuming the palm coordinate system X_H Y_H Z_H is rotated relative to the Leap Motion coordinate system X_L Y_L Z_L by angles φ, θ, ψ about the X, Y and Z axes respectively, these rotation angles (φ, θ, ψ) represent the hand orientation.
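As a concrete illustration, the palm-to-Leap-Motion transformation can be sketched in Python. The Z-Y-X rotation order and the function names are assumptions for illustration; the patent does not fix a rotation convention:

```python
import math

def rotation_matrix(phi, theta, psi):
    """Rotation matrix for rotations phi, theta, psi about the X, Y, Z axes
    (composed as Rz(psi) @ Ry(theta) @ Rx(phi); this order is an assumption)."""
    cx, sx = math.cos(phi), math.sin(phi)
    cy, sy = math.cos(theta), math.sin(theta)
    cz, sz = math.cos(psi), math.sin(psi)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy,     cy * sx,                cy * cx],
    ]

def palm_to_leap(point_h, angles, origin_hl):
    """Express a point given in the palm frame X_H Y_H Z_H in the Leap Motion
    frame X_L Y_L Z_L: rotate by (phi, theta, psi), then translate by the palm
    origin measured in the Leap frame."""
    R = rotation_matrix(*angles)
    rotated = [sum(R[i][j] * point_h[j] for j in range(3)) for i in range(3)]
    return [rotated[i] + origin_hl[i] for i in range(3)]

# A point 10 mm along the palm X axis, with the palm yawed 90 degrees about Z:
p = palm_to_leap([10.0, 0.0, 0.0], (0.0, 0.0, math.pi / 2), [0.0, 0.0, 0.0])
```

With zero rotation the transform is the identity plus the translation, which is a quick sanity check on the matrix composition.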
2) location estimation is carried out by interval Kalman filtering
The hand position obtained from the Leap Motion is not exact; errors such as hand tremor may appear, which can strongly affect online teaching. Although a Kalman filter can be used for position estimation, in relatively complex environments the data obtained by the system is uncertain and the error of Kalman-filter position estimation may be larger; an interval Kalman filter solves this problem.
The model of the interval Kalman filter is expressed as follows:

x_k^I = A_k^I x_(k-1)^I + B_k^I u_k + w_k   (1)

z_k^I = H_k^I x_k^I + v_k   (2)

Here x_k^I is the n × 1 state vector at time k, A_k^I is the n × n state transition matrix, B_k^I is the n × l control input matrix, u_k is the l × 1 input vector, and w_k and v_k are the noise vectors; z_k^I is the m × 1 measurement vector and H_k^I is the m × n observation matrix, the superscript I denoting interval quantities.
Through the position estimation of the interval Kalman filter, the gesture state x'_k at time k can be expressed as follows:

x'_k = [p_x,k, V_x,k, A_x,k, p_y,k, V_y,k, A_y,k, p_z,k, V_z,k, A_z,k]   (3)

where p, V and A denote the position, velocity and acceleration of the palm along each axis.
In this process, the noise vector is expressed as:

w'_k = [0, 0, w'_x, 0, 0, w'_y, 0, 0, w'_z]^T   (4)
where (w'_x, w'_y, w'_z) is the process noise of the palm acceleration. The gesture position data fused by the interval Kalman filter is more accurate and can be used for the coarse-adjustment control operation of the robot.
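A minimal Python sketch of the filtering idea: a standard 1-D constant-velocity Kalman filter run at the two extremes of the noise-variance intervals, yielding an envelope on the position estimate. True interval Kalman filtering propagates interval matrices through the full recursion; this envelope, and all names and values here, are illustrative assumptions only:

```python
def kalman_1d(zs, q, r, dt=0.01):
    """Minimal 1-D constant-velocity Kalman filter; zs are noisy position
    measurements, q and r the process and measurement noise variances."""
    x, v = zs[0], 0.0                          # state: position, velocity
    P = [[1.0, 0.0], [0.0, 1.0]]               # state covariance
    estimates = []
    for z in zs[1:]:
        # predict: x <- x + v*dt, covariance grows by the process noise q
        x = x + v * dt
        p00 = P[0][0] + dt * (P[0][1] + P[1][0]) + dt * dt * P[1][1] + q
        p01 = P[0][1] + dt * P[1][1]
        p10 = P[1][0] + dt * P[1][1]
        p11 = P[1][1] + q
        # update with the position measurement z
        s = p00 + r
        k0, k1 = p00 / s, p10 / s
        y = z - x
        x, v = x + k0 * y, v + k1 * y
        P = [[(1 - k0) * p00, (1 - k0) * p01],
             [p10 - k1 * p00, p11 - k1 * p01]]
        estimates.append(x)
    return estimates

def interval_kalman_1d(zs, q, dq, r, dr, dt=0.01):
    """Crude interval estimate: run the filter at the extremes of the noise
    intervals [q-dq, q+dq] and [r-dr, r+dr] and report the envelope of the
    two position tracks."""
    lo = kalman_1d(zs, q - dq, r - dr, dt)
    hi = kalman_1d(zs, q + dq, r + dr, dt)
    return [(min(a, b), max(a, b)) for a, b in zip(lo, hi)]

est = kalman_1d([0.0] * 20, q=1e-4, r=1e-2)
band = interval_kalman_1d([0.0] * 20, q=1e-4, dq=5e-5, r=1e-2, dr=5e-3)
```

For a stationary hand the estimate stays at the measurement and the interval band collapses, which is the expected degenerate case.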
3) Attitude estimation is carried out by improved particle filter
A quaternion algorithm can be used to estimate the direction of a rigid body; to reduce the error introduced by the quaternion algorithm, an improved particle filter is used to enhance the data fusion. At time t_k, the approximation of the posterior density is defined as:

p(x_k | z_1:k) ≈ Σ_(i=1)^N w_k^(i) δ(x_k − x_k^(i))   (5)

where x_k^(i) is the i-th state particle at time t_k, N is the number of samples, w_k^(i) is the normalized weight of the i-th particle at time t_k, and δ(·) is the Dirac function.
The resampled particles can therefore be calculated as follows:

x_k^(i) ~ π(x_k | x_(k-1)^(i), z_k)   (6)

At time t_(k+1) the quaternion component of each particle can be expressed as follows:

q_(k+1)^(i) = q_k^(i) ⊗ [cos(|ω_axis,k| t / 2), (ω_axis,k / |ω_axis,k|) sin(|ω_axis,k| t / 2)]   (7)

where ω_axis,k is the angular velocity and t is the sample time. Estimating the hand attitude by the improved particle filter also greatly improves its accuracy, so it too can be used for the coarse-adjustment control operation of the robot.
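The quaternion propagation step of the attitude particle filter can be sketched as follows. The per-particle rate jitter stands in for the proposal distribution; its magnitude and the helper names are assumptions for illustration:

```python
import math
import random

def quat_mul(a, b):
    """Hamilton product of two quaternions (w, x, y, z)."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return (w1*w2 - x1*x2 - y1*y2 - z1*z2,
            w1*x2 + x1*w2 + y1*z2 - z1*y2,
            w1*y2 - x1*z2 + y1*w2 + z1*x2,
            w1*z2 + x1*y2 - y1*x2 + z1*w2)

def quat_from_rate(omega, t):
    """Unit quaternion for rotating at angular rate omega (rad/s, 3-vector)
    for t seconds (axis-angle exponential)."""
    angle = math.sqrt(sum(w * w for w in omega)) * t
    if angle < 1e-12:
        return (1.0, 0.0, 0.0, 0.0)
    ax = [w * t / angle for w in omega]
    s = math.sin(angle / 2)
    return (math.cos(angle / 2), ax[0] * s, ax[1] * s, ax[2] * s)

def propagate_particles(particles, weights, omega, t, noise=0.01):
    """One prediction step: each particle quaternion q_k^(i) is advanced by
    the measured angular rate plus Gaussian jitter (the proposal)."""
    new = []
    for q in particles:
        jitter = [w + random.gauss(0.0, noise) for w in omega]
        new.append(quat_mul(q, quat_from_rate(jitter, t)))
    return new, weights

particles = [(1.0, 0.0, 0.0, 0.0)] * 100
weights = [1.0 / 100] * 100
particles, weights = propagate_particles(particles, weights,
                                         (0.0, 0.0, 0.0), 0.01, noise=0.0)
```

With zero angular rate and no jitter the particles are unchanged, which checks that the exponential map returns the identity quaternion at zero angle.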
Step S2 comprises the following steps:
Voice is natural, simple and easy to use. Controlling the robot directly by voice commands is simple and direct, so voice is used for the fine-adjustment operation of the robot. The invention uses the Microsoft Speech SDK to acquire speech: when the user issues a spoken instruction, the Microsoft Speech SDK extracts keywords from the voice input and converts the voice information into natural-language text; the text is then processed, the user intention contained in it is converted into a robot control instruction, the control instruction is converted into the corresponding robot operation, and finally the robot completes that operation. The process can therefore be divided into four stages: voice input, speech recognition, intention understanding and operation completion, of which speech recognition and intention understanding are the most important and are discussed next. Before teaching playback is carried out, a complete control-command system and the corresponding voice-control-command corpus are designed in advance. Since the invention studies teaching playback based on three-dimensional gestures as well as speech recognition, the designed control-instruction corpus contains gesture control instructions in addition to voice control instructions. Likewise, when designing the voice-command system, the operator may accompany a spoken instruction with a gesture instruction, so five parameters (C_dir, C_opt, C_hand, C_val, C_unit) are used to identify an instruction. When the operator issues a voice command, the robot first judges whether the spoken instruction involves a gesture instruction: if so, C_hand is set to 1 and the robot switches to executing the gesture instruction; if not, C_hand is set to NULL, the speech is recognized to obtain the direction, operation, characteristic value and unit parameters of the instruction sentence, and the corresponding operation is carried out.
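The extraction of the five parameters from a recognized utterance can be sketched as keyword matching. The vocabularies below are invented for illustration; the patent's actual command corpus is not published:

```python
import re

# Illustrative vocabularies; the actual corpus in the patent is not published.
DIRECTIONS = {"left", "right", "forward", "backward", "up", "down"}
OPERATIONS = {"move", "rotate", "stop", "grip"}
UNITS = {"cm", "mm", "degree", "degrees"}

def parse_command(text):
    """Split a recognized utterance into the five parameters
    (C_dir, C_opt, C_hand, C_val, C_unit) described in the patent.
    C_hand is 1 when the utterance refers to a gesture, else None (NULL)."""
    tokens = text.lower().split()
    c_hand = 1 if "gesture" in tokens else None
    c_dir = next((t for t in tokens if t in DIRECTIONS), None)
    c_opt = next((t for t in tokens if t in OPERATIONS), None)
    c_unit = next((t for t in tokens if t in UNITS), None)
    m = re.search(r"\d+(\.\d+)?", text)
    c_val = float(m.group()) if m else None
    return c_dir, c_opt, c_hand, c_val, c_unit

cmd = parse_command("move left 5 cm")
```

"move left 5 cm" yields direction "left", operation "move", C_hand NULL, value 5.0 and unit "cm", matching the parameter roles the text describes.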
After speech recognition comes the intention-understanding part, which mainly transforms the natural-language instruction into the corresponding robot control instruction. Before the recognized natural-language instruction can be converted, a maximum-entropy classification model is built: text features are first extracted from the training corpus, then TF-IDF feature-vector weighting is applied so that each text is represented as a text feature vector (a text with n words becomes an n-dimensional feature vector). The maximum-entropy algorithm then models the conditional probability of the intended output label given the text feature vector, yielding the most uniformly distributed model, using the formula:

p(y | x) = (1 / Z(x)) exp( Σ_i λ_i f_i(x, y) )

which gives the maximum-entropy probability distribution and completes the maximum-entropy modelling. Here f_i(x, y) is the i-th feature function: f_i(x, y) equals 1 if the text vector and the corresponding output label occur in the same sample and 0 otherwise; λ_i is the weight corresponding to f_i(x, y), and Z(x) is the normalization factor. Once the maximum-entropy classification model is established, a natural-language instruction under test can be converted: text features are extracted from the text to be tested, the text is represented as a text feature vector by the method above, the established maximum-entropy model classifies the feature vector, and the robot control instruction is finally obtained.
There are two modelling patterns: unified attribute modelling and independent attribute modelling. Unified attribute modelling combines all attributes into one instruction, builds the maximum-entropy model on that instruction, and then tests the text. Independent attribute modelling builds a maximum-entropy model for each of the four attributes separately, tests the samples, and finally combines the test results into one output instruction. Unified attribute modelling takes the mutual association between attributes into account and improves the accuracy of the model, but makes the number of combinations very large and classification very difficult. Independent attribute modelling has far fewer combinations, but the lack of association between attributes lowers the accuracy of the model. Unified attribute modelling is adopted to guarantee the accuracy of the model. With the model established above, accurate recognition of speech is achieved, and hence the fine-adjustment control of the robot by voice is realized.
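The maximum-entropy probability computation described above can be sketched directly from the formula. The feature functions and the λ weights below are toy values, not the patent's trained model:

```python
import math

def maxent_prob(features, weights, labels):
    """p(y|x) = exp(sum_i lambda_i f_i(x, y)) / Z(x): the maximum-entropy
    distribution over output labels.  features(y) returns the binary feature
    values f_i(x, y) for a fixed input x; weights are the lambda_i."""
    def score(y):
        return math.exp(sum(l * f for l, f in zip(weights, features(y))))
    z = sum(score(y) for y in labels)          # normalization factor Z(x)
    return {y: score(y) / z for y in labels}

# Two binary features, one firing for each candidate intent label:
probs = maxent_prob(lambda y: [1.0 if y == "move_left" else 0.0,
                               1.0 if y == "move_right" else 0.0],
                    weights=[2.0, 0.5],
                    labels=["move_left", "move_right"])
```

A higher λ on the "move_left" feature makes that label more probable, and the probabilities sum to one by construction of Z(x).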
Step S3 comprises the following steps:
Robot online teaching is carried out in two ways, gesture and voice: gestures are responsible for coarse adjustment and voice for fine adjustment. Voice control is divided into two kinds of orders, controlling orders and commanding orders. Through a controlling order the operator can start or end the online-teaching process and can also switch between gesture instructions and commanding orders. The flow chart of online teaching is shown in Fig. 1:
First, the operator issues the start voice command; after receiving it the robot is in a standby state, ready to receive new instructions at any time. The operator can then switch to the gesture-control state, in which the robot's movements in all directions are controlled by gesture. The amplitudes of these movements are large, so they are easier to control by gesture. This is called the coarse-tuning process of the robot.
In some cases, however, the movement the robot must make is small, and controlling it by gesture is then difficult, because small distances are hard to control with the human hand. At this point the operator can switch to voice control by voice and direct the robot's movements in all directions by speech. This is called the fine-tuning process of the robot.
In most cases the coarse adjustment and fine adjustment of the robot are bound together. For example, the operator points in a direction while commanding the robot by voice to move that way; the robot reads the content of the voice instruction and at the same time reads the direction the gesture indicates, and performs the correct operation. Overall control here is carried out with IF-THEN rules: a series of rules is designed to realize the combination of gesture instructions and voice instructions.
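The rule-based combination of the two channels can be sketched as follows. The individual rules are illustrative assumptions; the patent states only that a series of IF-THEN rules combines gesture and voice instructions:

```python
def combine(gesture_dir, voice_cmd):
    """IF-THEN rules combining the coarse gesture channel with the fine voice
    channel.  voice_cmd is the five-parameter tuple
    (C_dir, C_opt, C_hand, C_val, C_unit); the rules below are illustrative."""
    c_dir, c_opt, c_hand, c_val, c_unit = voice_cmd
    if c_opt == "stop":
        return ("stop",)                      # IF voice says stop THEN stop
    if c_hand == 1 and gesture_dir:           # IF voice defers to the gesture
        return ("move", gesture_dir, c_val, c_unit)
    if c_dir:                                 # IF voice names a direction
        return ("move", c_dir, c_val, c_unit)
    return ("hold",)                          # ELSE wait for a usable command

action = combine("left", (None, "move", 1, 5.0, "cm"))
```

Here the voice command defers to the pointed direction ("move left 5 cm"), while an explicit spoken direction would override the gesture, reflecting voice's fine-adjustment role.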
Finally, when the operation is over, the operator issues the end command, the robot ends the corresponding operation, and the whole teaching process is complete.
This way of combining gestures and voice to control robot motion is natural, flexible and convenient to operate.
Compared with the prior art, the present invention has the following advantages and effects:
The invention proposes a method of robot online teaching that includes gesture teaching and voice teaching; the operator can guide the robot to complete the corresponding movements by combining coarse adjustment via gestures with fine adjustment via voice. Compared with existing technology it is more natural, flexible and convenient to operate: the operator need not concentrate on how to operate the robot and can control it wholeheartedly to complete tasks.
Detailed description of the invention
Fig. 1 is the flow chart of on-line teaching.
Specific embodiment
The present invention is described in further detail below with reference to an embodiment; embodiments of the present invention are not limited thereto.
Embodiment:
The method of the present invention for robot online teaching based on gestures and voice comprises the following steps:
S1, a coarse-tuning process based on gestures;
S2, a voice-based fine-tuning process;
S3, robot teaching combining gestures and voice.
Step S1 comprises the following steps:
Human gestures are natural, intuitive and flexible; an operator can easily express his or her intention through gestures, which gives them an obvious advantage in robot teaching, so gestures are used for the coarse-adjustment operation of the robot. The operator directly controls robot motion through hand movements: data such as hand position and orientation are acquired by the Leap Motion, and after this position and orientation data is processed it is used to control the robot's motion.
1) gesture coordinate system
The Leap Motion captures data such as hand position and orientation through a gesture tracking system in which three coordinate systems are defined:
1: the world coordinate system X_W Y_W Z_W
2: the Leap Motion coordinate system X_L Y_L Z_L
3: the palm coordinate system X_H Y_H Z_H
The transformation from the palm coordinate system X_H Y_H Z_H to the Leap Motion coordinate system X_L Y_L Z_L represents the hand position. Assuming the palm coordinate system X_H Y_H Z_H is rotated relative to the Leap Motion coordinate system X_L Y_L Z_L by angles φ, θ, ψ about the X, Y and Z axes respectively, these rotation angles (φ, θ, ψ) represent the hand orientation.
2) location estimation is carried out by interval Kalman filtering
The hand position obtained from the Leap Motion is not exact; errors such as hand tremor may appear, which can strongly affect online teaching. Although a Kalman filter can be used for position estimation, in relatively complex environments the data obtained by the system is uncertain and the error of Kalman-filter position estimation may be larger; an interval Kalman filter solves this problem.
The model of the interval Kalman filter is expressed as follows:

x_k^I = A_k^I x_(k-1)^I + B_k^I u_k + w_k   (1)

z_k^I = H_k^I x_k^I + v_k   (2)

Here x_k^I is the n × 1 state vector at time k, A_k^I is the n × n state transition matrix, B_k^I is the n × l control input matrix, u_k is the l × 1 input vector, and w_k and v_k are the noise vectors; z_k^I is the m × 1 measurement vector and H_k^I is the m × n observation matrix, the superscript I denoting interval quantities.
Through the position estimation of the interval Kalman filter, the gesture state x'_k at time k can be expressed as follows:

x'_k = [p_x,k, V_x,k, A_x,k, p_y,k, V_y,k, A_y,k, p_z,k, V_z,k, A_z,k]   (3)

where p, V and A denote the position, velocity and acceleration of the palm along each axis.
In this process, the noise vector is expressed as:

w'_k = [0, 0, w'_x, 0, 0, w'_y, 0, 0, w'_z]^T   (4)
where (w'_x, w'_y, w'_z) is the process noise of the palm acceleration. The observation matrix of the position estimation can then be defined as follows (only the position components of the state are measured):

H = [ 1 0 0 0 0 0 0 0 0 ; 0 0 0 1 0 0 0 0 0 ; 0 0 0 0 0 0 1 0 0 ]

Through the interval Kalman filter, the covariances of the model error and the observation error are obtained as:

Q_t^I = [Q_t − ΔQ_t, Q_t + ΔQ_t],  R_t^I = [R_t − ΔR_t, R_t + ΔR_t]

where ΔQ_t and ΔR_t are non-negative constant matrices. The gesture position data fused by the interval Kalman filter is more accurate and can be used for the coarse-adjustment control operation of the robot.
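Constructing the covariance intervals from the nominal matrices and the constant perturbation matrices ΔQ_t and ΔR_t is elementwise arithmetic, sketched below (matrix sizes and values are illustrative):

```python
def interval_covariance(Q, dQ, R, dR):
    """Elementwise interval bounds [Q - dQ, Q + dQ] and [R - dR, R + dR] on
    the process- and measurement-noise covariances, with dQ, dR playing the
    role of the non-negative constant matrices Delta-Q_t and Delta-R_t."""
    q_lo = [[q - d for q, d in zip(qr, dr)] for qr, dr in zip(Q, dQ)]
    q_hi = [[q + d for q, d in zip(qr, dr)] for qr, dr in zip(Q, dQ)]
    r_lo = [[r - d for r, d in zip(rr, dr)] for rr, dr in zip(R, dR)]
    r_hi = [[r + d for r, d in zip(rr, dr)] for rr, dr in zip(R, dR)]
    return (q_lo, q_hi), (r_lo, r_hi)

q_bounds, r_bounds = interval_covariance([[1.0]], [[0.1]], [[2.0]], [[0.2]])
```

Each recursion step of the interval filter then propagates these bounds instead of point covariances, which is what widens the state estimate into an interval.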
3) Attitude estimation is carried out by improved particle filter
A quaternion algorithm can be used to estimate the direction of a rigid body; to reduce the error introduced by the quaternion algorithm, an improved particle filter is used to enhance the data fusion. At time t_k, the approximation of the posterior density is defined as:

p(x_k | z_1:k) ≈ Σ_(i=1)^N w_k^(i) δ(x_k − x_k^(i))

where x_k^(i) is the i-th state particle at time t_k, N is the number of samples, w_k^(i) is the normalized weight of the i-th particle at time t_k, and δ(·) is the Dirac function.
The resampled particles can therefore be calculated as follows:

x_k^(i) ~ π(x_k | x_(k-1)^(i), z_k)

At time t_(k+1) the quaternion component of each particle can be expressed as follows:

q_(k+1)^(i) = q_k^(i) ⊗ [cos(|ω_axis,k| t / 2), (ω_axis,k / |ω_axis,k|) sin(|ω_axis,k| t / 2)]

where ω_axis,k is the angular velocity and t is the sample time. Estimating the hand attitude by the improved particle filter also greatly improves its accuracy, so it too can be used for the coarse-adjustment control operation of the robot.
Step S2 comprises the following steps:
Voice is natural, simple and easy to use. Controlling the robot directly by voice commands is simple and direct, so voice is used for the fine-adjustment operation of the robot. The invention uses the Microsoft Speech SDK to acquire speech: when the user issues a spoken instruction, the Microsoft Speech SDK extracts keywords from the voice input and converts the voice information into natural-language text; the text is then processed, the user intention contained in it is converted into a robot control instruction, the control instruction is converted into the corresponding robot operation, and finally the robot completes that operation. The process can therefore be divided into four stages: voice input, speech recognition, intention understanding and operation completion, of which speech recognition and intention understanding are the most important and are discussed next.
Before teaching playback is carried out, a complete control-command system and the corresponding voice-control-command corpus are designed in advance. Since the invention studies teaching playback based on three-dimensional gestures as well as speech recognition, the designed control-instruction corpus contains gesture control instructions in addition to voice control instructions. Likewise, when designing the voice-command system, the operator may accompany a spoken instruction with a gesture instruction, so five parameters (C_dir, C_opt, C_hand, C_val, C_unit) are used to identify an instruction. When the operator issues a voice command, the robot first judges whether the spoken instruction involves a gesture instruction: if so, C_hand is set to 1 and the robot switches to executing the gesture instruction; if not, C_hand is set to NULL, the speech is recognized to obtain the direction, operation, characteristic value and unit parameters of the instruction sentence, and the corresponding operation is carried out.
After speech recognition comes the intention-understanding part, which mainly transforms the natural-language instruction into the corresponding robot control instruction. Before the recognized natural-language instruction can be converted, a maximum-entropy classification model is built: text features are first extracted from the training corpus, then TF-IDF feature-vector weighting is applied so that each text is represented as a text feature vector (a text with n words becomes an n-dimensional feature vector). The maximum-entropy algorithm then models the conditional probability of the intended output label given the text feature vector, yielding the most uniformly distributed model, using the formula:

p(y | x) = (1 / Z(x)) exp( Σ_i λ_i f_i(x, y) )

which gives the maximum-entropy probability distribution and completes the maximum-entropy modelling. Here f_i(x, y) is the i-th feature function: f_i(x, y) equals 1 if the text vector and the corresponding output label occur in the same sample and 0 otherwise; λ_i is the weight corresponding to f_i(x, y), and Z(x) is the normalization factor. Once the maximum-entropy classification model is established, a natural-language instruction under test can be converted: text features are extracted from the text to be tested, the text is represented as a text feature vector by the method above, the established maximum-entropy model classifies the feature vector, and the robot control instruction is finally obtained.
There are two modelling patterns: unified attribute modelling and independent attribute modelling. Unified attribute modelling combines all attributes into one instruction, builds the maximum-entropy model on that instruction, and then tests the text. Independent attribute modelling builds a maximum-entropy model for each of the four attributes separately, tests the samples, and finally combines the test results into one output instruction. Unified attribute modelling takes the mutual association between attributes into account and improves the accuracy of the model, but makes the number of combinations very large and classification very difficult. Independent attribute modelling has far fewer combinations, but the lack of association between attributes lowers the accuracy of the model. The present invention uses unified attribute modelling to guarantee the accuracy of the model. With the model established above, accurate recognition of speech is achieved, and hence the fine-adjustment control of the robot by voice is realized.
Step S3 comprises the following steps:
Robot online teaching is carried out in two ways, gesture and voice: gestures are responsible for coarse adjustment and voice for fine adjustment. Voice control is divided into two kinds of orders, controlling orders and commanding orders. Through a controlling order the operator can start or end the online-teaching process and can also switch between gesture instructions and commanding orders. The flow chart of online teaching is shown in Fig. 1:
First, the operator issues the start voice command; after receiving it the robot is in a standby state, ready to receive new instructions at any time. The operator can then switch to the gesture-control state, in which the robot's movements in all directions are controlled by gesture. The amplitudes of these movements are large, so they are easier to control by gesture. This is called the coarse-tuning process of the robot.
In some cases, however, the movement the robot must make is small, and controlling it by gesture is then difficult, because small distances are hard to control with the human hand. At this point the operator can switch to voice control by voice and direct the robot's movements in all directions by speech. This is called the fine-tuning process of the robot.
In most cases the coarse adjustment and fine adjustment of the robot are bound together. For example, the operator points in a direction while commanding the robot by voice to move that way; the robot reads the content of the voice instruction and at the same time reads the direction the gesture indicates, and performs the correct operation. Overall control here is carried out with IF-THEN rules: a series of rules is designed to realize the combination of gesture instructions and voice instructions.
Finally, when the operation is over, the operator issues the end command, the robot ends the corresponding operation, and the whole teaching process is complete.
This way of combining gestures and voice to control robot motion is natural, flexible and convenient to operate.
The above embodiment is a preferred embodiment of the present invention, but embodiments of the present invention are not limited by it; any other change, modification, substitution, combination or simplification made without departing from the spirit and principle of the present invention is an equivalent replacement and is included within the scope of protection of the present invention.

Claims (3)

1. A method for robot online teaching based on gestures and voice, characterized by comprising the following steps:
S1, a coarse-tuning process based on gestures;
S2, a voice-based fine-tuning process;
S3, robot teaching combining gestures and voice; step S1 comprises:
1) Gesture coordinate systems
The Leap Motion captures hand position and orientation data through its gesture-tracking system, and three coordinate systems are defined:
1: the world coordinate system XWYWZW;
2: the Leap Motion coordinate system XLYLZL;
3: the palm coordinate system XHYHZH;
The transformation from the palm coordinate system XHYHZH to the Leap Motion coordinate system XLYLZL represents the hand position. Assuming the rotations of the Leap Motion coordinate system XLYLZL relative to the palm coordinate system XHYHZH about the X, Y and Z axes are φ, θ and ψ respectively, these rotation angles (φ, θ, ψ) represent the gesture orientation;
2) Position estimation by interval Kalman filtering
The hand position obtained from the Leap Motion is inaccurate; interval Kalman filtering is used to solve this problem. The interval Kalman filter model is expressed as follows:
x′k = A′k x′k−1 + B′k u′k + w′k−1 (1)
z′k = H′k x′k + v′k (2)
Here x′k is the n × 1 state vector at time k, A′k is the n × n state transition matrix, B′k is the n × l control input matrix, u′k is the l × 1 input vector, and w′k and v′k are the noise vectors; z′k is the m × 1 measurement vector at time k and H′k is the m × n observation matrix.
Through the position estimation of the interval Kalman filter, the gesture state x′k at time k is expressed as follows:
x′k=[px,k,Vx,k,Ax,k,py,k,Vy,k,Ay,k,pz,k,Vz,k,Az,k] (3)
In this process, the noise vector is expressed as:
w′k=[0,0,w′x,0,0,w′y,0,0,w′z]T (4)
where (w′x, w′y, w′z) is the process noise of the palm acceleration. The gesture position data fused by the interval Kalman filter is more accurate and can be used for the coarse-tuning control of the robot;
3) Attitude estimation by an improved particle filter
A quaternion algorithm is used to estimate the rigid-body orientation, and an improved particle filter is used to enhance the data fusion. At time tk, the approximation of the posterior density is defined as:
p(xk|z1:k) ≈ Σ(i=1..N) ωik δ(xk − x̂ik) (5)
where x̂ik is the i-th state particle at time tk, N is the number of samples, ωik is the normalized weight of the i-th particle at time tk, and δ(·) is the Dirac function;
Therefore, the analyzed particles can be calculated as follows:
ωik ∝ ωik−1 p(zk|x̂ik) (6)
At time tk+1, the quaternion component of each particle can be expressed as follows:
qik+1 = qik ⊗ [cos(|ωaxis,k|t/2), (ωaxis,k/|ωaxis,k|) sin(|ωaxis,k|t/2)] (7)
where ωaxis,k is the angular velocity and t is the sample time.
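The Kalman recursion described above for the gesture position can be illustrated with a plain (non-interval) single-axis filter over the state [p, V] (position, velocity). This is a sketch under stated assumptions: the patent's interval variant additionally propagates interval bounds on the matrices, which is omitted here, and all numeric values (dt, q, r, the measurement sequence) are illustrative.

```python
# Minimal single-axis Kalman filter sketch for a gesture state [p, V].
# The interval arithmetic of the patent's interval Kalman filter is omitted;
# only the core predict/update recursion is shown.

def kalman_1d(measurements, dt=0.01, q=1e-3, r=1e-2):
    # State x = [p, V]; transition A = [[1, dt], [0, 1]]; observation H = [1, 0].
    p, v = 0.0, 0.0
    P = [[1.0, 0.0], [0.0, 1.0]]  # state covariance
    out = []
    for z in measurements:
        # Predict: x = A x,  P = A P A^T + Q (Q = diag(q, q))
        p, v = p + dt * v, v
        P = [[P[0][0] + dt * (P[1][0] + P[0][1]) + dt * dt * P[1][1] + q,
              P[0][1] + dt * P[1][1]],
             [P[1][0] + dt * P[1][1],
              P[1][1] + q]]
        # Update with a position measurement z: K = P H^T / (H P H^T + R)
        s = P[0][0] + r
        k0, k1 = P[0][0] / s, P[1][0] / s
        innov = z - p
        p, v = p + k0 * innov, v + k1 * innov
        # P = (I - K H) P
        P = [[(1 - k0) * P[0][0], (1 - k0) * P[0][1]],
             [P[1][0] - k1 * P[0][0], P[1][1] - k1 * P[0][1]]]
        out.append(p)
    return out

est = kalman_1d([0.10, 0.11, 0.12, 0.13])
```

With a small measurement-noise variance the filtered positions track the measurements closely, which is the behavior the coarse-tuning control relies on.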
2. The method for online robot teaching based on gesture and voice according to claim 1, characterized in that the step S2 comprises:
The operator controls the robot directly by voice commands, voice being used for fine-tuning operations. The Microsoft Speech SDK is used to acquire the voice: when the user issues a voice instruction, the Microsoft Speech SDK extracts keywords from the voice input and converts the voice information into natural-language text; the natural-language text information is then processed, the user intention contained in it is converted into a robot control instruction, and the robot control instruction is finally converted into the corresponding robot operation, which the robot then completes. The process can therefore be divided into four stages: voice input, speech recognition, intention understanding and operation execution;
A control command system and the corresponding voice control command corpus are designed in advance, before the robot teaching and reproduction. Besides voice control commands, the control instruction corpus also contains gesture control instructions. When the voice control command system is designed, it is taken into account that a gesture instruction may accompany a voice instruction issued by the operator, so five parameters (Cdir, Copt, Chand, Cval, Cunit) are used to identify an instruction. When the operator gives a voice command to the robot, the robot first judges whether the voice instruction contains a gesture instruction: if it does, Chand is set to 1 and the gesture instruction is executed; if not, Chand is set to NULL, the voice is recognized, the direction, operation, characteristic value and unit parameters in the instruction sentence are obtained, and the corresponding operation is performed;
After speech recognition comes the intention-understanding part, which converts the natural-language instruction into the corresponding robot control instruction. Before the recognized natural-language instruction is converted, a maximum entropy classification model is built: text features are first extracted from the training corpus, the text features are then weighted with TF-IDF to form feature vectors, and each text is represented as a text feature vector, n words being represented as an n-dimensional feature vector. The maximum entropy algorithm is then used to model the conditional probability between the text feature vector and the corresponding intention output label, obtaining the most uniformly distributed model, using the formula:
p(y|x) = (1/Z(x)) exp(Σi λi fi(x,y)) (8)
to obtain the maximum entropy probability distribution and complete the maximum entropy modeling. Here fi(x, y) is the i-th feature function: if the text vector and the corresponding output label occur in the same sample, fi(x, y) equals 1, otherwise 0; λi is the weight corresponding to fi(x, y), and Z(x) is the normalization factor. After the maximum entropy classification model is established, the natural-language instruction to be tested can be converted: text features are first extracted from the text to be tested, the text is represented as a text feature vector by the method described above, the established maximum entropy classification model is then used to classify the text feature vector, and the robot control instruction is finally obtained;
There are two modeling patterns: unified attribute modeling and independent attribute modeling. Unified attribute modeling combines all attributes into one instruction, performs maximum entropy modeling on that instruction, and then tests the text; independent attribute modeling performs maximum entropy modeling on the four attributes separately, then tests the samples, and finally combines the test results into one output instruction.
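A toy version of the maximum entropy distribution described above can be written directly from its definition. The word-label feature functions and the weights λi below are invented for illustration; they are not the patent's trained model or corpus.

```python
# Sketch of the maximum entropy conditional distribution p(y|x):
# binary feature functions f_i(x, y), weights lambda_i, normalizer Z(x).
# Feature functions are keyed as (word, label) pairs; f_i = 1 when the word
# appears in the text and the label matches, else 0.
import math

def maxent_prob(words, label, labels, weights):
    """p(label | words) = exp(sum_i lambda_i * f_i(x, y)) / Z(x)."""
    def score(y):
        return sum(weights.get((w, y), 0.0) for w in words)
    z = sum(math.exp(score(y)) for y in labels)  # normalization factor Z(x)
    return math.exp(score(label)) / z

labels = ["move_left", "move_right"]
weights = {("left", "move_left"): 2.0, ("right", "move_right"): 2.0}
p = maxent_prob(["move", "left"], "move_left", labels, weights)
```

Here the word "left" fires only the feature tied to the `move_left` intention, so that label receives most of the probability mass; training would fit the λi so that the distribution is maximally uniform subject to the feature constraints.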
3. The method for online robot teaching based on gesture and voice according to claim 1, characterized in that the step S3 comprises:
Online robot teaching is performed through the two modalities of gesture and voice, gesture being responsible for coarse tuning and voice for fine tuning. The voice control system is divided into two kinds of orders: controlling orders and commanding orders; through controlling orders the operator can start or terminate the online robot teaching process, and can also switch between gesture instructions and commanding orders;
First, the operator issues a start voice command; after receiving the order, the robot enters a standby state, ready to accept new instructions. The operator then sets the command mode to the gesture-control state, in which the forward, backward, left and right movements of the robot are controlled by gesture; the amplitude of movement in this state is large, so it is easier to control with gestures. This is called the coarse-tuning process of the robot;
However, when the range of movement required of the robot is small, it is relatively difficult for the operator to control the robot by gesture, because human gestures are hard to control precisely over small distances; the operator can then switch to voice control by a spoken command and direct the forward, backward, left and right movements of the robot by voice. This is called the fine-tuning process of the robot;
In most cases, the coarse tuning and fine tuning of the robot are combined: the operator points a finger in some direction and commands the robot by voice to move in that direction; the robot reads the content of the voice instruction and, at the same time, the direction indicated by the gesture, and performs the correct operation. The overall control is carried out with IF-THEN rules, a series of rules being designed to realize the combination of gesture instructions and voice instructions;
Finally, when the operation ends, the operator issues a stop order, the robot terminates the corresponding operation, and the whole teaching process is complete.
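The five-parameter instruction (Cdir, Copt, Chand, Cval, Cunit) of claim 2 suggests a simple keyword-extraction step after speech recognition. The sketch below is hypothetical: the keyword lists and the phrase used to detect an accompanying gesture are illustrative assumptions, not the patent's actual command corpus.

```python
# Hypothetical parser filling the five instruction parameters
# (Cdir, Copt, Chand, Cval, Cunit). Keyword sets are illustrative only.

DIRECTIONS = {"forward", "backward", "left", "right", "up", "down"}
OPERATIONS = {"move", "rotate", "stop"}
UNITS = {"cm", "mm", "degree"}

def parse_command(text):
    tokens = text.lower().split()
    cmd = {"Cdir": None, "Copt": None, "Chand": None, "Cval": None, "Cunit": None}
    # A deictic phrase such as "that way" signals an accompanying gesture:
    # Chand is set to 1 and the direction is taken from the hand instead.
    if "that" in tokens and "way" in tokens:
        cmd["Chand"] = 1
    for t in tokens:
        if t in DIRECTIONS:
            cmd["Cdir"] = t
        elif t in OPERATIONS:
            cmd["Copt"] = t
        elif t in UNITS:
            cmd["Cunit"] = t
        elif t.replace(".", "", 1).isdigit():
            cmd["Cval"] = float(t)
    return cmd

cmd = parse_command("move left 5 cm")
```

A fully specified command like "move left 5 cm" leaves Chand as NULL (None here) and is executed from voice alone, while "move that way" sets Chand and defers the direction to the gesture, as described in claim 2.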
CN201610459874.2A 2016-06-20 2016-06-20 The method for carrying out robot on-line teaching based on gesture and voice Expired - Fee Related CN106095109B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610459874.2A CN106095109B (en) 2016-06-20 2016-06-20 The method for carrying out robot on-line teaching based on gesture and voice


Publications (2)

Publication Number Publication Date
CN106095109A CN106095109A (en) 2016-11-09
CN106095109B true CN106095109B (en) 2019-05-14

Family

ID=57252259

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610459874.2A Expired - Fee Related CN106095109B (en) 2016-06-20 2016-06-20 The method for carrying out robot on-line teaching based on gesture and voice

Country Status (1)

Country Link
CN (1) CN106095109B (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108986801B (en) * 2017-06-02 2020-06-05 腾讯科技(深圳)有限公司 Man-machine interaction method and device and man-machine interaction terminal
CN107351058A (en) * 2017-06-08 2017-11-17 华南理工大学 Robot teaching method based on augmented reality
CN107150347B (en) * 2017-06-08 2021-03-30 华南理工大学 Robot perception and understanding method based on man-machine cooperation
CN108247633B (en) * 2017-12-27 2021-09-03 珠海格力节能环保制冷技术研究中心有限公司 Robot control method and system
JP2019126902A (en) * 2018-01-25 2019-08-01 川崎重工業株式会社 Robot teaching device
CN108447477A (en) * 2018-01-30 2018-08-24 华南理工大学 A kind of robot control method based on natural language understanding
CN108334198B (en) * 2018-02-09 2021-05-14 华南理工大学 Virtual sculpture method based on augmented reality
CN109358747B (en) * 2018-09-30 2021-11-30 平潭诚信智创科技有限公司 Companion robot control method, system, mobile terminal and storage medium
JP7063844B2 (en) * 2019-04-26 2022-05-09 ファナック株式会社 Robot teaching device
CN110473535A (en) * 2019-08-15 2019-11-19 网易(杭州)网络有限公司 Teaching playback method and device, storage medium and electronic equipment
CN110815210A (en) * 2019-08-26 2020-02-21 华南理工大学 Novel remote control method based on natural human-computer interface and augmented reality
CN110815258B (en) * 2019-10-30 2023-03-31 华南理工大学 Robot teleoperation system and method based on electromagnetic force feedback and augmented reality
CN110788860A (en) * 2019-11-11 2020-02-14 路邦科技授权有限公司 Bionic robot action control method based on voice control
CN110992777B (en) * 2019-11-20 2020-10-16 华中科技大学 Multi-mode fusion teaching method and device, computing equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1302056A (en) * 1999-12-28 2001-07-04 索尼公司 Information processing equiopment, information processing method and storage medium
JP2006297531A (en) * 2005-04-20 2006-11-02 Fujitsu Ltd Service robot
CN104936748A (en) * 2012-12-14 2015-09-23 Abb技术有限公司 Bare hand robot path teaching

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005088179A (en) * 2003-09-22 2005-04-07 Honda Motor Co Ltd Autonomous mobile robot system



Similar Documents

Publication Publication Date Title
CN106095109B (en) The method for carrying out robot on-line teaching based on gesture and voice
Du et al. Online robot teaching with natural human–robot interaction
Sun et al. Object classification and grasp planning using visual and tactile sensing
Li Human–robot interaction based on gesture and movement recognition
Yang et al. Human action learning via hidden Markov model
CN105807926A (en) Unmanned aerial vehicle man-machine interaction method based on three-dimensional continuous gesture recognition
Jingqiu et al. An ARM-based embedded gesture recognition system using a data glove
CN104732752A (en) Advanced control device for home entertainment utilizing three dimensional motion technology
Yongda et al. Research on multimodal human-robot interaction based on speech and gesture
Chang et al. A kinect-based gesture command control method for human action imitations of humanoid robots
Nguyen et al. Reinforcement learning based navigation with semantic knowledge of indoor environments
Aleotti et al. Trajectory clustering and stochastic approximation for robot programming by demonstration
CN106055244B (en) Man-machine interaction method based on Kinect and voice
Huang et al. Language-driven robot manipulation with perspective disambiguation and placement optimization
Lin et al. Action recognition for human-marionette interaction
Axyonov et al. Method of multi-modal video analysis of hand movements for automatic recognition of isolated signs of Russian sign language
Dhamanskar et al. Human computer interaction using hand gestures and voice
Zhou et al. Intelligent grasping with natural human-robot interaction
Kwolek GAN-based data augmentation for visual finger spelling recognition
Kulecki Intuitive robot programming and interaction using RGB-D perception and CNN-based objects detection
Nguyen et al. A fully automatic hand gesture recognition system for human-robot interaction
Kenshimov et al. Development of a Verbal Robot Hand Gesture Recognition System
Cutugno et al. Interacting with robots via speech and gestures, an integrated architecture.
Di Benedetto et al. A hidden markov model-based approach to grasping hand gestures classification
Mathew et al. Multi-modal intent classification for assistive robots with large-scale naturalistic datasets

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20190514