CN106055244A - Man-machine interaction method based on Kinect and voice - Google Patents


Info

Publication number
CN106055244A
CN106055244A · CN201610306998.7A · CN201610306998A · CN106055244B
Authority
CN
China
Prior art keywords
coordinate system
point
voice
kinect
man
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610306998.7A
Other languages
Chinese (zh)
Other versions
CN106055244B (en)
Inventor
闵华松
齐诗萌
李潇
林云汉
吴凡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University of Science and Engineering WUSE
Wuhan University of Science and Technology WHUST
Original Assignee
Wuhan University of Science and Engineering WUSE
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University of Science and Engineering WUSE
Priority to CN201610306998.7A priority Critical patent/CN106055244B/en
Publication of CN106055244A publication Critical patent/CN106055244A/en
Application granted granted Critical
Publication of CN106055244B publication Critical patent/CN106055244B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0487Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a man-machine interaction method based on Kinect and voice. The method comprises the following steps: (1) a Kinect sensor acquires the accurate spatial position and attitude of each object in the scene in the Kinect coordinate system K, so that targets can be detected and recognized; (2) the depth and RGB images collected by the Kinect are fused to obtain three-dimensional point cloud data; (3) spatial point cloud objects are recognized, i.e. the point cloud data is processed to obtain a semantic description file; (4) a coordinate transformation is applied to the object coordinate system O to obtain a three-dimensional scene semantic map description file in coordinate system R; (5) the user's voice input is received and the input signal is processed to obtain text information; (6) the text information and the XML semantic map are fed to an intelligent inference engine, which generates an execution instruction and outputs a textual reply and guidance information for the user.

Description

A man-machine interaction method based on Kinect and voice
Technical field
The present invention relates to robotics, and in particular to a man-machine interaction method based on Kinect and voice.
Background technology
Traditional man-machine interactive systems mostly adopt WIMP interfaces, i.e. graphical user interfaces built on windows, menus, icons and pointing devices, with information entered through buttons, knobs or other touch devices. Such a system can only offer the limited options preset by its designer; it cannot exchange large amounts of information with the environment, and environmental information still has to be entered manually by an operator. Whether applied in the service sector or in manufacturing, it requires skilled personnel to operate. However its structure is optimized or its user guidance improved, this can only reduce the difficulty of use; it cannot actually reduce staffing levels or save labor cost.
The following related patents were found in a literature search. Invention patent CN201511016826.8, published on 23 March 2016, "A method, device and robot for human-computer interaction", proposes an interaction method based on voice and image information: the system determines the user's identity from the user's voice and judges the user's input from the user's actions. Invention patent CN201510658482.4, published on 23 March 2016, "Catering service system", proposes a man-machine interaction method that obtains user instructions through an audio processing unit and derives the user's location from a microphone array.
However, the above patents only concern how to obtain user information through multimedia technology; they cannot obtain scene information. Such an interactive system must therefore be used in a specific scene; once the scene changes substantially, the system will fail to respond or will execute incorrectly.
Summary of the invention
The technical problem to be solved by the present invention is to remedy the above defects of the prior art by providing a man-machine interaction method based on Kinect and voice.
The technical solution adopted by the present invention to solve this problem is a man-machine interaction method based on Kinect and voice, comprising the following steps:
1) Use a Kinect sensor to acquire the accurate spatial position and attitude of each object in the scene in coordinate system K, completing object detection and recognition. Coordinate system K takes the geometric center of the Kinect as origin, the direction perpendicular to the lens and pointing outward as the positive Z axis, and the line through the centers of the three Kinect lenses as the X axis.
2) Fuse the depth and RGB images collected by the Kinect to obtain three-dimensional point cloud data.
3) Spatial point cloud object recognition: process the point cloud data to obtain a semantic description file.
4) Apply a coordinate transformation to the object coordinate system O to obtain a three-dimensional scene semantic map description file in coordinate system R. The object coordinate system O takes the geometric center of the point cloud as origin, the longest line segment through the origin inside the object as the Z axis, and the plane through the origin perpendicular to Z as the XY plane. Coordinate system R takes the ground as its XY plane, the projection of the geometric center of the manipulator base onto that plane as origin, the upward direction through the origin perpendicular to the ground as the positive Z axis, and its Y axis parallel to the y axis of coordinate system K.
5) Receive the user's voice input and process the input signal to obtain text information.
6) Feed the text information and the XML semantic map to an intelligent inference engine, which generates an execution instruction and outputs a textual reply and guidance information for the user.
In the above scheme, the spatial point cloud object recognition of step 3) includes preprocessing, key point extraction and descriptor extraction, followed by feature matching against an object feature database, finally yielding a semantic description file.
In the above scheme, step 3) comprises:
3.1) Preprocessing: filter out point cloud data that is too far from or too close to the sensor.
3.2) Use the ISS algorithm to detect feature points in the point cloud data. The detailed process is:
3.2.1) For each point p_i in the input point cloud, query all points p_j within radius r_frame and compute the weights according to formula (1):
W_ij = 1 / ||p_i − p_j||,  ||p_i − p_j|| < r_frame   (1)
3.2.2) Compute the weighted covariance matrix according to formula (2):
COV(p_i) = Σ_{||p_i−p_j||<r_frame} w_ij (p_i − p_j)(p_i − p_j)^T / Σ_{||p_i−p_j||<r_frame} w_ij   (2)
3.2.3) Compute the eigenvalues λ1, λ2, λ3 of the covariance matrix and sort them in descending order.
3.2.4) Set ratio thresholds γ21 and γ32 and retain the set of points satisfying λ2/λ1 < γ21 and λ3/λ2 < γ32; these points are the key feature points.
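For illustration, the keypoint selection of steps 3.2.1)–3.2.4) can be sketched in Python as follows. This is a minimal brute-force sketch, not part of the patent; the default values of r_frame, γ21 and γ32 are illustrative assumptions.

```python
import numpy as np

def iss_keypoints(points, r_frame=0.05, gamma21=0.975, gamma32=0.975):
    """Return indices of ISS key feature points in an (N, 3) point cloud."""
    keypoints = []
    for i, p in enumerate(points):
        d = np.linalg.norm(points - p, axis=1)
        mask = (d < r_frame) & (d > 0)          # neighbours p_j within r_frame
        if not mask.any():
            continue
        w = 1.0 / d[mask]                        # formula (1)
        diff = points[mask] - p
        # weighted covariance, formula (2)
        cov = (w[:, None, None] * (diff[:, :, None] * diff[:, None, :])).sum(0) / w.sum()
        lam = np.sort(np.linalg.eigvalsh(cov))[::-1]   # descending: λ1 ≥ λ2 ≥ λ3
        if lam[0] > 0 and lam[1] > 0 \
                and lam[1] / lam[0] < gamma21 and lam[2] / lam[1] < gamma32:
            keypoints.append(i)                  # step 3.2.4): retain the point
    return keypoints
```

A practical implementation would replace the brute-force neighbour query with a k-d tree, but the eigenvalue-ratio test is the same.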
3.3) Compute the feature descriptor of each key feature point, as follows:
First, a unique, unambiguous and stable local reference frame (LRF) is built by computing the covariance matrix of the points on the local surface in the neighborhood of the key point. Taking the key point as origin, the local surface is rotated until the LRF is aligned with the Ox, Oy and Oz axes of the object coordinate system O, which makes the points rotation invariant.
Then the following steps are performed for each axis Ox, Oy, Oz in turn, taking that axis as the current axis:
3.3.1) Rotate the local surface around the current axis by a specified angle.
3.3.2) Project the points of the rotated local surface onto the XY, XZ and YZ planes.
3.3.3) Build the projection distribution matrix; this matrix records only the number of points each bin contains. The number of bins determines the dimension of the matrix and, like the rotation angle, is a parameter of the algorithm.
3.3.4) Compute the central moments of the distribution matrix, i.e. μ11, μ21, μ12, μ22, and the entropy e.
3.3.5) Concatenate the computed values into a sub-feature.
These steps are executed in a loop whose number of iterations depends on the number of rotations given. Finally, the sub-features of the different coordinate axes are concatenated to form the final RoPS descriptor.
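Steps 3.3.3)–3.3.5) for a single projection can be sketched as below. This is an illustrative sketch only; the bin count is an assumed parameter, and e is computed here as the Shannon entropy of the normalized distribution matrix, which is one common reading of the "e" in the RoPS literature.

```python
import numpy as np

def sub_feature(points_2d, bins=5):
    """Central moments mu11, mu21, mu12, mu22 and entropy e of one
    projection distribution matrix (points_2d: (N, 2) projected points)."""
    H, _, _ = np.histogram2d(points_2d[:, 0], points_2d[:, 1], bins=bins)
    D = H / H.sum()                              # normalised distribution matrix
    i, j = np.meshgrid(np.arange(bins), np.arange(bins), indexing="ij")
    ibar = (i * D).sum()                         # centroid of the distribution
    jbar = (j * D).sum()
    def mu(m, n):                                # central moment of order (m, n)
        return (((i - ibar) ** m) * ((j - jbar) ** n) * D).sum()
    e = -np.sum(D[D > 0] * np.log(D[D > 0]))     # Shannon entropy of D
    return np.array([mu(1, 1), mu(2, 1), mu(1, 2), mu(2, 2), e])
```

One such five-value sub-feature is produced per projection plane and per rotation, and the concatenation over planes, rotations and axes gives the full descriptor.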
3.4) Feature matching, as follows:
This patent uses a threshold-based feature matching method: under threshold-based matching, two features are declared a match if the distance between the two descriptors is below a set threshold.
The distance formula used characterizes the difference between two object clusters (a cluster is the set of descriptors of one object): the Manhattan distance between the geometric centers of the two sets plus the Manhattan distance between their per-dimension standard deviations, as in formulas (3) and (5):
D(A, B) = L1(C_A, C_B) + L1(std_A, std_B)   (3)
where D(A, B) is the distance between the two object clusters A and B, C_A(i) and C_B(i) are the centers of A and B in dimension i, L1 is the Manhattan distance, and std_A(i) and std_B(i) are the standard deviations of clusters A and B in dimension i:
std_A(i) = sqrt( (1/|A|) Σ_{j=1}^{|A|} (a_j(i) − C_A(i))² ),  i = 1, …, n   (4)
The L1 distance between two descriptors a and b is:
L1(a, b) = Σ_{i=1}^{n} |a(i) − b(i)|   (5)
where n is the size of the feature descriptor, i.e. 135 dimensions for RoPS; a_j(i) is the value of the i-th dimension of the RoPS descriptor of the j-th key point in cluster A; and |A| and |B| are the numbers of key points in clusters A and B.
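The cluster distance of formulas (3)–(5) can be sketched directly; this is a plain-Python illustration under the definitions above, with no library dependencies.

```python
import math

def cluster_distance(A, B):
    """D(A, B) = L1(C_A, C_B) + L1(std_A, std_B) between two descriptor
    clusters, each given as a list of equal-length descriptors."""
    def l1(a, b):                                      # formula (5)
        return sum(abs(x - y) for x, y in zip(a, b))
    def center(X):                                     # geometric centre C_X
        return [sum(col) / len(X) for col in zip(*X)]
    def std(X, C):                                     # formula (4)
        return [math.sqrt(sum((row[i] - C[i]) ** 2 for row in X) / len(X))
                for i in range(len(C))]
    CA, CB = center(A), center(B)
    return l1(CA, CB) + l1(std(A, CA), std(B, CB))     # formula (3)
```

A match is then declared when `cluster_distance(A, B)` falls below the chosen threshold.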
In the above scheme, step 4) is as follows: choose a suitable position to place the manipulator and establish coordinate system R; the coordinates of the origin of coordinate system K in coordinate system R are (d, l, h). The PCA method is used to establish the object coordinate system O, and the object attitude is obtained through two coordinate system transformations, from O to K and then from K to R. The coordinates in coordinate system K are transformed into position and attitude information in coordinate system R, giving the pose corresponding to each entry of the semantic description file in the R coordinate system, and the XML semantic map is then generated.
In the above scheme, the speech recognition of step 5) specifically includes the following steps:
5.1) Preprocessing: collect the user's speech through a microphone array and process the raw speech signal, filtering out irrelevant information and background noise, then perform endpoint detection, speech framing and pre-emphasis.
5.2) Feature extraction: extract the key parameters characterizing the speech signal to form a feature vector sequence.
5.3) Use a hidden Markov model (HMM) for acoustic modeling; during recognition, the speech to be recognized is matched against the acoustic model to obtain the recognition result.
5.4) Perform grammatical and semantic analysis of the training text database and train a statistical N-gram language model, thereby improving the recognition rate and narrowing the search space.
5.5) For the input speech signal, build a recognition network from the trained HMM acoustic model, the language model and the dictionary, and use a search algorithm to find the best path through this network; this path is the word string that outputs the speech signal with maximum probability, thereby determining the words contained in the speech sample.
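Two of the preprocessing operations of step 5.1), pre-emphasis and framing, can be sketched as follows. This is a generic illustration of these standard operations, not the patent's implementation; the filter coefficient, frame length and hop size are assumed typical values.

```python
def preemphasis(signal, alpha=0.97):
    """Pre-emphasis filter y[t] = x[t] - alpha * x[t-1] (step 5.1)."""
    return [signal[0]] + [signal[t] - alpha * signal[t - 1]
                          for t in range(1, len(signal))]

def frame(signal, frame_len=400, hop=160):
    """Split a sample list into overlapping fixed-length frames (step 5.1),
    e.g. 25 ms frames with a 10 ms hop at a 16 kHz sampling rate."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]
```

Each frame would then be windowed and passed to feature extraction (step 5.2), typically producing MFCC-style feature vectors for the HMM decoder.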
The beneficial effects of the present invention are: by recognizing the positions of objects, the method overcomes the shortcoming of conventional automated equipment that the allowed range of product positions is very small; at the same time, the combination of voice with object position information allows the method to be applied in service occupations.
Accompanying drawing explanation
The invention is further described below in conjunction with the drawings and embodiments. In the drawings:
Fig. 1 is the Kinect sensor model and a schematic diagram of the K coordinate system;
Fig. 2 is a schematic diagram of the K coordinate system relative to the ground;
Fig. 3 is the overall flow chart of object recognition;
Fig. 4 is the flow chart of the feature descriptor;
Fig. 5 is a schematic diagram of the relation between the K and R coordinate systems;
Fig. 6 is the overall flow chart of object pose estimation;
Fig. 7 is the overall flow chart of voice interaction;
Fig. 8 is the overall block diagram of the system.
Detailed description of the invention
In order to make the purpose, technical scheme and advantages of the present invention clearer, the invention is further elaborated below in conjunction with embodiments. It should be understood that the specific embodiments described here only explain the present invention and are not intended to limit it.
As shown in Fig. 1, a man-machine interaction method based on Kinect and voice includes the following two parts.
Part I, scene interaction, includes the following steps:
Step 1: place the Kinect correctly and establish coordinate system K.
The Kinect is placed directly facing the objects. Its detection range is 1.8 to 3.6 meters, its horizontal field of view is 53° and its vertical field of view is 47°; the objects should be placed within this range to ensure that the Kinect can collect data correctly. Coordinate system K is then established with the center of the Kinect as origin, as shown in Fig. 1; the relation between the Kinect and the ground plane is shown in Fig. 2, where the angle between the z axis and the horizontal plane is θ.
Step 2: the Kinect sensor completes object detection and recognition.
The depth image and the RGB image collected by the Kinect are fused to obtain three-dimensional point cloud data.
Preprocessing first filters out point cloud data that is too far from or too close to the sensor; this effectively reduces the computation cost, improves processing speed, and improves the real-time performance of the system.
After preprocessing, the ISS algorithm is used for feature point detection. The detected feature points are then described with the S/C-RoPS algorithm, and feature matching against the object feature database yields the semantic description file of the object.
The point cloud processing flow is shown in Fig. 3.
The three steps of key point extraction, feature descriptor computation and 3D feature matching are described in detail below.
The detailed process of key point extraction is:
(1) For each point p_i in the input point cloud, query all points within radius r_frame and compute the weights according to formula (1):
W_ij = 1 / ||p_i − p_j||,  ||p_i − p_j|| < r_frame   (1)
(2) Compute the weighted covariance matrix according to formula (2):
COV(p_i) = Σ_{||p_i−p_j||<r_frame} w_ij (p_i − p_j)(p_i − p_j)^T / Σ_{||p_i−p_j||<r_frame} w_ij   (2)
(3) Compute the eigenvalues λ1, λ2, λ3 of the covariance matrix and sort them in descending order.
(4) Set ratio thresholds γ21 and γ32 and retain the points satisfying λ2/λ1 < γ21 and λ3/λ2 < γ32; these points are the key feature points.
The feature descriptor is computed as follows:
First, a unique, unambiguous and stable local reference frame (LRF) is built by computing the covariance matrix of the points on the local surface in the neighborhood of the key point. Taking the key point as origin, the local surface is rotated until the LRF is aligned with the Ox, Oy and Oz axes, which makes the points rotation invariant. The following steps are then performed for each axis Ox, Oy, Oz in turn, taking that axis as the current axis:
1) Rotate the local surface around the current axis by a specified angle.
2) Project the points of the rotated local surface onto the XY, XZ and YZ planes.
3) Build the projection distribution matrix; this matrix records only the number of points each bin contains. The number of bins determines the dimension of the matrix and, like the rotation angle, is a parameter of the algorithm.
4) Compute the central moments of the distribution matrix, i.e. μ11, μ21, μ12, μ22, and the entropy e.
5) Concatenate the computed values into a sub-feature.
These steps are repeated in a loop whose number of iterations depends on the number of rotations given. Finally, the sub-features of the different coordinate axes are concatenated to form the final RoPS descriptor.
The shape or color information of the local surface is added to RoPS, extending and improving the encoded information to generate an S/C-RoPS descriptor; the block diagram of the algorithm is shown in Fig. 4. This optimizes the accuracy of feature matching.
This patent uses a confidence-based decision-level fusion algorithm to fuse the information of the S-RoPS and C-RoPS descriptors. The idea is to perform object recognition with the S-RoPS or C-RoPS descriptor alone, which yields a confidence value under each single-modality method; the fusion strategy compares the confidences of all candidate models produced by the two independent methods and selects the candidate model with the highest confidence.
The feature matching process is as follows:
This patent uses a threshold-based feature matching method. Under threshold-based matching, two features are declared a match if the distance between the two descriptors is below a set threshold.
The distance formula used characterizes the difference between two object clusters (a cluster is the set of descriptors of one object): the Manhattan distance between the geometric centers of the two sets plus the Manhattan distance between their per-dimension standard deviations, as in formulas (3) and (5):
D(A, B) = L1(C_A, C_B) + L1(std_A, std_B)   (3)
std_A(i) = sqrt( (1/|A|) Σ_{j=1}^{|A|} (a_j(i) − C_A(i))² ),  i = 1, …, n   (4)
std_B is computed analogously to std_A; n is the size of the feature descriptor.
The L1 distance between two descriptors a and b is:
L1(a, b) = Σ_{i=1}^{n} |a(i) − b(i)|   (5)
Step 3: choose a suitable position for the manipulator, establish coordinate system R, and compute the pose in the K coordinate system. The position and attitude in K are converted into coordinates and attitude in coordinate system R through a coordinate transformation and a change of coordinate system (the object coordinate system O is only a temporary variable introduced to compute the attitude and has no physical meaning beyond its origin, so the transformation is from K to R rather than from O to R), and the XML semantic map is generated.
A suitable position is chosen to place the manipulator, and coordinate system R is established as shown in Fig. 5; the coordinates of the origin of K in R are (d, l, h). The PCA method is used to establish the object coordinate system O; after two coordinate system transformations and one coordinate transformation under the K coordinate system, the corresponding pose information in the R coordinate system is obtained, and the XML semantic map is generated. The detailed flow is shown in Fig. 6.
1) Compute the geometric center of the object point cloud, P̄ = (1/N) Σ_{i=1}^{N} p_i, where N is the number of points. Subtract the center from all points and arrange the centered coordinates into a 3 × N matrix:
A = [ x_1 x_2 … x_N ; y_1 y_2 … y_N ; z_1 z_2 … z_N ]   (6)
2) Let M = A·A^T and compute the eigenvalues and eigenvectors of M: λ_i·V_i = M·V_i, i = 1, 2, 3, normalizing the eigenvectors to ||V_i|| = 1. The long axis of the object corresponds to the eigenvector of the largest eigenvalue. With λ1 ≤ λ2 ≤ λ3, the rotation matrix R^cam_mod of the object coordinate system relative to coordinate system K is formed from the normalized eigenvectors, and the translation P^cam_mod is the geometric center of the point cloud, P̄. The pose of the object coordinate system in coordinate system K is then given by formula (7):
T^cam_mod = [ R^cam_mod  P^cam_mod ; 0_{1×3}  1 ]   (7)
Let C^cam = {P_i} be the point cloud in the camera coordinate system and C^mod the point cloud in the model library object coordinate system. From the major axis and the center point, the plane of the short axis and the second-longest axis is determined, and the short-axis and second-longest-axis directions are then fixed from the extreme value distribution of the points in that plane.
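The PCA construction of the object frame in steps 1)–2) can be sketched as below. This is an illustrative sketch using NumPy's symmetric eigendecomposition; the patent does not specify an implementation, and the axis ordering convention here (long axis last) is an assumption consistent with λ1 ≤ λ2 ≤ λ3.

```python
import numpy as np

def object_frame(points):
    """PCA object frame: centre P_bar, eigenvalues of M = A @ A.T in
    ascending order, and eigenvectors (last column spans the long axis)."""
    P_bar = points.mean(axis=0)          # geometric centre of the point cloud
    A = (points - P_bar).T               # 3 x N centred coordinates, formula (6)
    M = A @ A.T
    lam, V = np.linalg.eigh(M)           # λ1 ≤ λ2 ≤ λ3, columns of V normalised
    return P_bar, lam, V
```

The eigenvector for the largest eigenvalue gives the long-axis (Z) direction of the object coordinate system O; the remaining axes are then disambiguated from the planar extreme value distribution as described above.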
At the matching stage, in order to obtain the transformation matrix from the actual object to the model library object, a least squares approach is used to compute the six-degree-of-freedom pose. For the two corresponding 3D point sets {P^mod} and {P^obj}, the rigid body transformation relation P^mod = R^mod_obj · P^obj + t^mod_obj holds, where R^mod_obj and t^mod_obj are the rotation matrix and translation vector between the two point sets. Least squares is used to find the optimal solution, i.e. the R^mod_obj and t^mod_obj minimizing E in formula (8):
E = Σ_{i=1}^{n} | (R^mod_obj · P^obj_i + t^mod_obj) − P^mod_i |²   (8)
The transformation matrix from the actual object to the model library object is then given by formula (9):
T^mod_obj = [ R^mod_obj  t^mod_obj ; 0_{1×3}  1 ]   (9)
The pose matrix of the actual object relative to the sensor coordinate system is given by formula (10):
T^cam_obj = T^cam_mod · T^mod_obj   (10)
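The least-squares problem of formula (8) has a closed-form solution via the SVD of the cross-covariance matrix (the Kabsch/Umeyama construction). The patent only states that least squares is used; the SVD route below is a standard sketch, not necessarily the patent's method.

```python
import numpy as np

def rigid_transform(P_obj, P_mod):
    """Least-squares R, t minimising E = sum |R @ p_obj + t - p_mod|^2
    (formula (8)) for two corresponding (n, 3) point sets."""
    c_obj = P_obj.mean(axis=0)
    c_mod = P_mod.mean(axis=0)
    H = (P_obj - c_obj).T @ (P_mod - c_mod)      # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.linalg.det(Vt.T @ U.T)])  # guard against reflection
    R = Vt.T @ D @ U.T
    t = c_mod - R @ c_obj
    return R, t
```

The returned R and t assemble directly into the homogeneous matrix of formula (9).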
The rotation matrix can be converted into a yaw angle α, pitch angle β and roll angle γ describing the attitude, as in formula (11), and the translation vector can be converted into center coordinates describing the position:
β = atan2( −r31, sqrt(r11² + r21²) )
α = atan2( r21 / cos β, r11 / cos β )
γ = atan2( r32 / cos β, r33 / cos β )   (11)
where r_ij is the element in row i, column j of the rotation matrix.
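Formula (11) can be sketched directly; this is a plain transcription of the three atan2 expressions, valid away from the gimbal-lock case cos β = 0.

```python
import math

def euler_zyx(r):
    """Yaw alpha, pitch beta, roll gamma from a 3x3 rotation matrix
    given as nested lists r[i][j] (row i, column j), per formula (11)."""
    beta = math.atan2(-r[2][0], math.hypot(r[0][0], r[1][0]))
    alpha = math.atan2(r[1][0] / math.cos(beta), r[0][0] / math.cos(beta))
    gamma = math.atan2(r[2][1] / math.cos(beta), r[2][2] / math.cos(beta))
    return alpha, beta, gamma
```

For a pure rotation about the z axis, β and γ vanish and α recovers the rotation angle.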
The relation between coordinate systems R and K is shown in Fig. 5, and the transformation between them in formula (12):
[ x ; y ; z ] = [ 0  sin θ  −cos θ ; 1  0  0 ; 0  −cos θ  −sin θ ] · [ x_k ; y_k ; z_k ] + [ d ; l ; h ]   (12)
where θ is the inclination of the Kinect relative to the horizontal, {x, y, z} are the coordinates of the object in coordinate system R, and {x_k, y_k, z_k} are its coordinates in coordinate system K.
The attitude matrix of the object relative to coordinate system R is:
T^rob_obj = T^rob_cam · T^cam_obj   (13)
where the rotation part of T^rob_cam is, consistently with formula (12):
R^rob_cam = [ 0  sin θ  −cos θ ; 1  0  0 ; 0  −cos θ  −sin θ ]   (14)
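The point mapping of formula (12) can be sketched as follows; this is a direct transcription of the matrix-vector product plus the offset (d, l, h).

```python
import math

def k_to_r(p_k, theta, d, l, h):
    """Map a point (x_k, y_k, z_k) from Kinect frame K to robot frame R
    per formula (12); theta is the Kinect tilt, (d, l, h) the origin of K in R."""
    xk, yk, zk = p_k
    x = math.sin(theta) * yk - math.cos(theta) * zk + d
    y = xk + l
    z = -math.cos(theta) * yk - math.sin(theta) * zk + h
    return (x, y, z)
```

With θ = 0 (Kinect level), the mapping reduces to x = −z_k + d, y = x_k + l, z = −y_k + h, i.e. a pure relabeling of axes plus the offset.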
Part II, voice man-machine interaction, includes the following steps:
Step 1: the user issues a voice command, which after processing is converted into text information.
After the user's speech is received, preprocessing and speech decoding finally yield the text information; the detailed flow is shown in Fig. 7.
Step 2: the text information and the XML semantic map are fed to the intelligent inference engine, which produces an execution instruction and outputs text information.
The user triggers the real-time three-dimensional map generation module by voice to build a semantic map file of the current scene. The speech recognition and speech synthesis nodes realize the human-computer dialogue by sending and receiving text respectively, while the intelligent inference engine node analyzes and gives feedback in combination with the map file, refines the user's expectation through in-depth dialogue, and finally generates a solution that is sent to the scheme parsing and motion planning modules. Speech recognition uses the open-source PocketSphinx speech recognition system, and speech synthesis uses the open-source Ekho speech synthesis system.
It should be understood that those of ordinary skill in the art can make improvements or transformations in light of the above description, and all such improvements and transformations shall fall within the protection scope of the appended claims of the present invention.

Claims (5)

1. A man-machine interaction method based on Kinect and voice, characterized in that it comprises the following steps:
1) use a Kinect sensor to acquire the accurate spatial position and attitude of each object in the scene in coordinate system K, completing object detection and recognition; said coordinate system K takes the geometric center of the Kinect as origin, the direction perpendicular to the lens and pointing outward as the positive Z axis, and the line through the centers of the three Kinect lenses as the X axis;
2) fuse the depth and RGB images collected by the Kinect to obtain three-dimensional point cloud data;
3) spatial point cloud object recognition: process the point cloud data to obtain a semantic description file;
4) apply a coordinate transformation to the object coordinate system O to obtain a three-dimensional scene semantic map description file in coordinate system R;
5) receive the user's voice input and process the input signal to obtain text information;
6) feed the text information and the XML semantic map to an intelligent inference engine, which generates an execution instruction and outputs a reply and guidance text for the user.
2. The man-machine interaction method based on Kinect and voice according to claim 1, characterized in that the spatial point cloud object recognition of step 3) includes preprocessing, key point extraction and descriptor extraction, followed by feature matching against an object feature database, finally yielding a semantic description file.
The most according to claim 1 based on Kincet with the man-machine interaction method of voice, it is characterised in that described step 3) In:
3.1) pretreatment, described pre-treatment step is for filtering the cloud data that range sensor is too far away or too close;
3.2) using ISS algorithm that cloud data is carried out feature point detection, detailed process is as follows:
3.2.1) each some p in inquiry input cloud dataiRadius rframeInterior had a pj, and calculate weight according to formula 1;
Wij=1/ | | pi-pj| |, | pi-pj| < rframe (1)
3.2.2) covariance matrix is calculated according to weight according to formula 2
C O V ( p i ) = &Sigma; | p i - p j | < r f r a m e w i j ( p i - p j ) ( p i - p j ) T / &Sigma; | p i - p j | < r f r a m e w i j - - - ( 2 )
3.2.3) eigenvalue of covariance matrix is calculatedAnd eigenvalue is arranged according to descending order;
3.2.4) rate threshold γ is set21And γ32, retain and meetWithPoint set, these point be Key feature points;
3.3) Feature Descriptor of key feature points calculates, and concrete grammar is as follows:
First pass through calculate the covariance matrix of point being positioned at key point neighborhood local surfaces build one unique, clear and definite With stable local referential system LRF, using key point as starting point, rotate local surfaces until LRF and object coordinates system O Ox, Oy and Oz axle alignment, so can make that a little there is rotational invariance;
Then to each axle Ox, Oy, Oz perform following several steps, we using these axles as current axis:
3.3.1) local surfaces rotates around current axis with specified angle;
3.3.2) the local surfaces spot projection rotated is in XY, XZ and YZ plane;
3.3.3) setting up projective distribution matrix, this matrix only shows the quantity of the point that each subdomain comprises, the quantity of subdomain Represent the dimension of matrix, the same with specified angle it be also the parameter of this algorithm;
3.3.4) distribution matrix centre-to-centre spacing, i.e. μ are calculated11、μ21、μ12、μ22And e;
3.3.5) the value cascade composition subcharacter calculated;
Circulation performs above-mentioned steps, and iterations depends on the number of the rotation given;Finally, by the subcharacter of different coordinate axess Cascade forms final RoPS and describes son;
3.4) eigenvalue coupling, concrete grammar is as follows:
This patent uses characteristic matching method based on threshold values, under match pattern based on threshold value, if two describe between son Distance less than set threshold value, then show that two features are unanimously mated;
The range formula that threshold values is used is to characterize the difference between two object clusters, and the geometric center of i.e. two set adds The manhatton distance sum of the standard deviation of they every dimension, such as formula (3) and formula (5):
D (A, B)=L1(CA,CB)+L1(stdA,stdB) (3)
Wherein, D (A, B) represents the range difference of two i.e. A and B of object cluster, CA(i),CBI () is respectively in A, B dimension The heart, L1 represents manhatton distance formula, stdAI () represents the standard deviation of cluster A dimension, stdBI () represents cluster B The standard deviation of dimension;
std A ( i ) = 1 | A | &Sigma; j = 1 | A | ( a j ( i ) - C A ( i ) ) 2 , i = 1 , ... , n - - - ( 4 )
N representative feature describes the size of son;
Two L describing sub-a and b1Distance is as follows:
L1(a, b) = Σ_{i=1}^{n} |a(i) - b(i)|    (5)
a_j(i) denotes the value of the i-th dimension of the RoPS descriptor of the j-th key point in cluster A;
|A| denotes the number of key points in cluster A;
|B| denotes the number of key points in cluster B.
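Formulas (3) to (5) can be sketched directly in code. This is an illustrative implementation under the stated definitions; the toy clusters and the threshold value 0.5 are hypothetical, not values from the patent.

```python
def l1(a, b):
    """Manhattan distance between two equal-length vectors, formula (5)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def center(cluster):
    """Per-dimension geometric center C(i) of a cluster of descriptors."""
    n = len(cluster)
    return [sum(d[i] for d in cluster) / n for i in range(len(cluster[0]))]

def std(cluster, c):
    """Per-dimension standard deviation of a cluster, formula (4)."""
    n = len(cluster)
    return [(sum((d[i] - c[i]) ** 2 for d in cluster) / n) ** 0.5
            for i in range(len(cluster[0]))]

def cluster_distance(A, B):
    """D(A, B) = L1(C_A, C_B) + L1(std_A, std_B), formula (3)."""
    cA, cB = center(A), center(B)
    return l1(cA, cB) + l1(std(A, cA), std(B, cB))

# Two toy clusters of 2-dimensional descriptors (hypothetical values).
A = [[0.0, 1.0], [2.0, 3.0]]
B = [[0.0, 1.0], [2.0, 3.0]]
threshold = 0.5  # hypothetical threshold
matched = cluster_distance(A, B) < threshold  # identical clusters match
```

Under threshold-based matching, `matched` being true corresponds to the two descriptor clusters being declared a consistent match.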
The man-machine interaction method based on Kinect and voice according to claim 1, characterized in that step 4) is specifically as follows: the robot arm is placed at a suitable position and coordinate system R is established; the origin of coordinate system O has coordinates (d, l, h) in coordinate system R; the object coordinate system O is established using the PCA method; through two coordinate transformations, the pose information corresponding to the semantic description file is obtained in coordinate system R, and the XML semantic map is reproduced.
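The chain of two coordinate transformations in step 4) can be sketched with homogeneous 4x4 matrices. This is a hand-rolled illustration, not the patent's implementation: the camera-to-object rotation is taken as identity for brevity (in practice it would come from the PCA axes), and the values of d, l, h are invented for the example.

```python
def mat_vec(T, p):
    """Apply a 4x4 homogeneous transform T to a 3D point p."""
    x, y, z = p
    v = (x, y, z, 1.0)
    return tuple(sum(T[r][c] * v[c] for c in range(4)) for r in range(3))

def compose(T1, T2):
    """Matrix product T1 @ T2 of two 4x4 homogeneous transforms."""
    return [[sum(T1[r][k] * T2[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]

# First transform: camera (Kinect) frame K -> object frame O.
# Rotation would normally come from the PCA axes; identity here for brevity.
T_O_from_K = [[1, 0, 0, 0.2],
              [0, 1, 0, 0.0],
              [0, 0, 1, -0.1],
              [0, 0, 0, 1]]

# Second transform: object frame O -> robot frame R; the origin of O sits
# at (d, l, h) in R, as stated in step 4). Values are illustrative.
d, l, h = 0.5, 0.3, 0.1
T_R_from_O = [[1, 0, 0, d],
              [0, 1, 0, l],
              [0, 0, 1, h],
              [0, 0, 0, 1]]

T_R_from_K = compose(T_R_from_O, T_O_from_K)  # the two transforms chained
p_K = (0.0, 0.0, 0.0)                         # a point in the camera frame
p_R = mat_vec(T_R_from_K, p_K)                # its coordinates in R
```

With these illustrative numbers the camera origin lands at approximately (0.7, 0.3, 0.0) in the robot frame R, which is the pose information the semantic description file would carry.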
The man-machine interaction method based on Kinect and voice according to claim 1, characterized in that the speech recognition process of step 5) specifically includes the following steps:
5.1) pre-processing: user speech information is collected through a microphone array; the raw speech signal is processed to filter out irrelevant information and background noise, followed by endpoint detection, framing and pre-emphasis of the speech signal;
5.2) feature extraction: the key characteristic parameters reflecting the properties of the speech signal are extracted to form a feature vector sequence;
5.3) a Hidden Markov Model (HMM) is used for acoustic modeling; during recognition, the speech to be recognized is matched against the acoustic model to obtain a recognition result;
5.4) grammatical and semantic analysis is performed on the training text database, and an N-gram language model is obtained through statistics-based model training, thereby improving the recognition rate and reducing the search space;
5.5) for an input speech signal, a recognition network is built from the trained HMM acoustic model, the language model and the dictionary; a search algorithm finds an optimal path in this network, namely the path that outputs this speech signal with maximum probability, thereby determining the words contained in the speech sample.
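The pre-processing of step 5.1) can be illustrated with the two most mechanical operations, pre-emphasis and framing. This is a generic sketch, not the patent's pipeline: the coefficient 0.97, the frame length and the hop size are common textbook values, and the input is a stand-in list rather than real microphone samples.

```python
def pre_emphasis(signal, alpha=0.97):
    """Pre-emphasis filter y[n] = x[n] - alpha * x[n-1] (part of step 5.1)."""
    return [signal[0]] + [signal[n] - alpha * signal[n - 1]
                          for n in range(1, len(signal))]

def frame(signal, frame_len, hop):
    """Split the signal into overlapping fixed-length frames (framing, step 5.1)."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]

# Stand-in for 400 microphone samples (e.g. 50 ms at 8 kHz).
x = [float(n % 16) for n in range(400)]
y = pre_emphasis(x)
frames = frame(y, frame_len=200, hop=80)  # 25 ms windows, 10 ms hop at 8 kHz
```

Each frame would then go to feature extraction (step 5.2), typically yielding one feature vector per frame for the HMM decoder of steps 5.3) to 5.5).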
CN201610306998.7A 2016-05-10 2016-05-10 Man-machine interaction method based on Kinect and voice Active CN106055244B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610306998.7A CN106055244B (en) 2016-05-10 2016-05-10 Man-machine interaction method based on Kinect and voice


Publications (2)

Publication Number Publication Date
CN106055244A true CN106055244A (en) 2016-10-26
CN106055244B CN106055244B (en) 2020-08-04

Family

ID=57176838

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610306998.7A Active CN106055244B (en) 2016-05-10 2016-05-10 Man-machine interaction method based on Kinect and voice

Country Status (1)

Country Link
CN (1) CN106055244B (en)


Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103472916A (en) * 2013-09-06 2013-12-25 东华大学 Man-machine interaction method based on human body gesture recognition
CN104571485A (en) * 2013-10-28 2015-04-29 中国科学院声学研究所 System and method for human and machine voice interaction based on Java Map


Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
KYLINFISH: "Basic knowledge of speech recognition and an introduction to CMUSphinx", 《HTTPS://WWW.CNBLOGS.COM/KYLINFISH/ARTICLES/3627188.HTML》 *
POINT CLOUD LIBRARY: "RoPs (Rotational Projection Statistics) feature", 《HTTP://POINTCLOUDS.ORG/DOCUMENTATION/TUTORIALS/ROPS_FEATURE.PHP》 *
YU ZHONG: "Intrinsic shape signatures: A shape descriptor for 3D object recognition", 《2009 IEEE 12TH INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS》 *
WU FAN et al.: "A real-time 3D semantic map generation method", 《COMPUTER ENGINEERING AND APPLICATIONS》 *
XIONG ZHIHENG et al.: "Research on parser technology for sorting robots based on natural language", 《COMPUTER ENGINEERING AND APPLICATIONS》 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108873707A (en) * 2017-05-10 2018-11-23 杭州欧维客信息科技股份有限公司 Speech-sound intelligent control system
CN109839622A (en) * 2017-11-29 2019-06-04 武汉科技大学 A kind of parallel computation particle probabilities hypothesis density filtering multi-object tracking method
CN109839622B (en) * 2017-11-29 2022-08-12 武汉科技大学 Multi-target tracking method for parallel computing particle probability hypothesis density filtering
CN111666797A (en) * 2019-03-08 2020-09-15 深圳市速腾聚创科技有限公司 Vehicle positioning method and device and computer equipment
CN111666797B (en) * 2019-03-08 2023-08-08 深圳市速腾聚创科技有限公司 Vehicle positioning method, device and computer equipment


Similar Documents

Publication Publication Date Title
CN111432989B (en) Artificial enhancement cloud-based robot intelligent framework and related methods
CN110097553A (en) The semanteme for building figure and three-dimensional semantic segmentation based on instant positioning builds drawing system
CN106056207A (en) Natural language-based robot deep interacting and reasoning method and device
CN105739688A (en) Man-machine interaction method and device based on emotion system, and man-machine interaction system
CN111967272B (en) Visual dialogue generating system based on semantic alignment
CN113378676A (en) Method for detecting figure interaction in image based on multi-feature fusion
CN113361636B (en) Image classification method, system, medium and electronic device
CN109064389B (en) Deep learning method for generating realistic images by hand-drawn line drawings
CN113012122A (en) Category-level 6D pose and size estimation method and device
CN108320051B (en) Mobile robot dynamic collision avoidance planning method based on GRU network model
CN107894836A (en) Remote sensing image processing and the man-machine interaction method of displaying based on gesture and speech recognition
CN110465089B (en) Map exploration method, map exploration device, map exploration medium and electronic equipment based on image recognition
CN106055244A (en) Man-machine interaction method based on Kincet and voice
CN110598595B (en) Multi-attribute face generation algorithm based on face key points and postures
Pramanick et al. Doro: Disambiguation of referred object for embodied agents
CN116561533B (en) Emotion evolution method and terminal for virtual avatar in educational element universe
CN109284692A (en) Merge the face identification method of EM algorithm and probability two dimension CCA
Wang et al. Combining ElasticFusion with PSPNet for RGB-D based indoor semantic mapping
CN116682178A (en) Multi-person gesture detection method in dense scene
KR20210054355A (en) Vision and language navigation system
Li et al. Few-shot meta-learning on point cloud for semantic segmentation
CN110517307A (en) The solid matching method based on laser specklegram is realized using convolution
CN115169448A (en) Three-dimensional description generation and visual positioning unified method based on deep learning
CN103793720A (en) Method and system for positioning eyes
Zhu et al. Speaker localization based on audio-visual bimodal fusion

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant