CN106055244A - Man-machine interaction method based on Kinect and voice - Google Patents
Man-machine interaction method based on Kinect and voice
- Publication number
- CN106055244A CN106055244A CN201610306998.7A CN201610306998A CN106055244A CN 106055244 A CN106055244 A CN 106055244A CN 201610306998 A CN201610306998 A CN 201610306998A CN 106055244 A CN106055244 A CN 106055244A
- Authority
- CN
- China
- Prior art keywords
- coordinate system
- point
- voice
- kinect
- man
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/048—Interaction techniques based on graphical user interfaces [GUI]
- G06F3/0487—Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser
Landscapes
- Engineering & Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Image Analysis (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a man-machine interaction method based on Kinect and voice. The method comprises the following steps: (1) a Kinect sensor acquires the accurate spatial position and attitude of each object in the scene in the Kinect coordinate system K, so that targets can be detected and recognized; (2) the depth image and the RGB image collected by the Kinect are fused to obtain three-dimensional point cloud data; (3) spatial point cloud objects are recognized, the three-dimensional point cloud data being processed to obtain a semantic description file; (4) a coordinate transformation is applied to the object coordinate system O to obtain a three-dimensional scene semantic map description file in coordinate system R; (5) the user's voice input is received and the input signal is processed to obtain text information; (6) the text information and the XML semantic map are fed into an intelligent inference engine, which generates execution instructions and outputs to the user a reply together with guidance text.
Description
Technical field
The present invention relates to robotics, and in particular to a man-machine interaction method based on Kinect and voice.
Background technology
Traditional man-machine interactive systems mostly adopt WIMP interfaces, i.e. graphical user interfaces built on windows, menus, icons and pointing devices, with information entered through buttons, knobs or other touch devices. Such a system can only offer people the limited options preset by its designer; it cannot exchange large amounts of information, and any environmental information must be entered manually by an operator. Whether applied in services or in manufacturing, it requires skilled personnel to operate. However its structure is optimized or its user guidance improved, it can only reduce the difficulty of use; it cannot truly reduce staffing or save labour costs.
Related patents were found in a literature search. The invention patent of application No. CN201511016826.8, published on March 23, 2016, "A method, device and robot for human-computer interaction", proposes an interaction method based on voice and image information: the system determines the user's identity from the user's voice and judges the user's input from the user's actions. The invention patent of application No. CN201510658482.4, published on March 23, 2016, "Catering service system", proposes a man-machine interaction method that obtains user instructions through an audio processing unit and derives the user's location from a microphone array.
However, the above patents only concern how to obtain user information through multimedia technology; they cannot obtain scene information, so the interactive system must be used in a specific scene. Once the scene changes significantly, the interactive system will fail to respond or will execute incorrectly.
Summary of the invention
The technical problem to be solved by the present invention is to remedy the above defects of the prior art by providing a man-machine interaction method based on Kinect and voice.
The technical solution adopted by the present invention to solve this problem is a man-machine interaction method based on Kinect and voice, comprising the following steps:
1) the depth image and the RGB image collected by the Kinect are fused to obtain three-dimensional point cloud data;
2) the three-dimensional point cloud data is processed to find positions in coordinate system K; the coordinate system K takes the geometric center of the Kinect as origin, the direction perpendicular to the lenses and pointing outward as the positive Z axis, and the line through the centers of the three Kinect lenses as the X axis;
3) spatial point cloud object recognition: the three-dimensional point cloud data is processed to obtain a semantic description file;
4) a coordinate transformation is applied to the object coordinate system O to obtain a three-dimensional scene semantic map description file in coordinate system R; the object coordinate system O takes the geometric center of the point cloud as origin, the direction of the longest line segment through the origin inside the object as the Z axis, and the plane through the origin perpendicular to the Z axis as the XY plane; the coordinate system R takes the ground as the XY plane, the projection of the geometric center of the robotic-arm base onto the XY plane as origin, the upward direction through the origin perpendicular to the ground as the positive Z axis, and its Y axis parallel to the Y axis of coordinate system K;
5) the user's voice input is received and the input signal is processed to obtain text information;
6) the text information and the XML semantic map are fed into an intelligent inference engine, which generates execution instructions and outputs to the user a reply together with guidance text.
According to the above scheme, the spatial point cloud object recognition of step 3) comprises preprocessing, key-point extraction and descriptor extraction, followed by feature matching against an object feature database, finally yielding the semantic description file.
According to the above scheme, step 3) comprises:
3.1) preprocessing: filtering out point cloud data that is too far from or too close to the sensor;
3.2) feature point detection on the point cloud data using the ISS algorithm, as follows:
3.2.1) for each point p_i in the input point cloud, query all points p_j within the radius r_frame and compute a weight according to formula 1:
w_ij = 1 / ||p_i - p_j||, ||p_i - p_j|| < r_frame (1)
3.2.2) compute the weighted covariance matrix according to formula 2:
cov(p_i) = Σ_{||p_i-p_j||<r_frame} w_ij (p_j - p_i)(p_j - p_i)^T / Σ_{||p_i-p_j||<r_frame} w_ij (2)
3.2.3) compute the eigenvalues λ_i^1, λ_i^2, λ_i^3 of the covariance matrix and sort them in descending order;
3.2.4) set ratio thresholds γ_21 and γ_32, and retain the set of points satisfying λ_i^2/λ_i^1 < γ_21 and λ_i^3/λ_i^2 < γ_32; these points are the key feature points;
3.3) computation of the feature descriptor of the key feature points, as follows:
first, a unique, unambiguous and stable local reference frame LRF is built by computing the covariance matrix of the points on the local surface in the key point's neighbourhood; taking the key point as origin, the local surface is rotated until the LRF aligns with the Ox, Oy and Oz axes of the object coordinate system O, which gives the points rotational invariance;
then the following steps are performed for each of the axes Ox, Oy and Oz, taking each in turn as the current axis:
3.3.1) rotate the local surface about the current axis by a specified angle;
3.3.2) project the points of the rotated local surface onto the XY, XZ and YZ planes;
3.3.3) build a projection distribution matrix; this matrix records only the number of points each sub-domain contains; the number of sub-domains determines the dimension of the matrix and, like the specified angle, is a parameter of the algorithm;
3.3.4) compute the central moments of the distribution matrix, i.e. μ11, μ21, μ12, μ22, and the entropy e;
3.3.5) concatenate the computed values into a sub-feature;
the above steps are executed in a loop, the number of iterations depending on the number of rotations given; finally, the sub-features of the different coordinate axes are concatenated to form the final RoPS descriptor;
3.4) feature matching, as follows:
this patent uses a threshold-based feature matching method: under threshold-based matching, if the distance between two descriptors is less than the set threshold, the two features are judged to match.
The distance formula used with the threshold characterizes the difference between two object clusters (a cluster being a set of descriptors): the Manhattan distance between the geometric centers of the two sets plus the Manhattan distance between their per-dimension standard deviations, as in formulas 3 to 5:
D(A, B) = L1(C_A, C_B) + L1(std_A, std_B) (3)
where D(A, B) is the distance between the two object clusters A and B, C_A(i) and C_B(i) are the centers of the i-th dimension of A and B respectively, L1 denotes the Manhattan distance, std_A(i) is the standard deviation of the i-th dimension of cluster A, and std_B(i) is the standard deviation of the i-th dimension of cluster B;
the L1 distance between two descriptors a and b is:
L1(a, b) = Σ_{i=1}^{n} |a(i) - b(i)| (4)
where n is the size of the feature descriptor, i.e. the RoPS dimension of 135; the cluster center and standard deviation are:
C_A(i) = (1/|A|) Σ_{j=1}^{|A|} a_j(i), std_A(i) = ((1/|A|) Σ_{j=1}^{|A|} (a_j(i) - C_A(i))^2)^(1/2) (5)
where a_j(i) is the value of the i-th dimension of the RoPS descriptor of the j-th key point in cluster A;
|A| is the number of key points in cluster A;
|B| is the number of key points in cluster B.
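The threshold-based cluster matching of steps 3.4 and formulas 3 to 5 can be sketched in Python with NumPy. The library choice and the array layout (one descriptor per row) are illustrative assumptions, not part of the patent:

```python
import numpy as np

def cluster_distance(A, B):
    """Formula 3: D(A,B) = L1(C_A, C_B) + L1(std_A, std_B).

    A, B: (|A|, n) and (|B|, n) arrays of descriptors, one row per keypoint.
    """
    CA, CB = A.mean(axis=0), B.mean(axis=0)   # per-dimension centres (formula 5)
    sA, sB = A.std(axis=0), B.std(axis=0)     # per-dimension standard deviations
    return np.abs(CA - CB).sum() + np.abs(sA - sB).sum()

def match(A, B, threshold):
    """Threshold-based match: clusters are consistent if distance < threshold."""
    return cluster_distance(A, B) < threshold
```

Identical clusters have distance zero, so any positive threshold accepts them; shifting every descriptor by a constant shifts only the centres, not the standard deviations.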
According to the above scheme, step 4) is as follows: the robotic arm is placed at a suitable position and coordinate system R is established; the origin of coordinate system K has coordinates (d, l, h) in coordinate system R; the object coordinate system O is established with the PCA method; the attitude of the object is obtained through two coordinate system transformations, from O to K and from K to R; the coordinates in coordinate system K are transformed into position and attitude information in coordinate system R, giving the pose information corresponding to the semantic description file in coordinate system R, from which the XML semantic map is generated.
According to the above scheme, the speech recognition process of step 5) specifically comprises the following steps:
5.1) preprocessing: user speech is collected through a microphone array; the raw input speech signal is processed to filter out irrelevant information and background noise, and endpoint detection, framing and pre-emphasis are applied to the speech signal;
5.2) feature extraction: key characteristic parameters reflecting the speech signal are extracted to form a feature vector sequence;
5.3) a hidden Markov model (HMM) is used for acoustic modelling; during recognition, the speech to be recognized is matched against the acoustic model to obtain the recognition result;
5.4) grammatical and semantic analysis is performed on a training text database, and an N-gram language model is obtained through statistical model training, improving the recognition rate and narrowing the search space;
5.5) for the input speech signal, a recognition network is built from the trained HMM acoustic model, the language model and the dictionary; a search algorithm finds an optimal path through this network, that path being the word string that outputs the speech signal with maximum probability, thereby determining the words contained in the speech sample.
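The preprocessing of step 5.1 (pre-emphasis and framing) can be sketched in Python with NumPy. The filter coefficient 0.97 and the 25 ms / 10 ms frame geometry at 16 kHz are conventional defaults, not values from the patent:

```python
import numpy as np

def preemphasis(signal, alpha=0.97):
    """Pre-emphasis filter y[n] = x[n] - alpha * x[n-1] (step 5.1)."""
    return np.append(signal[0], signal[1:] - alpha * signal[:-1])

def frame_signal(signal, frame_len=400, hop=160):
    """Split a 1-D signal into overlapping, Hamming-windowed frames
    (e.g. 400-sample frames with a 160-sample hop at 16 kHz)."""
    n = 1 + max(0, (len(signal) - frame_len) // hop)
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n)[:, None]
    return signal[idx] * np.hamming(frame_len)  # windowed frames for feature extraction
```

The resulting frame matrix is the usual input to the feature extraction of step 5.2 (e.g. MFCC computation).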
The beneficial effects of the present invention are: by recognizing object positions, it overcomes the narrow positional tolerance of conventional automated equipment with respect to products; at the same time, the combination of voice with object position information makes the method applicable to service occupations.
Accompanying drawing explanation
The invention is further described below in conjunction with the drawings and an embodiment. In the drawings:
Fig. 1 is the Kinect sensor model and a schematic diagram of the K coordinate system;
Fig. 2 is K coordinate system and the schematic diagram of ground contrast;
Fig. 3 is object identification overall flow figure;
Fig. 4 is Feature Descriptor flow chart;
Fig. 5 is the relation schematic diagram of K coordinate system and R coordinate system;
Fig. 6 is that object pose asks for overall flow figure;
Fig. 7 is interactive voice overall flow figure;
Fig. 8 is system entire block diagram.
Detailed description of the invention
In order to make the purpose, technical scheme and advantages of the present invention clearer, the present invention is further elaborated below in conjunction with an embodiment. It should be understood that the specific embodiment described here serves only to explain the present invention and is not intended to limit it.
As shown in Fig. 1, a man-machine interaction method based on Kinect and voice comprises the following two parts:
Part I, scene interaction, comprising the following steps:
Step 1: correctly place the Kinect and establish the K coordinate system.
The Kinect is placed directly opposite the objects. Its detection range is 1.8 to 3.6 meters, its horizontal field of view is 53° and its vertical field of view is 47°; the objects must be arranged within this range to ensure that the Kinect can collect data correctly. Coordinate system K is then established with the center of the Kinect as origin, as shown in Fig. 1; the relationship between the Kinect and the ground plane is shown in Fig. 2, where the angle between the z axis and the horizontal plane is θ.
Step 2: the Kinect sensor completes object detection and recognition.
The depth image and the RGB image collected by the Kinect are fused to obtain three-dimensional point cloud data.
Preprocessing first filters out point cloud data that is too far from or too close to the sensor, which effectively reduces computational cost, increases processing speed and improves the real-time performance of the system.
After preprocessing, the ISS algorithm is selected for feature point detection, and the detected feature points are then described with the S/C-RoPS algorithm. Feature matching against the object feature database then yields the semantic description file of the object. The point cloud processing flow is shown in Fig. 3.
The three steps of key-point extraction, feature-descriptor computation and 3D feature matching are described in detail below.
The detailed process of key-point extraction is as follows:
(1) for each point p_i in the input point cloud, query all points within the radius r_frame and compute a weight according to formula 1:
w_ij = 1 / ||p_i - p_j||, ||p_i - p_j|| < r_frame (1)
(2) compute the weighted covariance matrix according to formula 2;
(3) compute the eigenvalues λ_i^1, λ_i^2, λ_i^3 of the covariance matrix and sort them in descending order;
(4) set ratio thresholds γ_21 and γ_32 and retain the set of points satisfying λ_i^2/λ_i^1 < γ_21 and λ_i^3/λ_i^2 < γ_32; these points are the key feature points.
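The four key-point extraction steps above can be sketched in Python with NumPy. The parameter defaults and the small tolerance guarding the eigenvalue ratios are illustrative assumptions; the patent specifies neither:

```python
import numpy as np

def iss_keypoints(points, r_frame=0.1, gamma21=0.8, gamma32=0.8):
    """Minimal ISS key-point sketch following steps (1)-(4).

    points: (N, 3) array of cloud coordinates; returns indices of key points.
    """
    keypoints = []
    for i, p in enumerate(points):
        d = np.linalg.norm(points - p, axis=1)
        mask = (d < r_frame) & (d > 0)          # neighbours within r_frame
        if not mask.any():
            continue
        w = 1.0 / d[mask]                        # formula 1: w_ij = 1/||p_i - p_j||
        diff = points[mask] - p
        cov = (w[:, None, None] * np.einsum('ni,nj->nij', diff, diff)).sum(0) / w.sum()
        lam = np.sort(np.linalg.eigvalsh(cov))[::-1]   # eigenvalues, descending
        if lam[0] <= 0 or lam[1] <= 1e-9 * lam[0]:     # degenerate neighbourhood
            continue
        if lam[1] / lam[0] < gamma21 and lam[2] / lam[1] < gamma32:
            keypoints.append(i)                  # step (4): both ratio tests passed
    return keypoints
```

A perfectly collinear cloud yields no key points, since its second eigenvalue collapses to zero and the degeneracy guard rejects every neighbourhood.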
The feature descriptor is computed as follows:
first, a unique, unambiguous and stable local reference frame (LRF) is built by computing the covariance matrix of the points on the local surface in the key point's neighbourhood; taking the key point as origin, the local surface is rotated until the LRF aligns with the Ox, Oy and Oz axes, which gives the points rotational invariance. Then the following steps are performed for each of the axes Ox, Oy and Oz, taking each in turn as the current axis:
1) rotate the local surface about the current axis by a specified angle;
2) project the points of the rotated local surface onto the XY, XZ and YZ planes;
3) build a projection distribution matrix; this matrix records only the number of points each sub-domain contains; the number of sub-domains determines the dimension of the matrix and, like the specified angle, is a parameter of the algorithm;
4) compute the central moments of the distribution matrix, i.e. μ11, μ21, μ12, μ22, and the entropy e;
5) concatenate the computed values into a sub-feature.
These steps are executed in a loop; the number of iterations depends on the number of rotations given. Finally, the sub-features of the different coordinate axes are concatenated to form the final RoPS descriptor.
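The descriptor loop above can be sketched in Python with NumPy. With 3 rotations per axis, 3 axes, 3 projection planes and 5 statistics each, the concatenation reaches the 135 dimensions quoted later in the text; the specific rotation angles and bin count are illustrative parameters, not values fixed by the patent:

```python
import numpy as np

def rops_descriptor(local_pts, n_rot=3, n_bins=5):
    """Sketch of the RoPS statistics loop (steps 1-5 above).

    local_pts: (N, 3) points already expressed in the local reference frame.
    n_rot (rotations per axis) and n_bins (sub-domains per side) are the two
    algorithm parameters mentioned in the text.
    """
    feats = []
    for axis in range(3):                          # current axis: Ox, Oy, Oz
        for k in range(n_rot):
            theta = 2 * np.pi * k / n_rot          # step 1: rotate by a fixed angle
            c, s = np.cos(theta), np.sin(theta)
            R = np.eye(3)
            i, j = [(1, 2), (0, 2), (0, 1)][axis]
            R[i, i] = R[j, j] = c
            R[i, j], R[j, i] = -s, s
            pts = local_pts @ R.T
            for a, b in [(0, 1), (0, 2), (1, 2)]:  # step 2: project to XY, XZ, YZ
                H, _, _ = np.histogram2d(pts[:, a], pts[:, b], bins=n_bins)
                D = H / H.sum()                    # step 3: projection distribution matrix
                ii, jj = np.meshgrid(np.arange(n_bins), np.arange(n_bins), indexing='ij')
                m10, m01 = (ii * D).sum(), (jj * D).sum()
                mu = lambda p, q: (((ii - m10) ** p) * ((jj - m01) ** q) * D).sum()
                e = -(D[D > 0] * np.log(D[D > 0])).sum()   # Shannon entropy
                feats += [mu(1, 1), mu(2, 1), mu(1, 2), mu(2, 2), e]  # steps 4-5
    return np.array(feats)
```

Each (axis, rotation, plane) triple contributes the five statistics of step 4, so the descriptor length is 3 × n_rot × 3 × 5.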
The shape or colour information of the local surface is added to RoPS, extending and improving the encoded information to generate an S/C-RoPS descriptor; the block diagram of the algorithm is shown in Fig. 4, and the accuracy of feature matching is thereby optimized.
This patent uses a confidence-based decision-level fusion algorithm to fuse the information of the S-RoPS and C-RoPS descriptors. The idea is that using S-RoPS or C-RoPS alone for object recognition yields a confidence under each single-modality method; the fusion strategy compares the confidences of all candidate model results produced by the two independent methods and selects the candidate model with the highest confidence.
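The decision-level fusion just described reduces to a maximum over pooled candidates; a minimal Python sketch, where the (model name, confidence) pair representation is an assumption for illustration:

```python
def fuse_by_confidence(s_rops_candidates, c_rops_candidates):
    """Decision-level fusion sketch: run S-RoPS and C-RoPS recognition
    independently, pool all candidate models, and keep the one whose
    confidence is highest. Inputs are lists of (model_name, confidence)."""
    all_candidates = s_rops_candidates + c_rops_candidates
    return max(all_candidates, key=lambda mc: mc[1])
```

If S-RoPS reports ("cup", 0.7) while C-RoPS reports ("cup", 0.9) and ("bowl", 0.4), the fused decision is the C-RoPS "cup" hypothesis.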
The feature matching process is as follows:
this patent uses a threshold-based feature matching method. Under threshold-based matching, if the distance between two descriptors is less than the set threshold, the two features are judged to match.
The distance formula used with the threshold characterizes the difference between two object clusters (a cluster being a set of descriptors): the Manhattan distance between the geometric centers of the two sets plus the Manhattan distance between their per-dimension standard deviations, as in formulas 3 to 5:
D(A, B) = L1(C_A, C_B) + L1(std_A, std_B) (3)
std_B is computed analogously to std_A, and n is the size of the feature descriptor.
The L1 distance between two descriptors a and b is:
L1(a, b) = Σ_{i=1}^{n} |a(i) - b(i)| (4)
Step 3: place the robotic arm at a suitable position, establish coordinate system R, and find the pose in the K coordinate system. Through coordinate conversion and coordinate system transformation, the position and attitude in K are converted into coordinates and attitude in coordinate system R (the object coordinate system O is a temporary variable introduced to find the attitude, and apart from its origin it has no practical significance, so the transformation is from K to R rather than from O to R), and the XML semantic map is produced.
The robotic arm is placed at a suitable position and coordinate system R is established as shown in Fig. 5; the origin of coordinate system K has coordinates (d, l, h) in coordinate system R. The object coordinate system O is established with the PCA method; through two coordinate system transformations plus one coordinate transform under the K coordinate system, the corresponding pose information in the R coordinate system is obtained, and the XML semantic map is generated. The detailed flow is shown in Fig. 6.
1) Compute the geometric center of the object point cloud, p̄ = (1/N) Σ_i p_i, where N is the number of points; de-center the point set by subtracting p̄ from every point, and arrange the de-centered coordinates into a 3 × N matrix A.
2) Let M = A·A^T and compute the eigenvalues and eigenvectors of M: λ_i·V_i = M·V_i, i = 1, 2, 3, with the eigenvectors normalized so that ||V_i|| = 1. The long axis of the object corresponds to the eigenvector of the largest eigenvalue. If λ1 ≤ λ2 ≤ λ3, the rotation matrix of the object coordinate system relative to coordinate system K is R = (V_1, V_2, V_3), and the translation is the geometric center p̄ of the object point cloud; the pose of the object coordinate system in coordinate system K is then given by formula 7.
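Steps 1 and 2 can be sketched in Python with NumPy; the right-handedness correction is an added practical detail the patent does not mention:

```python
import numpy as np

def pca_object_frame(points):
    """PCA sketch for steps 1-2: centroid, de-centred 3xN matrix A, M = A A^T,
    eigenvectors ordered so the largest eigenvalue (the long axis) is last."""
    centroid = points.mean(axis=0)             # geometric centre of the cloud
    A = (points - centroid).T                  # 3 x N de-centred matrix
    M = A @ A.T
    lam, V = np.linalg.eigh(M)                 # eigh returns ascending eigenvalues
    R = V                                      # columns V1, V2, V3 with l1 <= l2 <= l3
    if np.linalg.det(R) < 0:                   # keep a right-handed frame
        R[:, 0] *= -1
    return R, centroid                         # rotation (object -> K) and translation
```

For a cloud elongated along the x axis, the last column of the returned rotation (the largest-eigenvalue direction) points essentially along x.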
Let camC = {P_i} denote the point cloud in the sensor coordinate system and modC the point cloud in the model-library object coordinate system. From the major axis and the center point, the minor axis and the secondary-major-axis plane are determined; the directions of the minor axis and the secondary major axis are then determined from the extreme-value distribution of the planar points.
In the matching stage, in order to obtain the transformation matrix from the actual object to the model-library object, the six-degree-of-freedom pose is computed for the two corresponding three-dimensional point sets {modP} and {objP} satisfying the rigid transformation relation modP_i = R·objP_i + t, where R and t are the rotation matrix and translation vector between the two point sets; the least-squares method is used to solve for the optimal solution, i.e. the R̂ and t̂ that minimize E = Σ_i ||modP_i − (R·objP_i + t)||² in formula 8.
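The least-squares minimisation of formula 8 is commonly solved in closed form with an SVD (the Kabsch method); the patent only says "least squares", so the SVD route is an assumption. A sketch:

```python
import numpy as np

def kabsch(P, Q):
    """Closed-form least squares for formula 8: find R, t minimising
    E = sum ||Q_i - (R P_i + t)||^2 for corresponding (N, 3) point sets."""
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cP).T @ (Q - cQ)                  # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cQ - R @ cP
    return R, t
```

Applying a known rotation and translation to a random point set and running `kabsch` recovers them exactly (E = 0 in the noise-free case).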
The transformation matrix from the actual object to the model-library object is then given by formula 9, and the pose matrix of the actual object relative to the sensor coordinate system by formula 10.
The rotation matrix can be converted into a deflection angle α, a pitch angle β and a roll angle γ that describe the attitude, as in formula 11, and the translation matrix can be converted into center coordinates that describe the position, where r_ij denotes the element in row i, column j of the rotation matrix.
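One common way to recover α, β, γ from the r_ij elements is the ZYX decomposition sketched below; the patent does not spell out its Euler convention, so this convention is an assumption:

```python
import numpy as np

def rotation_to_euler(R):
    """ZYX decomposition of a 3x3 rotation matrix into deflection (yaw),
    pitch and roll angles, expressed through the elements r_ij of formula 11."""
    alpha = np.arctan2(R[1, 0], R[0, 0])                      # deflection angle
    beta = np.arctan2(-R[2, 0], np.hypot(R[2, 1], R[2, 2]))   # pitch angle
    gamma = np.arctan2(R[2, 1], R[2, 2])                      # roll angle
    return alpha, beta, gamma
```

A pure rotation about the Z axis by 0.3 rad decomposes to α = 0.3 with zero pitch and roll.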
The relationship between coordinate system R and coordinate system K is shown in Fig. 5, and the transformation matrix between them is given by formula 12, where θ is the tilt angle of the Kinect relative to the horizontal, {x, y, z} are the coordinates of the object in coordinate system R, and {x_k, y_k, z_k} are the coordinates of the object in coordinate system K.
The attitude matrix of the object relative to coordinate system R follows by composing these transformations.
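The chain of transformations (object pose in K composed with the K-to-R transform of formula 12) can be sketched with 4×4 homogeneous matrices. The numeric values, and the assumption that the Kinect tilt θ is a rotation about the X axis, are illustrative only; the patent's exact axis convention for formula 12 is not reproduced here:

```python
import numpy as np

def homogeneous(R, t):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a translation."""
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, t
    return T

# Illustrative chain: object pose seen from K, then mapped into R.
T_K_O = homogeneous(np.eye(3), np.array([0.5, 0.0, 1.2]))   # object frame in K (made up)
theta = np.deg2rad(20.0)                                    # Kinect tilt angle (assumed about X)
R_R_K = np.array([[1.0, 0.0, 0.0],
                  [0.0, np.cos(theta), -np.sin(theta)],
                  [0.0, np.sin(theta), np.cos(theta)]])
T_R_K = homogeneous(R_R_K, np.array([0.3, 0.1, 0.8]))       # origin of K at (d, l, h) in R
T_R_O = T_R_K @ T_K_O                                       # composed pose of the object in R
```

Composing the two homogeneous matrices applies the rotation to the object's position before adding the (d, l, h) offset, which is exactly the two-stage conversion described in step 3.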
Part II, voice man-machine interaction, comprising the following steps:
Step 1: the user issues a voice command, which after processing is converted into text.
After the user's speech is received, text information is finally obtained through preprocessing and speech decoding; the detailed flow is shown in Fig. 7.
Step 2: the text information and the XML semantic map are fed into the intelligent inference engine, which produces execution instructions and outputs text information.
The user drives the real-time three-dimensional map generation module by voice to build the semantic map file of the current scene; the speech recognition and speech synthesis nodes realize the human-computer dialogue by sending and receiving text respectively, while the intelligent inference engine node analyzes and feeds back information in combination with the map file, refines the user's expectation through in-depth dialogue, and finally generates a solution that is sent to the solution parsing and motion planning modules. Speech recognition uses the PocketSphinx open-source speech recognition system, and speech synthesis uses the Ekho open-source speech synthesis system.
It should be understood that those of ordinary skill in the art can make improvements or variations in light of the above description, and all such improvements and variations shall fall within the protection scope of the appended claims of the present invention.
Claims (5)
1. A man-machine interaction method based on Kinect and voice, characterized in that it comprises the following steps:
1) using a Kinect sensor to obtain the accurate spatial position and attitude information of each object in the scene in coordinate system K, completing object detection and recognition; the coordinate system K takes the geometric center of the Kinect as origin, the direction perpendicular to the lenses and pointing outward as the positive Z axis, and the line through the centers of the three Kinect lenses as the X axis;
2) fusing the depth image and the RGB image collected by the Kinect to obtain three-dimensional point cloud data;
3) spatial point cloud object recognition: processing the three-dimensional point cloud data to obtain a semantic description file;
4) applying a coordinate transformation to the object coordinate system O to obtain a three-dimensional scene semantic map description file in coordinate system R;
5) receiving the user's voice input and processing the input signal to obtain text information;
6) feeding the text information and the XML semantic map into an intelligent inference engine, the inference engine producing execution instructions and outputting to the user a reply together with guidance text.
2. The man-machine interaction method based on Kinect and voice according to claim 1, characterized in that the spatial point cloud object recognition of step 3) comprises preprocessing, key-point extraction and descriptor extraction, followed by feature matching against an object feature database, finally obtaining the semantic description file.
3. The man-machine interaction method based on Kinect and voice according to claim 1, characterized in that step 3) comprises:
3.1) preprocessing: filtering out point cloud data that is too far from or too close to the sensor;
3.2) feature point detection on the point cloud data using the ISS algorithm, as follows:
3.2.1) for each point p_i in the input point cloud, query all points p_j within the radius r_frame and compute a weight according to formula 1:
w_ij = 1 / ||p_i - p_j||, ||p_i - p_j|| < r_frame (1)
3.2.2) compute the weighted covariance matrix according to formula 2:
cov(p_i) = Σ_{||p_i-p_j||<r_frame} w_ij (p_j - p_i)(p_j - p_i)^T / Σ_{||p_i-p_j||<r_frame} w_ij (2)
3.2.3) compute the eigenvalues λ_i^1, λ_i^2, λ_i^3 of the covariance matrix and sort them in descending order;
3.2.4) set ratio thresholds γ_21 and γ_32 and retain the set of points satisfying λ_i^2/λ_i^1 < γ_21 and λ_i^3/λ_i^2 < γ_32; these points are the key feature points;
3.3) computation of the feature descriptor of the key feature points, as follows:
first, a unique, unambiguous and stable local reference frame LRF is built by computing the covariance matrix of the points on the local surface in the key point's neighbourhood; taking the key point as origin, the local surface is rotated until the LRF aligns with the Ox, Oy and Oz axes of the object coordinate system O, giving the points rotational invariance;
then the following steps are performed for each of the axes Ox, Oy and Oz, taking each in turn as the current axis:
3.3.1) rotate the local surface about the current axis by a specified angle;
3.3.2) project the points of the rotated local surface onto the XY, XZ and YZ planes;
3.3.3) build a projection distribution matrix; this matrix records only the number of points each sub-domain contains; the number of sub-domains determines the dimension of the matrix and, like the specified angle, is a parameter of the algorithm;
3.3.4) compute the central moments of the distribution matrix, i.e. μ11, μ21, μ12, μ22, and the entropy e;
3.3.5) concatenate the computed values into a sub-feature;
the above steps are executed in a loop, the number of iterations depending on the number of rotations given; finally, the sub-features of the different coordinate axes are concatenated to form the final RoPS descriptor;
3.4) feature matching, as follows:
a threshold-based feature matching method is used: under threshold-based matching, if the distance between two descriptors is less than the set threshold, the two features are judged to match;
the distance formula used with the threshold characterizes the difference between two object clusters: the Manhattan distance between the geometric centers of the two sets plus the Manhattan distance between their per-dimension standard deviations, as in formulas (3) and (5):
D(A, B) = L1(C_A, C_B) + L1(std_A, std_B) (3)
where D(A, B) is the distance between the two object clusters A and B, C_A(i) and C_B(i) are the centers of the i-th dimension of A and B respectively, L1 denotes the Manhattan distance, std_A(i) is the standard deviation of the i-th dimension of cluster A, and std_B(i) is the standard deviation of the i-th dimension of cluster B;
n is the size of the feature descriptor;
the L1 distance between two descriptors a and b is:
L1(a, b) = Σ_{i=1}^{n} |a(i) - b(i)| (4)
a_j(i) is the value of the i-th dimension of the RoPS descriptor of the j-th key point in cluster A;
|A| is the number of key points in cluster A;
|B| is the number of key points in cluster B.
4. The man-machine interaction method based on Kinect and voice according to claim 1, characterized in that step 4) is as follows: the robotic arm is placed at a suitable position and coordinate system R is established; the origin of coordinate system K has coordinates (d, l, h) in coordinate system R; the object coordinate system O is established with the PCA method; through two coordinate transformations, the pose information corresponding to the semantic description file in the R coordinate system is obtained, and the XML semantic map is generated.
5. The man-machine interaction method based on Kinect and voice according to claim 1, characterized in that the speech recognition process of step 5) specifically comprises the following steps:
5.1) preprocessing: collecting user speech through a microphone array, processing the raw input speech signal to filter out irrelevant information and background noise, and performing endpoint detection, framing and pre-emphasis on the speech signal;
5.2) feature extraction: extracting key characteristic parameters reflecting the speech signal to form a feature vector sequence;
5.3) using a hidden Markov model (HMM) for acoustic modelling, and during recognition matching the speech to be recognized against the acoustic model to obtain the recognition result;
5.4) performing grammatical and semantic analysis on a training text database and obtaining an N-gram language model through statistical model training, thereby improving the recognition rate and narrowing the search space;
5.5) for the input speech signal, building a recognition network from the trained HMM acoustic model, the language model and the dictionary, and finding an optimal path through this network with a search algorithm; this path is the word string that outputs the speech signal with maximum probability, thereby determining the words contained in the speech sample.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610306998.7A CN106055244B (en) | 2016-05-10 | 2016-05-10 | Man-machine interaction method based on Kinect and voice |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610306998.7A CN106055244B (en) | 2016-05-10 | 2016-05-10 | Man-machine interaction method based on Kinect and voice |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106055244A true CN106055244A (en) | 2016-10-26 |
CN106055244B CN106055244B (en) | 2020-08-04 |
Family
ID=57176838
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610306998.7A Active CN106055244B (en) | 2016-05-10 | 2016-05-10 | Man-machine interaction method based on Kinect and voice |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106055244B (en) |
- 2016-05-10: application CN201610306998.7A filed in China (CN); granted as CN106055244B, status Active
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103472916A (en) * | 2013-09-06 | 2013-12-25 | 东华大学 | Man-machine interaction method based on human body gesture recognition |
CN104571485A (en) * | 2013-10-28 | 2015-04-29 | 中国科学院声学研究所 | System and method for human and machine voice interaction based on Java Map |
Non-Patent Citations (5)
Title |
---|
KYLINFISH: "Fundamentals of speech recognition and an introduction to CMUSphinx", 《HTTPS://WWW.CNBLOGS.COM/KYLINFISH/ARTICLES/3627188.HTML》 * |
POINT CLOUD LIBRARY: "RoPS (Rotational Projection Statistics) feature", 《HTTP://POINTCLOUDS.ORG/DOCUMENTATION/TUTORIALS/ROPS_FEATURE.PHP》 * |
YU ZHONG: "Intrinsic shape signatures: A shape descriptor for 3D object recognition", 《2009 IEEE 12TH INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS》 * |
WU FAN ET AL.: "A real-time 3D semantic map generation method", 《COMPUTER ENGINEERING AND APPLICATIONS》 * |
XIONG ZHIHENG ET AL.: "Research on a natural-language-based parser for a sorting robot", 《COMPUTER ENGINEERING AND APPLICATIONS》 * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108873707A (en) * | 2017-05-10 | 2018-11-23 | 杭州欧维客信息科技股份有限公司 | Speech-sound intelligent control system |
CN109839622A (en) * | 2017-11-29 | 2019-06-04 | 武汉科技大学 | A kind of parallel computation particle probabilities hypothesis density filtering multi-object tracking method |
CN109839622B (en) * | 2017-11-29 | 2022-08-12 | 武汉科技大学 | Multi-target tracking method for parallel computing particle probability hypothesis density filtering |
CN111666797A (en) * | 2019-03-08 | 2020-09-15 | 深圳市速腾聚创科技有限公司 | Vehicle positioning method and device and computer equipment |
CN111666797B (en) * | 2019-03-08 | 2023-08-08 | 深圳市速腾聚创科技有限公司 | Vehicle positioning method, device and computer equipment |
Also Published As
Publication number | Publication date |
---|---|
CN106055244B (en) | 2020-08-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111432989B (en) | Artificial enhancement cloud-based robot intelligent framework and related methods | |
CN110097553A (en) | The semanteme for building figure and three-dimensional semantic segmentation based on instant positioning builds drawing system | |
CN106056207A (en) | Natural language-based robot deep interacting and reasoning method and device | |
CN105739688A (en) | Man-machine interaction method and device based on emotion system, and man-machine interaction system | |
CN111967272B (en) | Visual dialogue generating system based on semantic alignment | |
CN113378676A (en) | Method for detecting figure interaction in image based on multi-feature fusion | |
CN113361636B (en) | Image classification method, system, medium and electronic device | |
CN109064389B (en) | Deep learning method for generating realistic images by hand-drawn line drawings | |
CN113012122A (en) | Category-level 6D pose and size estimation method and device | |
CN108320051B (en) | Mobile robot dynamic collision avoidance planning method based on GRU network model | |
CN107894836A (en) | Remote sensing image processing and the man-machine interaction method of displaying based on gesture and speech recognition | |
CN110465089B (en) | Map exploration method, map exploration device, map exploration medium and electronic equipment based on image recognition | |
CN106055244A (en) | Man-machine interaction method based on Kincet and voice | |
CN110598595B (en) | Multi-attribute face generation algorithm based on face key points and postures | |
Pramanick et al. | Doro: Disambiguation of referred object for embodied agents | |
CN116561533B (en) | Emotion evolution method and terminal for virtual avatar in educational element universe | |
CN109284692A (en) | Merge the face identification method of EM algorithm and probability two dimension CCA | |
Wang et al. | Combining ElasticFusion with PSPNet for RGB-D based indoor semantic mapping | |
CN116682178A (en) | Multi-person gesture detection method in dense scene | |
KR20210054355A (en) | Vision and language navigation system | |
Li et al. | Few-shot meta-learning on point cloud for semantic segmentation | |
CN110517307A (en) | The solid matching method based on laser specklegram is realized using convolution | |
CN115169448A (en) | Three-dimensional description generation and visual positioning unified method based on deep learning | |
CN103793720A (en) | Method and system for positioning eyes | |
Zhu et al. | Speaker localization based on audio-visual bimodal fusion |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||