CN1302056A - Information processing equipment, information processing method and storage medium - Google Patents

Information processing equipment, information processing method and storage medium

Info

Publication number
CN1302056A
CN1302056A (application CN00137498A)
Authority
CN
China
Prior art keywords
image
behavior
user
result
robot device
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN00137498A
Other languages
Chinese (zh)
Other versions
CN1204543C (en)
Inventor
山下润一
小川浩明
本田等
赫尔穆特·卢克
田丸英司
藤田八重子
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sony Corp
Original Assignee
Sony Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sony Corp filed Critical Sony Corp
Publication of CN1302056A publication Critical patent/CN1302056A/en
Application granted granted Critical
Publication of CN1204543C publication Critical patent/CN1204543C/en
Anticipated expiration legal-status Critical
Expired - Fee Related legal-status Critical Current

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N 3/008 Artificial life, i.e. computing arrangements simulating life based on physical entities controlled by simulated intelligence so as to replicate intelligent life forms, e.g. based on robots replicating pets or humans in their appearance or behaviour
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J 13/00 Controls for manipulators
    • B25J 13/003 Controls for manipulators by means of an audio-responsive input
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J 19/00 Accessories fitted to manipulators, e.g. for monitoring, for viewing; Safety devices combined with or specially adapted for use in connection with manipulators
    • B25J 19/02 Sensing devices
    • B25J 19/021 Optical sensing devices
    • B25J 19/023 Optical sensing devices including video camera means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/20 Movements or behaviour, e.g. gesture recognition
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/26 Speech to text systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Robotics (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Mechanical Engineering (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Acoustics & Sound (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Social Psychology (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Psychiatry (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Manipulator (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

A robot that performs a variety of actions is disclosed. The user's voice picked up by a microphone is recognized by a voice recognizer. A gesture of the user captured by a CCD camera is recognized by an image recognizer. A behavior decision unit decides the behavior to be taken by the robot based on the voice information provided by the voice recognizer and the image information provided by the image recognizer.

Description

Information processing apparatus, information processing method and storage medium
The present invention relates to an information processing apparatus, an information processing method and a storage medium, and particularly to an information processing apparatus and information processing method that decide the behavior of a robot device using voice information and image information, and to a storage medium storing a software program for the information processing method.
A large number of toy robot devices (some of them stuffed) are on the market now. Some of these robot devices output synthesized speech when a touch switch is pressed. Others recognize the speech the user utters and reply by speaking, so the user can enjoy talking with such a robot device.
Many robot devices are also on the market that capture images for image recognition, assess the environment around them, and move in a self-controlled manner.
When the user's voice is not clearly articulated, speech recognition is unreliable. In particular, when the user's utterance contains a demonstrative pronoun that is not clearly defined and allows several interpretations, the robot device cannot identify the object the pronoun indicates.
The above robot devices move in a self-controlled manner according to either voice or images, and have difficulty operating on voice information and image information together.
Accordingly, an object of the present invention is to provide a robot device that performs speech recognition reliably using both voice and image information, thereby providing the robot device with a wide variety of motions.
In one aspect of the present invention, an information processing apparatus for use in a robot device includes: a speech recognizer for recognizing voice; an image recognizer for recognizing an image; and a decision unit that decides the behavior of the robot device according to at least one of the speech recognition result provided by the speech recognizer and the image recognition result provided by the image recognizer.
The information processing apparatus may include a storage unit that stores a table describing the relationship between the speech recognition result provided by the speech recognizer and the image recognition result provided by the image recognizer, together with the behavior of the robot device uniquely determined by the speech recognition result and the image recognition result.
When the speech cannot be uniquely determined by the speech recognizer, the decision unit may decide the behavior of the robot device according to the image recognition result uniquely determined by the image recognizer.
When a plurality of objects appear in the image region recognized by the image recognizer, the decision unit may decide the behavior of the robot device according to the speech recognition result uniquely determined by the speech recognizer.
The image recognizer may recognize the image of the scene appearing in the direction indicated by one of predetermined parts of the user, such as the user's finger, face, eyes or neck.
The information processing apparatus may further include a storage unit for storing gesture data of gestures performed by the user, wherein the image recognizer recognizes the image of the user to detect a gesture matching the gesture data stored in the storage unit, and treats the detected gesture as the image recognition result.
The information processing apparatus may further include a detector for detecting the user's face, and a distance measuring unit for measuring the distance between the user and the robot device according to the size of the user's face detected by the detector, wherein the decision unit uses the measured distance to decide the behavior of the robot device.
The speech recognizer may detect a melody contained in background sound, and treat the detected melody as the speech recognition result.
The speech recognizer may detect an acoustic phenomenon from background sound, and treat the detected acoustic phenomenon as the speech recognition result.
In another aspect of the present invention, an information processing method for an information processing apparatus used in a robot device includes: a speech recognition step of recognizing voice; an image recognition step of recognizing an image; and a deciding step of deciding the behavior of the robot device according to at least one of the speech recognition result provided in the speech recognition step and the image recognition result provided in the image recognition step.
In another aspect of the present invention, a software program for an information processing apparatus used in a robot device includes program code for performing the following steps: a speech recognition step of recognizing voice; an image recognition step of recognizing an image; and a deciding step of deciding the behavior of the robot device according to at least one of the speech recognition result provided in the speech recognition step and the image recognition result provided in the image recognition step.
In yet another aspect of the present invention, a storage medium stores the software program of the information processing apparatus used in the robot device, the program comprising program code for performing the following steps: a speech recognition step of recognizing voice; an image recognition step of recognizing an image; and a deciding step of deciding the behavior of the robot device according to at least one of the speech recognition result provided in the speech recognition step and the image recognition result provided in the image recognition step.
Fig. 1 is an external view of an embodiment of a robot device of the present invention;
Fig. 2 is a block diagram of the internal structure of the robot device shown in Fig. 1;
Fig. 3 is a functional block diagram of the controller of Fig. 2;
Fig. 4 is a functional block diagram of the part of the robot device that performs voice and image recognition;
Fig. 5 is a block diagram of the internal structure of the speech recognizer;
Fig. 6 is a block diagram of the internal structure of the image recognizer;
Fig. 7 is a block diagram of the internal structure of the behavior decision unit;
Fig. 8 shows a behavior table stored in the behavior table storage unit;
Fig. 9 shows a behavior classification table stored in the behavior classification table storage unit;
Figure 10 is a flowchart of the speech recognition process;
Figure 11 is a flowchart of the image recognition process;
Figure 12 is a flowchart of the behavior decision process;
Figure 13 is a flowchart of outputting a recognition result using voice information and image information;
Figure 14 is another flowchart of outputting a recognition result using voice information and image information;
Figure 15 is yet another flowchart of outputting a recognition result using voice information and image information;
Figure 16 shows the geometry between the user and the robot device;
Figure 17 shows another structure of the speech recognizer;
Figure 18 shows another behavior table stored in the behavior table storage unit;
Figure 19 shows yet another behavior table stored in the behavior table storage unit;
Figure 20 shows a storage medium.
Fig. 1 is an external view of an embodiment of a robot device 1 of the present invention, and Fig. 2 shows the electrical structure of the robot device 1.
The robot device 1 of this embodiment simulates a dog. Leg units 3A, 3B, 3C and 3D are connected to the front left side, front right side, rear left side and rear right side of a trunk unit 2, respectively. A head unit 4 and a tail unit 5 are connected to the front and the rear of the trunk unit 2, respectively.
The tail unit 5 extends from a base portion 5B of the trunk unit 2 with two degrees of freedom, so that the tail unit 5 can bend or rotate. Housed in the trunk unit 2 are: a controller 10 for controlling the entire robot device 1; a battery 11 serving as the power supply of the robot device 1; and internal sensors 14, such as a battery sensor 12 and a thermal sensor 13.
The head unit 4 includes: a microphone 15 corresponding to the "ears" of the dog; a CCD (charge-coupled device) video camera 16 corresponding to the "eyes" of the dog; a touch sensor 17 corresponding to the dog's sense of touch; and a loudspeaker 18 corresponding to the "mouth" of the dog.
Actuators 3AA1 to 3AAK, 3BA1 to 3BAK, 3CA1 to 3CAK and 3DA1 to 3DAK are arranged in the leg units 3A, 3B, 3C and 3D, respectively, and at the joints between the leg units 3A, 3B, 3C, 3D and the trunk unit 2. Actuators 4A1 to 4AL are arranged at the joint between the head unit 4 and the trunk unit 2, and actuators 5A1 and 5A2 at the joint between the tail unit 5 and the trunk unit 2. These joints allow each connected unit to rotate with a predetermined degree of freedom.
The microphone 15 in the head unit 4 picks up ambient sound including the user's voice, and outputs the resulting voice signal to the controller 10. The CCD video camera 16 picks up images of the surroundings of the robot device 1, and sends the resulting image signal to the controller 10.
The touch sensor 17 arranged on top of the head unit 4 detects the pressure of a physical action applied to it, such as "being stroked" or "being beaten", and sends the detection result to the controller 10 as a pressure signal.
The battery sensor 12 in the trunk unit 2 detects the power remaining in the battery 11, and outputs the detected power level to the controller 10 as a remaining power indicator signal. The thermal sensor 13 detects heat accumulated in the robot device 1, and sends the detection result to the controller 10 as a heat level signal.
The controller 10 comprises a CPU (central processing unit) 10A and a memory 10B. The CPU 10A performs various processes by executing a control program stored in the memory 10B. Specifically, based on the voice signal, image signal, pressure signal, remaining battery power indicator signal and heat level signal provided by the microphone 15, CCD video camera 16, touch sensor 17, battery sensor 12 and thermal sensor 13, respectively, the controller 10 determines the situation around the robot device 1, the presence or absence of a command from the user, and the presence or absence of a user action.
According to the determination result, the controller 10 decides what action or behavior to take. In response to the determination result, the actuators 3AA1 to 3AAK, 3BA1 to 3BAK, 3CA1 to 3CAK, 3DA1 to 3DAK, 4A1 to 4AL, 5A1 and 5A2 are driven as required. The head unit 4 can rotate vertically or horizontally, the tail unit 5 can swing, and the leg units 3A to 3D can be driven, so that the robot device 1 performs actions such as walking on four legs.
The controller 10 synthesizes voice as required, and outputs the synthesized sound through the loudspeaker 18. An LED (light-emitting diode, not shown) arranged at the eye position of the robot device 1 can be turned on, turned off, or made to blink.
In this way, the robot device 1 acts in a self-controlled manner in response to its surroundings.
Fig. 3 is a functional block diagram of the controller 10 shown in Fig. 2. When the CPU 10A executes the control program stored in the memory 10B, the robot device 1 operates according to the functional block diagram shown in Fig. 3.
The controller 10 includes: a sensor signal processor 31 for recognizing the specific situation around the robot device 1; an emotion/instinct model unit 32 for expressing the emotional and instinct states of the robot device 1; a behavior decision unit 33 that decides the action to take according to the recognition results provided by the sensor signal processor 31; a posture converting unit 34 that drives the robot device 1 to act according to the decision result provided by the behavior decision unit 33; a drive controller 35 for driving and controlling the actuators 3AA1 to 5A2; a voice synthesizer 36 for synthesizing speech sound; and a sound processor 37 for controlling the output of the voice synthesizer 36.
The sensor signal processor 31 recognizes the specific situation around the robot device 1, specific actions taken by the user, and instructions given by the user, based on the voice signal, image signal and pressure signal provided by the microphone 15, CCD video camera 16 and touch sensor 17, respectively. The sensor signal processor 31 outputs identification information indicating the recognition results to the emotion/instinct model unit 32 and the behavior decision unit 33.
Specifically, the sensor signal processor 31 includes a speech recognizer 31A. Under the control of the behavior decision unit 33, the speech recognizer 31A performs speech recognition on the voice signal from the microphone 15. The speech recognizer 31A notifies the emotion/instinct model unit 32 and the behavior decision unit 33 of commands obtained as speech recognition results, such as "walk", "lie down" and "follow the ball".
The sensor signal processor 31 also includes an image recognizer 31B, which performs image recognition on the image signal from the CCD video camera 16. For example, when the image recognizer 31B detects "a red and round object" or "a plane extending vertically from the ground and higher than a predetermined height", the image recognizer 31B notifies the emotion/instinct model unit 32 and the behavior decision unit 33 of an image recognition result that may indicate "there is a ball" or "there is a wall". The sensor signal processor 31 also recognizes the user's gestures, and notifies the behavior decision unit 33 of the corresponding recognition results.
The sensor signal processor 31 also includes a pressure processor 31C, which processes the pressure signal from the touch sensor 17. When the touch sensor 17 detects pressure above a predetermined threshold level applied for a short duration, the pressure processor 31C recognizes that the robot device 1 is being "beaten (or punished)". When the touch sensor 17 detects pressure below the predetermined threshold level applied for a long duration, the pressure processor 31C recognizes "being stroked (or praised)". The pressure processor 31C then presents the recognition results to the emotion/instinct model unit 32 and the behavior decision unit 33.
The emotion/instinct model unit 32 manages the emotion model and instinct model expressing the states of the robot device 1. The behavior decision unit 33 decides the next behavior to take according to the recognition results of the sensor signal processor 31, the emotion/instinct state information of the emotion/instinct model unit 32, and the elapsed time. The behavior decision unit 33 then feeds the behavior information to the posture converting unit 34 as behavior command information.
In response to the behavior command information from the behavior decision unit 33, the posture converting unit 34 produces posture transition information for making the robot device 1 change from the current posture to the next posture, and feeds it to the drive controller 35. In response to the posture transition information from the posture converting unit 34, the drive controller 35 produces control signals for driving the actuators 3AA1 to 5A2, and outputs the control signals to the respective actuators. The actuators 3AA1 to 5A2 are driven according to the respective control signals, so that the robot device 1 works in a self-controlled manner.
The robot device 1 recognizes the user's voice and gestures, and decides its behavior accordingly. Fig. 4 shows the part of the system shown in Fig. 3 that decides the behavior of the robot device after recognizing the user's speech and gestures. Referring to Fig. 4, there are shown the microphone 15 and the speech recognizer 31A for recognizing the user's speech; the CCD video camera 16 and the image recognizer 31B for recognizing the user's gestures; and the behavior decision unit 33. Based on the recognition results provided by the speech recognizer 31A and the image recognizer 31B, the behavior decision unit 33 decides the next action the robot device 1 takes.
Fig. 5 shows the speech recognizer 31A in detail. The user's voice is input to the microphone 15, where it is converted into an electrical voice signal. The electrical voice signal is fed to an analog-to-digital (AD) converter 51 in the speech recognizer 31A. The AD converter 51 samples and quantizes the analog electrical voice signal, thereby converting it into a digital voice signal. The digital voice signal is fed to a feature extractor 52.
The feature extractor 52 extracts characteristic parameters, such as spectrum, linear prediction coefficients, cepstrum coefficients and line spectral pairs, from the speech data from the AD converter 51 every appropriate number of frames. The feature extractor 52 then feeds the characteristic parameters to a characteristic parameter buffer 53 and a matching unit 54. The characteristic parameter buffer 53 temporarily stores the characteristic parameters from the feature extractor 52.
Based on the characteristic parameters from the feature extractor 52 or from the characteristic parameter buffer 53, the matching unit 54 recognizes the voice input to the microphone 15 with reference to an acoustic model database 55, a dictionary database 56 and a grammar database 57.
The acoustic model database 55 stores acoustic models representing acoustic characteristics, such as phonemes and syllables, of the language of the speech to be recognized. An HMM (hidden Markov model) may be adopted as an acoustic model. The dictionary database 56 stores a word dictionary containing pronunciation information for each word to be recognized. The grammar database 57 stores a grammar describing how the words registered in the dictionary database 56 can be linked. The grammar may be a context-free grammar (CFG) or rules based on word-chain probabilities (N-grams).
The matching unit 54 produces word models by connecting the acoustic models stored in the acoustic model database 55 with reference to the word dictionary in the dictionary database 56. The matching unit 54 also connects several word models with reference to the grammar stored in the grammar database 57, and processes the connected word models based on the characteristic parameters by the continuous HMM method, thereby recognizing the voice input to the microphone 15. The speech recognition result of the matching unit 54 is thus output as text.
When the matching unit 54 needs to process the input voice again, it uses the characteristic parameters stored in the characteristic parameter buffer 53. With this method, there is no need to ask the user to speak again.
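As a rough sketch of this recognizer data flow, the following Python fragment mirrors the pipeline from the AD converter through the parameter buffer to the matching unit; all class and method names are hypothetical, and the distance-based scoring is only a stand-in for real continuous-HMM evaluation of connected word models.

```python
# Minimal sketch of the Fig. 5 data flow (hypothetical names; the
# scoring stub stands in for continuous-HMM evaluation of word models).
from collections import deque

class SpeechRecognizerSketch:
    def __init__(self, word_models):
        self.word_models = word_models          # word -> reference value stand-in
        self.param_buffer = deque(maxlen=100)   # characteristic parameter buffer 53

    def extract_features(self, frames):
        # Stand-in for feature extractor 52 (spectrum, cepstrum, etc.).
        return [sum(frame) / len(frame) for frame in frames]

    def match(self, params):
        # Stand-in for matching unit 54: nearest reference value wins.
        avg = sum(params) / len(params)
        return min(self.word_models, key=lambda w: abs(self.word_models[w] - avg))

    def recognize(self, frames):
        params = self.extract_features(frames)
        self.param_buffer.extend(params)        # retained for re-processing
        return self.match(params)

    def reprocess(self):
        # Re-run matching on the buffered parameters, so the user need
        # not be asked to speak again.
        return self.match(list(self.param_buffer))

rec = SpeechRecognizerSketch({"walk": 0.2, "lie down": 0.8})
print(rec.recognize([[0.1, 0.3], [0.2, 0.2]]))   # -> 'walk'
print(rec.reprocess())                           # same result from the buffer
```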
Fig. 6 shows the internal structure of the image recognizer 31B. The image picked up by the CCD video camera 16 is input to an AD converter 61 in the image recognizer 31B. The image data is converted into digital image data by the AD converter 61, and then output to a feature extractor 62. The feature extractor 62 extracts features from the input image data, such as the edges and density variations of objects in the image, thereby determining feature quantities, such as characteristic parameters or feature vectors.
The feature quantities extracted by the feature extractor 62 are output to a face detector 63. The face detector 63 detects the user's face from the input feature quantities, and outputs the detection result to a distance measuring unit 64. The distance measuring unit 64 uses the output of the face detector 63 to measure the distance to the user in the direction in which the user's face is seen. The measurement result is output to the behavior decision unit 33.
The distance to the user can be measured from the variation in the size of the face. For example, the distance measurement can use the method described by Henry A. Rowley, Shumeet Baluja and Takeo Kanade in the paper "Neural Network-Based Face Detection", IEEE Transactions on Pattern Analysis and Machine Intelligence.
In this embodiment, the face size is measured using a single line of image signal. Alternatively, two image signals on two signal lines (a stereo image) can be compared and matched to measure the distance to the user. For example, a method of extracting three-dimensional information from a stereo image is disclosed in "Section 3.3.1 Point Pattern Matching" of the Image Analysis Handbook, edited by Takagi and Shimoda, published by University of Tokyo Press.
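A minimal sketch of distance estimation from the apparent face size, assuming a simple pinhole camera model; the reference face width and focal length below are illustrative values, not figures from the patent.

```python
REFERENCE_FACE_WIDTH_CM = 15.0   # assumed typical face width
FOCAL_LENGTH_PX = 500.0          # assumed camera focal length in pixels

def distance_from_face(face_width_px: float) -> float:
    """Distance (cm) to the user, from the apparent width of the face."""
    return REFERENCE_FACE_WIDTH_CM * FOCAL_LENGTH_PX / face_width_px

print(distance_from_face(93.75))  # ~80 cm when the face spans ~94 pixels
```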
The feature quantities extracted by the feature extractor 62 are also output to a matching unit 65. The matching unit 65 compares the input feature quantities with the pattern information stored in a standard pattern database 66, and feeds the comparison result to the behavior decision unit 33. The data stored in the standard pattern database 66 include gesture image data and data indicating the features of action patterns. For gesture recognition, refer to the paper by Seiji INOKUCHI entitled "Gesture Recognition for Kansei Expression", Journal of the Robotics Society of Japan, Vol. 17, No. 7, pp. 933-936, 1999.
The recognition result provided by the speech recognizer 31A and the recognition result (measurement result) provided by the image recognizer 31B are input to the behavior decision unit 33. Fig. 7 shows the internal structure of the behavior decision unit 33. The speech recognition result provided by the speech recognizer 31A is input to a text parser 71 in the behavior decision unit 33. The text parser 71 performs morphological analysis and syntactic analysis on the input speech recognition result based on the data stored in a dictionary database 72 and a grammar database 73. The text parser 71 extracts the meaning and intention of the input speech based on the contents of the dictionary in the dictionary database 72.
Specifically, the dictionary database 72 stores the part-of-speech information needed to apply words and grammar, together with the meaning information of each word. The grammar database 73 stores constraints on how words can be linked, based on the information on each word stored in the dictionary database 72. Using these pieces of data, the text parser 71 analyzes the input speech recognition result.
The grammar database 73 stores the data required for text parsing, such as regular grammars, context-free grammars, statistics on word chains, and grammars including semantics used for semantic parsing, such as HPSG (Head-driven Phrase Structure Grammar).
The analysis result provided by the text parser 71 is output to a keyword extractor 74. In response to the input analysis result, the keyword extractor 74 refers to the data stored in a keyword database 75 and extracts the intention of the user who uttered the voice. The extraction result is fed to a behavior table reference unit 76. The keyword database 75 stores data indicating the user's intentions, such as shouts and commands, which are used as keywords in keyword retrieval. More specifically, expressions used as voice information indices in the subsequent behavior table reference unit 76, and the words corresponding to those expressions, are stored as keyword data.
The behavior table reference unit 76 decides the behavior of the robot device 1 according to the extraction result provided by the keyword extractor 74 and the recognition result provided by the image recognizer 31B, with reference to the tables stored in a behavior table storage unit 77 and a behavior classification table storage unit 78, respectively. The table stored in the behavior table storage unit 77 is discussed now. Fig. 8 shows the behavior table stored in the behavior table storage unit 77.
Here, the image recognition results are classified into "calling", "pointing with a finger", "shaking hands", "waving" and "nothing". Depending on the image recognition result, supplemental information may or may not be needed. In addition, the speech recognition result is also referred to.
For example, when the image recognition result is found to be "calling", information on where the user is and how far away the user is, in other words the measurement result, is needed. When the user calls, if the speech recognition result represents the command "come", the action "approach the user" is decided. When the speech recognition result represents the command "leave", the action "leave" is decided. Even when the user says "come", the action of approaching the user is not always decided, as will be described later.
The behavior table thus describes a single behavior for each state of the user as seen from the robot device, determined by three pieces of information: the user's gesture (image recognition result), the user's voice (speech recognition result), and the distance to the user (measurement result).
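A minimal sketch of such a lookup, keyed on the three pieces of information the table uses — gesture, voice keyword and measured distance; the entries are illustrative, not the actual contents of Fig. 8.

```python
from typing import Optional

# Illustrative behavior table: (gesture, keyword) -> behavior.
BEHAVIOR_TABLE = {
    ("calling", "come"):   "approach the user",
    ("calling", "leave"):  "leave",
    ("pointing", "fetch"): "walk toward the indicated object",
}

def decide_behavior(gesture: str, keyword: Optional[str],
                    distance_cm: Optional[float]) -> str:
    behavior = BEHAVIOR_TABLE.get((gesture, keyword), "do nothing")
    if behavior == "approach the user" and distance_cm is not None:
        # Supplemental information: stop about 80 cm in front of the user.
        behavior += " (walk %.0f cm)" % max(distance_cm - 80.0, 0.0)
    return behavior

print(decide_behavior("calling", "come", 250.0))  # approach the user (walk 170 cm)
```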
Fig. 9 shows the behavior classification table stored in the behavior classification table storage unit 78. The behavior classification table lists the classes of the behaviors listed in the behavior table shown in Fig. 8. The behaviors in the table are divided into the four classes listed in Fig. 9: "behavior relative to the robot device's position", "behavior relative to the user's position", "absolute behavior" and "other behavior".
Behavior relative to the robot device's position includes behaviors determined by distance and bearing from the robot device's current position. For example, when the user, facing the robot device 1 so that the user's right side is aligned with the robot device's left-hand side, says "go right", the robot device 1 moves to its left from its current position.
Behavior relative to the user's position includes behaviors determined by distance and bearing from the user's current position. For example, when the user says "come", the robot device 1 determines how far to move to come within 80 cm of the user, and actually moves according to the determination result.
Absolute behavior includes behaviors determined regardless of the current positions of the robot device 1 and the user. For example, when the user says "go west", the robot device 1 simply moves westward, because west is determined regardless of the robot device's own current position and the user's current position.
Other behavior covers behaviors that need neither bearing information nor distance information, and includes, for example, the sounds the robot device 1 produces.
The decision of the behavior of the robot device 1 is discussed now. The behavior of the robot device 1 is decided by the user's voice and actions. The recognition of the user's speech is discussed with reference to the flowchart shown in Figure 10. In step S1, the user's speech picked up by the microphone 15 is processed in the speech recognition process of the speech recognizer 31A.
In step S2, the speech recognition result provided by the speech recognizer 31A is input to the text parser 71 in the behavior decision unit 33 for text analysis. In step S3, the keyword extractor 74 performs keyword matching using the analysis result. In step S4, it is determined whether a keyword has been extracted. When it is determined in step S4 that a keyword has been extracted, the process proceeds to step S5.
In step S5, the extracted keyword is taken as language information. When it is determined in step S4 that no keyword has been extracted, the process proceeds to step S6, and the absence of a keyword is taken as language information. After step S5 or step S6, the language information is output to the behavior table reference unit 76 in step S7. This process is repeated while the robot device 1 is in operation.
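The following sketch condenses steps S1 to S7: recognized text is parsed, matched against the keyword store, and the outcome is emitted as language information. The keyword set and function names are hypothetical.

```python
# Hypothetical keyword database 75 contents.
KEYWORDS = {"come", "leave", "fetch"}

def language_information(recognized_text: str) -> dict:
    words = recognized_text.lower().split()   # stand-in for text parsing (S2)
    for word in words:                        # keyword matching (S3, S4)
        if word in KEYWORDS:
            return {"keyword": word}          # S5: keyword taken as language info
    return {"keyword": None}                  # S6: absence of keyword

print(language_information("please come here"))  # -> {'keyword': 'come'}
```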
While the above speech recognition process is performed, the image of the user is also processed. The image processing of the robot device 1 is discussed with reference to the flowchart shown in Figure 11. In step S11, the feature extractor 62 in the image recognizer 31B extracts feature quantities from the image picked up by the CCD video camera 16. In step S12, it is determined whether the recognition result corresponds to a registered gesture. Specifically, using the feature quantities output from the feature extractor 62, the matching unit 65 determines whether the recognition result matches any of the gesture pattern information stored in the standard pattern database 66. When the gesture is found to match a piece of gesture pattern information, the process proceeds to step S13.
In step S13, it is determined whether the gesture matching the registered gesture has supplemental information. For example, a gesture with supplemental information can be the user pointing in a direction with his finger; in that case, information on the object appearing in the direction the user's finger indicates is the supplemental information. When it is determined in step S13 that the gesture has supplemental information, the supplemental information is detected in step S14. When the detection of the supplemental information is completed in step S14, the process proceeds to step S15.
When it is determined in step S12 that no registered gesture is found, or when it is determined in step S13 that the gesture has no associated supplemental information, the process proceeds to step S15. In step S15, the behavior information is output to the behavior table reference unit 76.
When the process proceeds from step S12 to step S15, the behavior information is information indicating the absence of a gesture; in other words, the image recognition result carries no information that determines a behavior for the robot device 1 to take. When the process proceeds from step S13 to step S15, the behavior information contains only the information about the gesture. When the process goes from step S14 to step S15, the behavior information contains the information about the gesture and the supplemental information.
This image recognition process is repeated while the robot device 1 is in operation. The supplemental information in step S13 may include, as required, the measurement results provided by the face detector 63 and the distance measuring unit 64.
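Steps S11 to S15 can be pictured with this sketch, which matches a detected gesture against registered patterns and attaches supplemental information when the gesture calls for it; the pattern entries are illustrative.

```python
# Illustrative standard pattern database 66: gesture -> properties.
GESTURE_PATTERNS = {
    "pointing": {"needs_supplement": True},
    "waving":   {"needs_supplement": False},
}

def behavior_information(detected_gesture, pointed_scene=None) -> dict:
    pattern = GESTURE_PATTERNS.get(detected_gesture)
    if pattern is None:
        return {"gesture": None}              # S12: no registered gesture
    info = {"gesture": detected_gesture}
    if pattern["needs_supplement"]:           # S13
        info["supplement"] = pointed_scene    # S14: object in the pointed direction
    return info                               # S15: output to reference unit 76

print(behavior_information("pointing", pointed_scene="ball"))
```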
The behavior table reference unit 76 in the behavior decision unit 33 decides the behavior of the robot device 1 using the language information of the speech recognition result and the behavior information of the image recognition result. The operation of the behavior table reference unit 76 is discussed with reference to Figure 12. In step S21, the behavior table reference unit 76 receives the language information from the keyword extractor 74 and the behavior information from the image recognizer 31B. In step S22, according to the input language information and behavior information, the behavior table reference unit 76 uniquely decides the behavior of the robot device 1 with reference to the behavior table stored in the behavior table storage unit 77 and the behavior classification table stored in the behavior classification table storage unit 78.
The decision operation of the behavior table reference unit 76 is discussed now. The operation is determined based on the table shown in Fig. 8. For example, when the image recognition result (behavior information) is "calling" and the speech recognition result (language information) represents "come", three behaviors are set: approaching the user, leaving the user, and ignoring the user. When the user "calls" and tells the robot device 1 to "come", the robot device 1 usually selects the action of approaching the user. However, if the robot device 1 always responds in the same way, the user may grow tired of its responses.
The robot device 1 can be designed to give different responses even when the user makes the same gesture and says the same words. Which of the three behaviors to take can be determined in order, at random, by probability values, by keywords, or according to the emotion at that moment.
When the behavior decision is performed with probability values, the behavior of approaching the user can have a probability of 50%, the behavior of leaving the user a probability of 30%, and the behavior of ignoring the user a probability of 20%.
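A one-function sketch of the probabilistic choice, using the example weights given above:

```python
import random

# The three responses to "calling" + "come", weighted as in the text.
RESPONSES = ["approach the user", "leave the user", "ignore the user"]
WEIGHTS = [0.5, 0.3, 0.2]

def choose_response() -> str:
    return random.choices(RESPONSES, weights=WEIGHTS, k=1)[0]

print(choose_response())  # varies run to run, mostly "approach the user"
```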
When the behavior decision is performed according to keywords, combinations of the current action, the current words, the previous action and the previous words can be adopted. For example, when the user claps his hands in the previous action and then calls and says the command "come" in the current action, the robot device 1 can be designed to always select the behavior of approaching the user. When the user beats the robot device 1 in the previous action and then calls and says the command "come" in the current action, the robot device 1 can be designed to select the behavior of leaving the user.
In this way, combinations of the current action, the current words, the previous action and the previous words can be adopted in deciding the behavior.
When the behavior decision is performed according to the emotion of the robot device 1, the robot device 1 refers to the information in the emotion/instinct model unit 32. For example, when the user calls and speaks to a robot device 1 that is fond of the user, the robot device can approach the user. When the user calls and speaks to an angry robot device 1, the robot device 1 can ignore the user.
In this way, the behavior table reference unit 76 decides the behavior with reference to the behavior table, according to the language information and the behavior information. In step S23, the behavior table reference unit 76 notifies the posture converting unit 34 of the decided behavior (see Figure 12). The robot device 1 then performs the scheduled operation through the subsequent process.
In the above embodiment, the direction the user's finger indicates is detected, and the object appearing in that direction is detected as supplemental information. Alternatively, the supplemental information may be detected by detecting the direction the user's face is turned, the direction the user's eyes are looking, or the direction the user's neck indicates.
Besides the above gestures, the standard pattern database 66 can store various other gestures for conveying intentions and emotions, such as nodding the head up and down to express "yes", shaking the head sideways to express "no", the sign of victory or peace, praying, hailing, and various other gestures.
When the robot device 1 recognizes the user's voice, the sound itself may be unclear (dull pronunciation), causing misrecognition. For example, the user may say "please get a book" with unclear voice, and the sensor signal processor 31 may recognize that sentence with the wrong speech "please get a hook". Figure 13 is a flowchart of a process that avoids such misrecognition with the help of image data.
When the user speaks, his voice reaches the robot device 1 through the microphone 15, and is input to the speech recognizer 31A in step S31. In step S32, the speech recognizer 31A recognizes the input voice, thereby producing a plurality of candidates for the words the user may have said. In step S33, a processing step is carried out for the most probable first candidate and the second candidate.
In step S33, it is determined whether the score difference between the first candidate and the second candidate falls within a predetermined threshold. When the score difference is determined to be outside the predetermined threshold, in other words, when the first candidate is far ahead of the second candidate and there is no problem in taking it as the result, the process proceeds to step S37. The first candidate is then confirmed as the correct result.
When the score difference between the first candidate and the second candidate is determined in step S33 to fall within the threshold, in other words, when the first candidate may be an erroneous result, the process proceeds to step S34, where the plurality of candidates with close scores are handled. In step S35, image recognition is performed on the image picked up when the user speaks, the image picked up before the user speaks, or the image picked up after the user speaks.
In step S36, the image recognition result obtained in step S35 is used to supplement the speech recognition result.
As described above, when the user says "please get a book", the first candidate is "please get a book", and the second candidate is "please get a hook". If the score difference between the first candidate and the second candidate falls within the predetermined threshold, it is difficult to determine which is correct. When the image recognition result shows a book in the image, the first candidate "please get a book" is determined to be correct. When the image recognition result shows a hook in the image, the second candidate "please get a hook" is determined to be correct.
In this way, the speech recognition result is supplemented, and confirmed as the correct result in step S37. When the speech recognition result is uncertain, the image recognition result is used to help determine it.
In the above discussion, only the first candidate and the second candidate are compared. Alternatively, the first to tenth candidates can be compared to determine the differences between them.
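A minimal sketch of the candidate check, assuming scores on a common scale; the threshold value and the substring test for the recognized object are illustrative simplifications.

```python
SCORE_THRESHOLD = 0.05  # assumed threshold on the score difference

def confirm(candidates, image_label: str) -> str:
    # candidates: list of (text, score) pairs, best first.
    (first, s1), (second, s2) = candidates[0], candidates[1]
    if s1 - s2 > SCORE_THRESHOLD:
        return first                          # unambiguous: confirm first (S37)
    # Ambiguous: prefer the candidate naming the object seen in the image.
    for text, _ in (candidates[0], candidates[1]):
        if image_label in text:
            return text
    return first

print(confirm([("please get a book", 0.81), ("please get a hook", 0.79)], "book"))
```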
For example, suppose user A and user B are talking. User A says "look at this". User B says "what is that". Such exchanges occur often in daily life. User A indicates an object using "this", and user B indicates the same object using "that". Thus, demonstrative pronouns change with the circumstances.
The same thing can happen when the user talks with the robot device 1. The robot device 1 therefore needs to recognize what the user indicates with a demonstrative pronoun. Figure 14 is a flowchart of the process by which the robot device 1 determines the object of a demonstrative pronoun. In step S41, the user speaks, and in step S42, speech recognition is carried out to recognize the user's voice.
In step S43, it is determined from the speech recognition result whether the user's speech contains a demonstrative pronoun. When it is determined that no demonstrative pronoun is contained, the speech recognition result is confirmed as the correct result in step S46.
When it is determined in step S43 that the user's speech contains a demonstrative pronoun, the process proceeds to step S44, and image recognition is performed. Image recognition is carried out on the image picked up when the user speaks, or on the image picked up in the direction the user's finger indicates.
In step S44, image recognition is performed on the image, and in step S45, the image recognition result is used to determine the object of the demonstrative pronoun. For example, suppose the user says "get that" to the robot device 1, and then indicates the object corresponding to "that" by a gesture, for example by pointing at the object with his finger.
In step S42, in response to the user's speech, the robot device 1 performs speech recognition, and determines that the speech contains the demonstrative pronoun "that". From the image picked up at the moment the user speaks, the robot device 1 also determines that the user has made the gesture of pointing his finger in a direction.
In step S44, the robot device 1 determines the direction the user indicates with the demonstrative pronoun "that", picks up the image in that direction, and performs image recognition on the picked-up image. For example, when the image recognition result shows that the object is "paper", the object indicated by the demonstrative pronoun "that" is found to be "paper". When the object of the demonstrative pronoun is determined in step S45 in this way, the process proceeds to step S46. The speech recognition result is then confirmed as the correct result in step S46.
In this way, the object of a demonstrative pronoun is reliably recognized using image information.
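A small sketch of this resolution step: when the utterance contains a demonstrative pronoun, the object recognized in the pointed-at image is substituted for it. The pronoun list and labels are illustrative.

```python
DEMONSTRATIVES = {"this", "that", "it"}   # illustrative pronoun list

def resolve_utterance(utterance: str, pointed_object: str) -> str:
    words = utterance.split()
    if not DEMONSTRATIVES & {w.lower() for w in words}:
        return utterance                  # S43 -> S46: nothing to resolve
    # S44-S45: replace the pronoun with the image recognition result.
    return " ".join(pointed_object if w.lower() in DEMONSTRATIVES else w
                    for w in words)

print(resolve_utterance("get that", "paper"))  # -> "get paper"
```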
When the robot device 1 picks up an image, a plurality of objects may appear in that image. Figure 15 is a flowchart of the process of determining which of a plurality of objects is the object the user indicates in his speech. In step S51, the user makes a gesture, which is recognized by the robot device 1 through the CCD video camera 16.
When the gesture indicates a specific direction, the robot device 1 needs to recognize the image of the scene in the direction the user indicates, to obtain the supplemental information. So in step S52, the image of the scene in the direction the user indicates is picked up, and the image recognizer 31B performs image recognition processing on the image. In step S53, the image recognition result is used to determine whether a plurality of objects are contained in the image. When it is determined in step S53 that there are not a plurality of objects, that is, there is one object, the process proceeds to step S56, and the image recognition result for that object is output.
When it is determined in step S53 that a plurality of objects are contained in the image, the process proceeds to step S54, and speech recognition is performed on the voice picked up when the user makes the gesture. In step S55, the speech recognition result (voice information) of step S54 is used to supplement the image recognition result. This process is discussed more specifically below.
For example, while making a gesture pointing in a predetermined direction, the user says "get a ball". The robot device 1 responds to the user's gesture, and recognizes that the user is pointing in a specific direction. The robot device 1 picks up the image in the direction the user indicates, and performs image recognition on the image. When the robot device 1 determines that a plurality of objects appear in the image, the robot device 1 performs speech recognition on what the user said while making the gesture.
When the speech recognition result shows the request "get a ball", it is determined that the ball is the object the user wants among the plurality of objects in the image. The image recognition result is thus supplemented by the voice information. When the image recognition result has been supplemented by the voice information, the process proceeds to step S56, and the supplemented image recognition result is output.
In this way, correct image information can be obtained by compensating for the unclear part of the image information by means of the voice information.
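Steps S51 to S56 reduce to the sketch below: a single object is output directly, while several objects are disambiguated by the spoken request. Object labels and the substring test are illustrative.

```python
from typing import Optional

def pick_object(objects_in_image, utterance: str) -> Optional[str]:
    if len(objects_in_image) == 1:
        return objects_in_image[0]        # S53 -> S56: one object, output it
    for obj in objects_in_image:          # S54-S55: supplement by voice
        if obj in utterance:
            return obj
    return None                           # still ambiguous

print(pick_object(["ball", "cup", "book"], "get a ball"))  # -> "ball"
```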
A robot device acting only on voice information merely moves in the direction the user's speech comes from, and a robot device acting only on image information moves in the direction of the scene where the user is. The robot device of the present invention determines the behavior the user wishes with reference to the combination of voice information and image information, and moves specifically accordingly. The behaviors of the robot device 1 are classified as listed in Fig. 9, as also described above.
The behavior to take is decided by recognizing the user's voice and detecting the current positions of the user and the robot device 1 itself. Specifically, when the user says "come", the robot device 1 recognizes the utterance, and then detects the user's position from the image information. When the behavior of approaching the user is decided, the distance and bearing to the target position are then determined.
Referring to Figure 16, the target position is set 80 cm in front of the user. The face detector 63 recognizes the user's face using the feature quantities extracted by the feature extractor 62 in the image recognizer 31B (Fig. 6), and the distance measuring unit 64 measures the distance between the robot device 1 and the user with reference to the size of the user's face. Using the measured distance, the robot device 1 determines how far to walk to reach the target position 80 cm in front of the user.
By measuring the position of the user and accounting for the measured position in the behavior, the behavior of the robot device 1 in response to the user's gesture becomes more accurate.
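A sketch of this approach computation in a flat two-dimensional world, with the robot at the origin; the bearing handling is an assumption, since the text only specifies the 80 cm stand-off.

```python
import math

def approach_target(distance_cm: float, bearing_deg: float):
    """(x, y) of a point 80 cm short of the user along the measured bearing."""
    travel = max(distance_cm - 80.0, 0.0)
    rad = math.radians(bearing_deg)
    return travel * math.cos(rad), travel * math.sin(rad)

print(approach_target(250.0, 30.0))  # walk ~170 cm toward the user
```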
The above embodiments use the words the user actually says. The behavior of the robot device 1 can also be decided in response to the user's clapping (melody) or the user's footsteps.
When the voice produced by the user includes melody and sound, the speech recognizer 31A can be configured as shown in Figure 17. The voice input picked up by the microphone 15 undergoes analog-to-digital conversion in the AD converter 51. The digital data is then input to a melody/sound recognition unit 81, which obtains melody and sound information.
The recognition result provided by the melody/sound recognition unit 81 is fed to the behavior decision unit 33. Figure 17 omits the part that recognizes the user's speech, namely the part shown in Fig. 5. The digital voice signal output from the AD converter 51 is fed to the feature extractor 52 (see Fig. 5), and also to the melody/sound recognition unit 81 (see Figure 17).
The recognition result provided by the melody/sound recognition unit 81 is fed to the behavior decision unit 33, but in this case directly to the behavior table reference unit 76 in the behavior decision unit 33, rather than to the text parser (see Fig. 7).
The method by which the melody/sound recognition unit 81 recognizes a melody is discussed now. The melody/sound recognition unit 81 detects a melody by detecting the beats of percussion instruments (including the user's clapping) or by detecting beats through chord changes. When a beat is detected, it outputs detection results representing the beat, the meter, the number of beats, and so on.
Methods of melody detection are described in the following papers: "A Sound Source Separation System for Percussion Instruments" by Masataka GOTO and Yoichi MURAOKA, Transactions of the Institute of Electronics, Information and Communication Engineers, Vol. J77-D-II, No. 5, pp. 901-991, 1994; and "A Real-Time Beat Tracking System for Audio Signals" by Masataka GOTO and Yoichi MURAOKA, Transactions of the Institute of Electronics, Information and Communication Engineers, Vol. J81-D-II, No. 2, pp. 227-237, 1998. The methods disclosed therein can also be used in the present invention.
Discussed below is a case in which the behavior decision unit 33 (the behavior table reference unit 76) decides the dance behavior of the robot device 1 using the melody recognition result provided by the melody/sound recognition unit 81. The behavior table storage unit 77 stores a behavior table as shown in Figure 18. For example, when the melody recognition result indicates duple meter at a tempo in the range of 0 to 60 beats per minute, the robot device 1 selects dance A. When the melody recognition result indicates a tempo in the range of 0 to 60 beats per minute but neither duple, triple nor quadruple meter, the robot device 1 also selects dance A. In this way, the type of dance is uniquely determined by the meter and tempo information.
In the stage after the behavior decision unit 33, the robot device 1 is controlled accordingly, in accordance with the behavior decided by the behavior table reference unit 76 with reference to the behavior table stored in the behavior table storage unit 77 in the behavior decision unit 33.
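The Fig. 18 lookup can be sketched as a table keyed on tempo band and meter; only the two dance-A rows come from the text, and the fallback entry is an assumption.

```python
from typing import Optional

DANCE_TABLE = {
    ("0-60", "duple"): "dance A",    # duple meter at 0-60 beats per minute
    ("0-60", None):    "dance A",    # no recognizable meter, per the text
}

def select_dance(bpm: int, meter: Optional[str]) -> str:
    band = "0-60" if 0 <= bpm <= 60 else "over-60"
    return DANCE_TABLE.get((band, meter), "dance B")  # assumed fallback

print(select_dance(45, "duple"))  # -> "dance A"
```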
In the above discussion, the melody information is obtained from the voice. Alternatively, the melody information can be obtained from the user's gestures. To obtain a melody from gestures, the image recognizer 31B shown in Fig. 6 is used. The method of obtaining melody information from gestures described in the paper by Seiji INOKUCHI entitled "Gesture Recognition for Kansei Expression", Journal of the Robotics Society of Japan, Vol. 17, No. 7, can be used.
Optionally, the melody can be obtained from both the voice and the gestures.
Discussed below is deciding the behavior of the robot device 1 by sound. The sound recognition result provided by the melody/sound recognition unit 81 can indicate a sound such as footsteps or a scream, and who or what sound source produced the sound. For example, different behaviors can be expected according to whether the sound comes from a disliked person or a liked person, or according to what the sound is.
The recognition result provided by the melody/sound recognition unit 81 is output to the behavior table reference unit 76. The behavior table reference unit 76 refers to the behavior table stored in the behavior table storage unit 77, thereby deciding the behavior matching the recognition result of the input sound. Figure 19 lists the behaviors in response to sounds stored in the behavior table storage unit 77.
Referring to the behavior list shown in Figure 19, the behavior is uniquely decided by the sound recognition result. For example, when the sound recognition shows that the robot device 1 hears the footsteps of a person the robot device 1 likes, the robot device 1 approaches him happily. The information on liked persons and disliked persons is determined by the robot device 1 itself according to the conversations between the robot device 1 and the user and the user's attitude.
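A sketch of a Fig. 19 style lookup; the liked/disliked distinction follows the text, while the specific entries are illustrative.

```python
# Illustrative sound-keyed behavior table: (sound, source) -> behavior.
SOUND_BEHAVIOR = {
    ("footsteps", "liked person"):    "approach happily",
    ("footsteps", "disliked person"): "move away",
    ("scream", None):                 "act startled",
}

def react_to_sound(sound_type: str, source=None) -> str:
    return SOUND_BEHAVIOR.get((sound_type, source), "no reaction")

print(react_to_sound("footsteps", "liked person"))  # -> "approach happily"
```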
Image information can be used together with sound. For example, when the robot device 1 hears footsteps, the robot device 1 can determine from the footsteps that someone is approaching. If an image is picked up and recognized, the approaching person is identified. The robot device 1 then determines whether the approaching person is a person the robot device 1 likes or dislikes, and decides what action to take.
By combining voice information and image information, the robot device 1 can perform different actions. In the voice and image recognition stages of the behavior decision process, the robot device 1 performs more accurate recognition processing by combining these pieces of information.
Above-mentioned series of processing steps realizes that with hardware perhaps, above-mentioned series of processing steps can use software to realize.When realizing described series of processing steps by software, the program code that constitutes software is installed to the computing machine with its own specialized hardware or carries out multi-purpose general purpose personal computer from storage medium.
With reference to Figure 20, storage medium not only comprises encapsulation medium, and comprise ROM 112 or comprise the hard disk of storage unit 118, encapsulation medium can separate with computing machine and offers the user so that software program to be provided, such as disk 131 (such as floppy disk), CD 132 (such as CD-ROM (compact disc read-only memory)), magneto-optic disk 133 (such as MD (mini disk)) or semiconductor memory 134, ROM 112 or hard disk provide the software program of pre-installing on it in computing machine.
The treatment step that the software program that provides in the storage medium is provided does not need to carry out by the order of describing in each process flow diagram.Several treatment steps can walk abreast or separately carry out.
In this specification, the term system refers to a single entity made up of a plurality of devices.
According to the present invention, voice is recognized, an image is recognized, and the behavior to be taken by robot device 1 is decided according to at least one of the voice recognition result and the image recognition result. More accurate voice and image recognition is thereby carried out.

Claims (12)

1. An information processing apparatus for use in a robot device, comprising:
a speech recognition device for recognizing speech;
an image recognition device for recognizing an image; and
a determination device for deciding the behavior of the robot device according to at least one of the speech recognition result provided by the speech recognition device and the image recognition result provided by the image recognition device.
2. The information processing apparatus as claimed in claim 1, further comprising: a storage device storing a table that describes the relationship between the speech recognition result provided by the speech recognition device and the image recognition result provided by the image recognition device, and the behavior of the robot device uniquely determined by the speech recognition result and the image recognition result.
3. The information processing apparatus as claimed in claim 1, wherein, when the speech cannot be uniquely determined by the speech recognition device, the determination device decides the behavior of the robot device according to the image recognition result uniquely determined by the image recognition device.
4. The information processing apparatus as claimed in claim 1, wherein, when a plurality of objects appear in the image region recognized by the image recognition device, the determination device decides the behavior of the robot device according to the speech recognition result uniquely determined by the speech recognition device.
5. The information processing apparatus as claimed in claim 1, wherein the image recognition device can recognize an image of the scene appearing in the direction indicated by one of predetermined parts of the user, among the user's finger, face, eyes, and chin.
6. The information processing apparatus as claimed in claim 1, further comprising: a storage device for storing data of gestures performed by the user,
wherein the image recognition device recognizes the user's image, detects a gesture matching the gesture data stored in the storage device, and treats the detected gesture as the image recognition result.
7. The information processing apparatus as claimed in claim 1, further comprising: a detector device for detecting the user's face; and
a distance measuring device for measuring the distance between the user and the robot device according to the size of the user's face detected by the detector device,
wherein the determination device uses the measured distance to decide the behavior of the robot device.
8. The information processing apparatus as claimed in claim 1, wherein the speech recognition device detects a melody contained in background sound and treats the detected melody as the speech recognition result.
9. The information processing apparatus as claimed in claim 1, wherein the speech recognition device detects an acoustic phenomenon from background sound and treats the detected acoustic phenomenon as the speech recognition result.
10. An information processing method for an information processing apparatus used in a robot device, comprising:
a speech recognition step of recognizing speech;
an image recognition step of recognizing an image; and
a deciding step of deciding the behavior of the robot device according to at least one of the speech recognition result provided in the speech recognition step and the image recognition result provided in the image recognition step.
11. A software program for an information processing apparatus used in a robot device, comprising program code for carrying out:
a speech recognition step of recognizing speech;
an image recognition step of recognizing an image; and
a deciding step of deciding the behavior of the robot device according to at least one of the speech recognition result provided in the speech recognition step and the image recognition result provided in the image recognition step.
12. A storage medium storing a software program for an information processing apparatus used in a robot device, the software program comprising program code for carrying out:
a speech recognition step of recognizing speech;
an image recognition step of recognizing an image; and
a deciding step of deciding the behavior of the robot device according to at least one of the speech recognition result provided in the speech recognition step and the image recognition result provided in the image recognition step.
CNB001374982A 1999-12-28 2000-12-28 Information processing equipment, information processing method and storage medium Expired - Fee Related CN1204543C (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP37577399A JP2001188555A (en) 1999-12-28 1999-12-28 Device and method for information processing and recording medium
JP375773/1999 1999-12-28

Publications (2)

Publication Number Publication Date
CN1302056A true CN1302056A (en) 2001-07-04
CN1204543C CN1204543C (en) 2005-06-01

Family

ID=18506042

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB001374982A Expired - Fee Related CN1204543C (en) 1999-12-28 2000-12-28 Information processing equipment, information processing method and storage medium

Country Status (4)

Country Link
US (1) US6509707B2 (en)
JP (1) JP2001188555A (en)
KR (1) KR20010062767A (en)
CN (1) CN1204543C (en)


Families Citing this family (128)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8352400B2 (en) 1991-12-23 2013-01-08 Hoffberg Steven M Adaptive pattern recognition based controller apparatus and method and human-factored interface therefore
US7966078B2 (en) 1999-02-01 2011-06-21 Steven Hoffberg Network media appliance system and method
US6616464B1 (en) * 1999-05-10 2003-09-09 Sony Corporation Robot device
EP1126409A4 (en) * 1999-05-10 2003-09-10 Sony Corp Image processing apparatus, robot apparatus and image processing method
US6983239B1 (en) * 2000-10-25 2006-01-03 International Business Machines Corporation Method and apparatus for embedding grammars in a natural language understanding (NLU) statistical parser
US20020137013A1 (en) * 2001-01-16 2002-09-26 Nichols Etta D. Self-contained, voice activated, interactive, verbal articulate toy figure for teaching a child a chosen second language
JP4143305B2 (en) * 2001-01-30 2008-09-03 日本電気株式会社 Robot device, verification environment determination method, and verification environment determination program
JP2002239256A (en) * 2001-02-14 2002-08-27 Sanyo Electric Co Ltd Emotion determination device in automatic response toy and automatic response toy
JP2002283261A (en) * 2001-03-27 2002-10-03 Sony Corp Robot device and its control method and storage medium
US6804396B2 (en) * 2001-03-28 2004-10-12 Honda Giken Kogyo Kabushiki Kaisha Gesture recognition system
US20030001908A1 (en) * 2001-06-29 2003-01-02 Koninklijke Philips Electronics N.V. Picture-in-picture repositioning and/or resizing based on speech and gesture control
JP4689107B2 (en) * 2001-08-22 2011-05-25 本田技研工業株式会社 Autonomous robot
KR100898435B1 (en) * 2001-10-22 2009-05-21 소니 가부시끼 가이샤 Robot apparatus and control method thereof
KR100446725B1 (en) * 2001-11-02 2004-09-01 엘지전자 주식회사 Behavior learning method of robot
WO2004027685A2 (en) * 2002-09-19 2004-04-01 The Penn State Research Foundation Prosody based audio/visual co-analysis for co-verbal gesture recognition
US7873448B2 (en) * 2002-12-10 2011-01-18 Honda Motor Co., Ltd. Robot navigation system avoiding obstacles and setting areas as movable according to circular distance from points on surface of obstacles
US7379560B2 (en) * 2003-03-05 2008-05-27 Intel Corporation Method and apparatus for monitoring human attention in dynamic power management
JP2004299033A (en) 2003-04-01 2004-10-28 Sony Corp Robot device, information processing method, and program
JP2004299025A (en) 2003-04-01 2004-10-28 Honda Motor Co Ltd Mobile robot control device, mobile robot control method and mobile robot control program
JP4311190B2 (en) * 2003-12-17 2009-08-12 株式会社デンソー In-vehicle device interface
US8442331B2 (en) * 2004-02-15 2013-05-14 Google Inc. Capturing text from rendered documents using supplemental information
US7707039B2 (en) * 2004-02-15 2010-04-27 Exbiblio B.V. Automatic modification of web pages
US10635723B2 (en) 2004-02-15 2020-04-28 Google Llc Search engines and systems with handheld document data capture devices
US20060053097A1 (en) * 2004-04-01 2006-03-09 King Martin T Searching and accessing documents on private networks for use with captures from rendered documents
US20060041605A1 (en) * 2004-04-01 2006-02-23 King Martin T Determining actions involving captured information and electronic content associated with rendered documents
US7812860B2 (en) 2004-04-01 2010-10-12 Exbiblio B.V. Handheld device for capturing text from both a document printed on paper and a document displayed on a dynamic display device
US20060122983A1 (en) * 2004-12-03 2006-06-08 King Martin T Locating electronic instances of documents based on rendered instances, document fragment digest generation, and digest based document fragment determination
US20060041484A1 (en) * 2004-04-01 2006-02-23 King Martin T Methods and systems for initiating application processes by data capture from rendered documents
US20060081714A1 (en) 2004-08-23 2006-04-20 King Martin T Portable scanning device
US7894670B2 (en) 2004-04-01 2011-02-22 Exbiblio B.V. Triggering actions in response to optically or acoustically capturing keywords from a rendered document
US20070300142A1 (en) * 2005-04-01 2007-12-27 King Martin T Contextual dynamic advertising based upon captured rendered text
US8081849B2 (en) * 2004-12-03 2011-12-20 Google Inc. Portable scanning and memory device
US20060098900A1 (en) * 2004-09-27 2006-05-11 King Martin T Secure data gathering from rendered documents
US7990556B2 (en) * 2004-12-03 2011-08-02 Google Inc. Association of a portable scanner with input/output and storage devices
US9008447B2 (en) * 2004-04-01 2015-04-14 Google Inc. Method and system for character recognition
US20080313172A1 (en) * 2004-12-03 2008-12-18 King Martin T Determining actions involving captured information and electronic content associated with rendered documents
US9116890B2 (en) 2004-04-01 2015-08-25 Google Inc. Triggering actions in response to optically or acoustically capturing keywords from a rendered document
US8146156B2 (en) 2004-04-01 2012-03-27 Google Inc. Archive of text captures from rendered documents
US9143638B2 (en) 2004-04-01 2015-09-22 Google Inc. Data capture from rendered documents using handheld device
JP4661074B2 (en) * 2004-04-07 2011-03-30 ソニー株式会社 Information processing system, information processing method, and robot apparatus
US8713418B2 (en) * 2004-04-12 2014-04-29 Google Inc. Adding value to a rendered document
US8874504B2 (en) * 2004-12-03 2014-10-28 Google Inc. Processing techniques for visual capture data from a rendered document
US8489624B2 (en) * 2004-05-17 2013-07-16 Google, Inc. Processing techniques for text capture from a rendered document
US8620083B2 (en) 2004-12-03 2013-12-31 Google Inc. Method and system for character recognition
JP2006015436A (en) * 2004-06-30 2006-01-19 Honda Motor Co Ltd Monitoring robot
US8346620B2 (en) 2004-07-19 2013-01-01 Google Inc. Automatic modification of web pages
JP4600736B2 (en) * 2004-07-22 2010-12-15 ソニー株式会社 Robot control apparatus and method, recording medium, and program
KR100723402B1 (en) * 2005-02-15 2007-05-30 삼성전자주식회사 Apparatus and method for recognizing gesture, and computer readable media for storing computer program
KR100741773B1 (en) 2005-02-23 2007-07-24 엘지전자 주식회사 Course designation method for mobile robot
JP4266211B2 (en) 2005-03-23 2009-05-20 株式会社東芝 Robot device, method of moving robot device, and program
KR100723404B1 (en) * 2005-03-29 2007-05-30 삼성전자주식회사 Apparatus and method for processing speech
KR20060127452A (en) * 2005-06-07 2006-12-13 엘지전자 주식회사 Apparatus and method to inform state of robot cleaner
US8935006B2 (en) * 2005-09-30 2015-01-13 Irobot Corporation Companion robot for personal interaction
JP4718987B2 (en) * 2005-12-12 2011-07-06 本田技研工業株式会社 Interface device and mobile robot equipped with the same
JP4940698B2 (en) * 2006-02-27 2012-05-30 トヨタ自動車株式会社 Autonomous mobile robot
JP2007257088A (en) * 2006-03-20 2007-10-04 Univ Of Electro-Communications Robot device and its communication method
KR100847136B1 (en) 2006-08-14 2008-07-18 한국전자통신연구원 Method and Apparatus for Shoulder-line detection and Gesture spotting detection
EP2067119A2 (en) 2006-09-08 2009-06-10 Exbiblio B.V. Optical scanners, such as hand-held optical scanners
US20100278453A1 (en) * 2006-09-15 2010-11-04 King Martin T Capture and display of annotations in paper and electronic documents
KR100822880B1 (en) * 2006-10-25 2008-04-17 한국전자통신연구원 User identification system through sound localization based audio-visual under robot environments and method thereof
JP4764377B2 (en) * 2007-05-09 2011-08-31 本田技研工業株式会社 Mobile robot
WO2009018988A2 (en) * 2007-08-03 2009-02-12 Ident Technology Ag Toy, particularly in the fashion of a doll or stuffed animal
WO2009027999A1 (en) * 2007-08-27 2009-03-05 Rao, Aparna External stimuli based reactive system
US20110145068A1 (en) * 2007-09-17 2011-06-16 King Martin T Associating rendered advertisements with digital content
US8638363B2 (en) * 2009-02-18 2014-01-28 Google Inc. Automatically capturing information, such as capturing information using a document-aware device
CN101411946B (en) * 2007-10-19 2012-03-28 鸿富锦精密工业(深圳)有限公司 Toy dinosaur
US10296874B1 (en) 2007-12-17 2019-05-21 American Express Travel Related Services Company, Inc. System and method for preventing unauthorized access to financial accounts
US8545283B2 (en) * 2008-02-20 2013-10-01 Ident Technology Ag Interactive doll or stuffed animal
TWI392983B (en) * 2008-10-06 2013-04-11 Sonix Technology Co Ltd Robot apparatus control system using a tone and robot apparatus
TW201019242A (en) * 2008-11-11 2010-05-16 Ind Tech Res Inst Personality-sensitive emotion representation system and method thereof
US8447066B2 (en) 2009-03-12 2013-05-21 Google Inc. Performing actions based on capturing information from rendered documents, such as documents under copyright
WO2010105246A2 (en) 2009-03-12 2010-09-16 Exbiblio B.V. Accessing resources based on capturing information from a rendered document
US9417700B2 (en) 2009-05-21 2016-08-16 Edge3 Technologies Gesture recognition systems and related methods
US8507781B2 (en) * 2009-06-11 2013-08-13 Harman International Industries Canada Limited Rhythm recognition from an audio signal
IL200921A (en) * 2009-09-14 2016-05-31 Israel Aerospace Ind Ltd Infantry robotic porter system and methods useful in conjunction therewith
KR20110036385A (en) * 2009-10-01 2011-04-07 삼성전자주식회사 Apparatus for analyzing intention of user and method thereof
KR20110055062A (en) * 2009-11-19 2011-05-25 삼성전자주식회사 Robot system and method for controlling the same
US9081799B2 (en) * 2009-12-04 2015-07-14 Google Inc. Using gestalt information to identify locations in printed information
US9323784B2 (en) * 2009-12-09 2016-04-26 Google Inc. Image search using text-based elements within the contents of images
US8396252B2 (en) 2010-05-20 2013-03-12 Edge 3 Technologies Systems and related methods for three dimensional gesture recognition in vehicles
US8296151B2 (en) * 2010-06-18 2012-10-23 Microsoft Corporation Compound gesture-speech commands
FR2963132A1 (en) * 2010-07-23 2012-01-27 Aldebaran Robotics HUMANOID ROBOT HAVING A NATURAL DIALOGUE INTERFACE, METHOD OF USING AND PROGRAMMING THE SAME
WO2012030872A1 (en) 2010-09-02 2012-03-08 Edge3 Technologies Inc. Method and apparatus for confusion learning
US8582866B2 (en) 2011-02-10 2013-11-12 Edge 3 Technologies, Inc. Method and apparatus for disparity computation in stereo images
US8655093B2 (en) 2010-09-02 2014-02-18 Edge 3 Technologies, Inc. Method and apparatus for performing segmentation of an image
US8666144B2 (en) 2010-09-02 2014-03-04 Edge 3 Technologies, Inc. Method and apparatus for determining disparity of texture
US9431027B2 (en) * 2011-01-26 2016-08-30 Honda Motor Co., Ltd. Synchronized gesture and speech production for humanoid robots using random numbers
US8970589B2 (en) 2011-02-10 2015-03-03 Edge 3 Technologies, Inc. Near-touch interaction with a stereo camera grid structured tessellations
KR101842459B1 (en) 2011-04-12 2018-05-14 엘지전자 주식회사 Robot cleaner and method for controlling the same
JP2011193483A (en) * 2011-04-14 2011-09-29 Toshiba Corp Television receiver and method of receiving television broadcasting
US8136724B1 (en) * 2011-06-24 2012-03-20 American Express Travel Related Services Company, Inc. Systems and methods for gesture-based interaction with computer systems
US8714439B2 (en) 2011-08-22 2014-05-06 American Express Travel Related Services Company, Inc. Methods and systems for contactless payments at a merchant
US9672609B1 (en) 2011-11-11 2017-06-06 Edge 3 Technologies, Inc. Method and apparatus for improved depth-map estimation
JP5838781B2 (en) * 2011-12-20 2016-01-06 富士通株式会社 Compound word reading display method and program, and reading generation apparatus
WO2013136118A1 (en) * 2012-03-14 2013-09-19 Nokia Corporation Spatial audio signal filtering
US8924011B2 (en) * 2012-04-03 2014-12-30 Knu-Industry Cooperation Foundation Intelligent robot apparatus responsive to environmental change and method of controlling and reconfiguring intelligent robot apparatus
JP6044819B2 (en) * 2012-05-30 2016-12-14 日本電気株式会社 Information processing system, information processing method, communication terminal, information processing apparatus, control method thereof, and control program
DE102012105608A1 (en) * 2012-06-27 2014-01-02 Miele & Cie. Kg Self-propelled cleaning device and method for operating a self-propelled cleaning device
US9761228B2 (en) * 2013-02-25 2017-09-12 Mitsubishi Electric Corporation Voice recognition system and voice recognition device
US10721448B2 (en) 2013-03-15 2020-07-21 Edge 3 Technologies, Inc. Method and apparatus for adaptive exposure bracketing, segmentation and scene organization
US9666194B2 (en) * 2013-06-07 2017-05-30 Flashbox Media, LLC Recording and entertainment system
US11138971B2 (en) 2013-12-05 2021-10-05 Lenovo (Singapore) Pte. Ltd. Using context to interpret natural language speech recognition commands
CN104715753B (en) * 2013-12-12 2018-08-31 联想(北京)有限公司 A kind of method and electronic equipment of data processing
US10276154B2 (en) * 2014-04-23 2019-04-30 Lenovo (Singapore) Pte. Ltd. Processing natural language user inputs using context data
WO2015195765A1 (en) 2014-06-17 2015-12-23 Nant Vision, Inc. Activity recognition systems and methods
CN105881535A (en) * 2015-02-13 2016-08-24 鸿富锦精密工业(深圳)有限公司 Robot capable of dancing with musical tempo
JP6551507B2 (en) * 2015-02-17 2019-07-31 日本電気株式会社 Robot control device, robot, robot control method and program
US9769367B2 (en) 2015-08-07 2017-09-19 Google Inc. Speech and computer vision-based control
US9836484B1 (en) 2015-12-30 2017-12-05 Google Llc Systems and methods that leverage deep learning to selectively store images at a mobile image capture device
US10225511B1 (en) 2015-12-30 2019-03-05 Google Llc Low power framework for controlling image sensor mode in a mobile image capture device
US9838641B1 (en) 2015-12-30 2017-12-05 Google Llc Low power framework for processing, compressing, and transmitting images at a mobile image capture device
US9836819B1 (en) 2015-12-30 2017-12-05 Google Llc Systems and methods for selective retention and editing of images captured by mobile image capture device
US10732809B2 (en) 2015-12-30 2020-08-04 Google Llc Systems and methods for selective retention and editing of images captured by mobile image capture device
EP3403146A4 (en) 2016-01-15 2019-08-21 iRobot Corporation Autonomous monitoring robot systems
US20170282383A1 (en) * 2016-04-04 2017-10-05 Sphero, Inc. System for content recognition and response action
JP2017205313A (en) * 2016-05-19 2017-11-24 パナソニックIpマネジメント株式会社 robot
JP6751536B2 (en) * 2017-03-08 2020-09-09 パナソニック株式会社 Equipment, robots, methods, and programs
JP6833601B2 (en) * 2017-04-19 2021-02-24 パナソニック株式会社 Interaction devices, interaction methods, interaction programs and robots
US10100968B1 (en) 2017-06-12 2018-10-16 Irobot Corporation Mast systems for autonomous mobile robots
JP6841167B2 (en) * 2017-06-14 2021-03-10 トヨタ自動車株式会社 Communication devices, communication robots and communication control programs
JP1622873S (en) * 2017-12-29 2019-01-28 robot
JP2019185360A (en) * 2018-04-09 2019-10-24 富士ゼロックス株式会社 Image processing device and program
JP2020034461A (en) * 2018-08-30 2020-03-05 Zホールディングス株式会社 Provision device, provision method, and provision program
US11110595B2 (en) 2018-12-11 2021-09-07 Irobot Corporation Mast systems for autonomous mobile robots
WO2020184733A1 (en) 2019-03-08 2020-09-17 엘지전자 주식회사 Robot
KR20210067539A (en) * 2019-11-29 2021-06-08 엘지전자 주식회사 Information processing method and apparatus therefor
US11731271B2 (en) * 2020-06-30 2023-08-22 Microsoft Technology Licensing, Llc Verbal-based focus-of-attention task model encoder
JP7272521B2 (en) * 2021-05-24 2023-05-12 三菱電機株式会社 ROBOT TEACHING DEVICE, ROBOT CONTROL SYSTEM, ROBOT TEACHING METHOD, AND ROBOT TEACHING PROGRAM

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS6332624A (en) * 1986-07-28 1988-02-12 Canon Inc Information processor
JP3159242B2 (en) * 1997-03-13 2001-04-23 日本電気株式会社 Emotion generating apparatus and method
JPH10289006A (en) * 1997-04-11 1998-10-27 Yamaha Motor Co Ltd Method for controlling object to be controlled using artificial emotion

Cited By (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1343115A1 (en) * 2001-08-23 2003-09-10 Sony Corporation Robot apparatus, face recognition method, and face recognition apparatus
EP1343115A4 (en) * 2001-08-23 2005-03-09 Sony Corp Robot apparatus, face recognition method, and face recognition apparatus
US7369686B2 (en) 2001-08-23 2008-05-06 Sony Corporation Robot apparatus, face recognition method, and face recognition apparatus
CN1312576C (en) * 2003-07-03 2007-04-25 索尼株式会社 Speech communiction system and method, and robot apparatus
CN100351750C (en) * 2004-07-27 2007-11-28 索尼株式会社 Information-processing apparatus, information-processing method, recording medium, and program
CN102074232B (en) * 2009-11-25 2013-06-05 财团法人资讯工业策进会 Behavior identification system and identification method combined with audio and video
CN102012740A (en) * 2010-11-15 2011-04-13 中国科学院深圳先进技术研究院 Man-machine interaction method and system
CN102141812A (en) * 2010-11-16 2011-08-03 深圳中科智酷机器人科技有限公司 Robot
CN103257703B (en) * 2012-02-20 2016-03-30 联想(北京)有限公司 A kind of augmented reality device and method
CN103257703A (en) * 2012-02-20 2013-08-21 联想(北京)有限公司 Augmented reality device and method
CN105229726A (en) * 2013-05-07 2016-01-06 高通股份有限公司 For the adaptive audio frame process of keyword search
CN105229726B (en) * 2013-05-07 2019-04-02 高通股份有限公司 Adaptive audio frame for keyword search is handled
WO2015176467A1 (en) * 2013-05-24 2015-11-26 文霞 Robot
CN103578471B (en) * 2013-10-18 2017-03-01 威盛电子股份有限公司 Speech identifying method and its electronic installation
CN103578471A (en) * 2013-10-18 2014-02-12 威盛电子股份有限公司 Speech recognition method and electronic device thereof
CN106125925B (en) * 2016-06-20 2019-05-14 华南理工大学 Intelligence based on gesture and voice control arrests method
CN106095109A (en) * 2016-06-20 2016-11-09 华南理工大学 The method carrying out robot on-line teaching based on gesture and voice
CN106125925A (en) * 2016-06-20 2016-11-16 华南理工大学 Method is arrested based on gesture and voice-operated intelligence
CN106095109B (en) * 2016-06-20 2019-05-14 华南理工大学 The method for carrying out robot on-line teaching based on gesture and voice
CN107026940A (en) * 2017-05-18 2017-08-08 北京神州泰岳软件股份有限公司 A kind of method and apparatus for determining session feedback information
CN109961781A (en) * 2017-12-22 2019-07-02 深圳市优必选科技有限公司 Voice messaging method of reseptance, system and terminal device based on robot
CN109961781B (en) * 2017-12-22 2021-08-27 深圳市优必选科技有限公司 Robot-based voice information receiving method and system and terminal equipment
CN109981970A (en) * 2017-12-28 2019-07-05 深圳市优必选科技有限公司 A kind of method, apparatus and robot of determining photographed scene
CN109358630A (en) * 2018-11-17 2019-02-19 国网山东省电力公司济宁供电公司 A kind of computer room crusing robot system
CN111429888A (en) * 2020-05-12 2020-07-17 珠海格力智能装备有限公司 Robot control method and device, storage medium and processor
CN112894831A (en) * 2021-04-21 2021-06-04 广东电网有限责任公司电力科学研究院 Double-arm robot insulated wire stripping system and method

Also Published As

Publication number Publication date
US6509707B2 (en) 2003-01-21
US20010020837A1 (en) 2001-09-13
JP2001188555A (en) 2001-07-10
CN1204543C (en) 2005-06-01
KR20010062767A (en) 2001-07-07

Similar Documents

Publication Publication Date Title
CN1204543C (en) Information processing equipment, information processing method and storage medium
CN1199149C (en) Dialogue processing equipment, method and recording medium
CN1236422C (en) Robot device, character recognizing apparatus and character reading method, and control program and recording medium
US11908468B2 (en) Dialog management for multiple users
AU2018204246B2 (en) Method of performing multi-modal dialogue between a humanoid robot and user, computer program product and humanoid robot for implementing said method
US20190279642A1 (en) System and method for speech understanding via integrated audio and visual based speech recognition
CN1132149C (en) Game apparatus, voice selection apparatus, voice recognition apparatus and voice response apparatus
US20190371318A1 (en) System and method for adaptive detection of spoken language via multiple speech models
CN1894740A (en) Information processing system, information processing method, and information processing program
CN101030370A (en) Speech communication system and method, and robot apparatus
CN1488134A (en) Device and method for voice recognition
CN1461463A (en) Voice synthesis device
CN1573924A (en) Speech recognition apparatus, speech recognition method, conversation control apparatus, conversation control method
CN1220174C (en) Speech output apparatus
US20220101856A1 (en) System and method for disambiguating a source of sound based on detected lip movement
CN114051639A (en) Emotion detection using speaker baseline
CN1221936C (en) Word sequence outputting device
KR20230076733A (en) English speaking teaching method using interactive artificial intelligence avatar based on emotion and memory, device and system therefor
CN1461464A (en) Language processor
CN110364164B (en) Dialogue control device, dialogue system, dialogue control method, and storage medium
CN116661603A (en) Multi-mode fusion user intention recognition method under complex man-machine interaction scene
JP2020204711A (en) Registration system

Legal Events

Date Code Title Description
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C06 Publication
PB01 Publication
C14 Grant of patent or utility model
GR01 Patent grant
C19 Lapse of patent right due to non-payment of the annual fee
CF01 Termination of patent right due to non-payment of annual fee