WO2010126321A2 - Apparatus and method for inferring user intention using multi-modal information - Google Patents
Apparatus and method for inferring user intention using multi-modal information
- Publication number
- WO2010126321A2 (application PCT/KR2010/002723, KR2010002723W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- user intention
- user
- intention
- modal
- predicted
- Prior art date
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/017—Gesture based interaction, e.g. based on a set of recognized hand gestures
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/03—Arrangements for converting the position or the displacement of a member into a coded form
- G06F3/033—Pointing devices displaced or positioned by the user, e.g. mice, trackballs, pens or joysticks; Accessories therefor
- G06F3/0346—Pointing devices displaced or positioned by the user, e.g. mice, trackballs, pens or joysticks; Accessories therefor with detection of the device orientation or free movement in a 3D space, e.g. 3D mice, 6-DOF [six degrees of freedom] pointers using gyroscopes, accelerometers or tilt-sensors
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/451—Execution arrangements for user interfaces
Definitions
- One or more aspects relate to a system using multi-modal information, and more particularly, to an apparatus and method for processing user input using multi-modal information.
- A multi-modal interface is an interface that uses voice, keyboard, pen, and other modalities for communication between human and machine.
- When multi-modal information is input, the user intention can be analyzed in two ways: fusing the multi-modal inputs at the signal level, or analyzing each modality's input separately and then fusing the results at the semantic level.
- Signal-level fusion merges the multi-modal input signals and analyzes and classifies them at once.
- It is well suited to signals that occur simultaneously, such as a voice signal and the accompanying lip movement.
- However, the feature space becomes very large, the model for computing correlations between the signals is complicated, and the amount of learning required is high.
- Scalability is also limited: combining with other modalities or porting to other terminals is not easy.
- In contrast, fusing the modalities at the semantic level analyzes the meaning of each modality's input signal first and then fuses the analysis results.
- Because independence between the modalities is maintained, learning and extension are easy.
- However, a user provides multi-modal input precisely because the modalities are associated with one another, and this association is difficult to discover when each modality's meaning is analyzed individually.
- An apparatus and method are provided that can infer user intention efficiently and accurately by first predicting a part of the user intention from motion information and then inferring the full intention from multi-modal input information.
- In one aspect, an apparatus for inferring user intention includes a first predictor configured to predict a part of the user intention using at least one piece of motion information, and a second predictor configured to predict the user intention using the predicted part of the user intention and multi-modal information input from at least one multi-modal sensor.
- In another aspect, a method of inferring user intention includes receiving at least one piece of motion information, predicting a part of the user intention using the received motion information, receiving multi-modal information input from at least one multi-modal sensor, and predicting the user intention using the predicted part of the user intention and the multi-modal information.
- By predicting a part of the user intention from recognized user motion, and then analyzing the multi-modal information within the scope of that predicted part to predict the user intention secondarily, independence between the modalities is maintained while the association between them is still captured, so the user intention can be inferred accurately.
- As a result, the user can input voice to the inference apparatus without learning any special voice input method.
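- Before turning to the drawings, the following Python sketch previews this two-stage scheme: a first predictor narrows the intention space from motion information, and a second predictor completes the prediction from multi-modal input. This is an editorial illustration only; all names, rules, and thresholds are assumptions, not the patent's implementation.

```python
# Minimal sketch of two-stage user-intention inference (illustrative only).

class FirstPredictor:
    """Predicts a *part* of the user intention from motion information alone."""
    def predict_partial(self, motion):
        # e.g. the hand holding the microphone approaches the mouth
        if motion["mic_to_mouth_cm"] < 20 and motion["mic_faces_mouth"]:
            return "bring_mic_to_mouth"  # narrowed intention space
        return "unknown"

class SecondPredictor:
    """Completes the prediction using multi-modal input, interpreted only
    within the scope set by the partial intention."""
    def predict(self, partial, multimodal):
        if partial == "bring_mic_to_mouth":
            if multimodal.get("voice_detected"):
                return "voice_command"
            if multimodal.get("breath_sound"):
                return "blow"
        return partial

def infer_user_intention(motion, multimodal):
    partial = FirstPredictor().predict_partial(motion)
    return SecondPredictor().predict(partial, multimodal)

# Example: microphone near the mouth plus a detected breath sound -> "blow".
print(infer_user_intention({"mic_to_mouth_cm": 12, "mic_faces_mouth": True},
                           {"breath_sound": True}))
```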
- FIG. 1 is a diagram illustrating the configuration of a user intention inference apparatus according to an exemplary embodiment.
- FIG. 2 is a diagram illustrating an example of the configuration of the user intention predictor of FIG. 1.
- FIG. 3 is a diagram illustrating an exemplary operation of the user intention predictor of FIG. 2.
- FIG. 4 is a diagram illustrating an example of an operation of predicting the user intention by receiving additional multi-modal input after a part of the user intention has been predicted.
- FIG. 5 is a diagram illustrating another example of an operation of predicting the user intention by receiving additional multi-modal input after a part of the user intention has been predicted.
- FIG. 6 is a diagram illustrating an example of a configuration for classifying signals by jointly using an audio signal and an image signal.
- FIG. 7 is a diagram illustrating a user intention inference method using multi-modal information according to an exemplary embodiment.
- As noted above, an apparatus for inferring user intention includes a first predictor configured to predict a part of the user intention using at least one piece of motion information, and a second predictor configured to predict the user intention using the predicted part of the user intention and multi-modal information input from at least one multi-modal sensor.
- The first predictor may use the predicted part of the user intention to generate a control signal for executing an operation performed in the course of predicting the user intention.
- The control signal for executing that operation may be a control signal for controlling the operation of a multi-modal sensor controlled by the user intention inference apparatus.
- The second predictor may interpret the multi-modal information input from the multi-modal sensor in association with the predicted part of the user intention in order to predict the user intention.
- When the predicted part of the user intention is the selection of an object displayed on a display screen and a voice is input from the multi-modal sensor, the second predictor may predict the user intention by interpreting the input voice in association with the object selection.
- The second predictor may predict the user intention using the multi-modal information input from the at least one multi-modal sensor within the range of the predicted part of the user intention.
- When the predicted part of the user intention is bringing a microphone to the mouth, the second predictor may sense an acoustic signal, extract and analyze features of the sensed signal, and predict the user intention.
- The second predictor may determine whether a voice section is detected in the acoustic signal and, when a voice section is detected, predict the user intention to be a voice command intention.
- When a breath sound is detected in the acoustic signal, the second predictor may predict the user intention to be blowing.
- When the predicted part of the user intention is the selection of an object displayed on a display screen, the second predictor may predict, using the multi-modal information, the user intention to be at least one of deleting, classifying, and sorting the selected object.
- The apparatus may further include a user intention applying unit configured to control software or hardware controlled by the user intention inference apparatus using the user intention prediction result.
- Likewise, a method of inferring user intention includes receiving at least one piece of motion information, predicting a part of the user intention using the received motion information, receiving multi-modal information input from at least one multi-modal sensor, and predicting the user intention using the predicted part of the user intention and the multi-modal information.
- FIG. 1 is a diagram illustrating the configuration of a user intention inference apparatus according to an exemplary embodiment.
- Referring to FIG. 1, the user intention inference apparatus 100 includes a motion sensor 110, a controller 120, and a multi-modal sensing unit 130.
- The user intention inference apparatus 100 may be any type of device or system, such as a cellular phone, a personal digital assistant (PDA), a digital camera, a portable game console, an MP3 player, a portable/personal multimedia player (PMP), a handheld e-book reader, a portable laptop PC, a global positioning system (GPS) navigation device, a desktop PC, a high definition television (HDTV), an optical disc player, or a set-top box.
- The user intention inference apparatus 100 may further include various other components according to the implementation, such as components for a multi-modal interface including a user interface unit, a display unit, and a sound output unit.
- The motion sensor 110 may include an inertial sensor, a geomagnetic sensor for detecting direction, an acceleration sensor or a gyro sensor for detecting motion, and the like, in order to detect motion information.
- In addition, the motion sensor 110 may include an image sensor, an acoustic sensor, and the like.
- A plurality of motion sensors may be attached to parts of the user's body and to the user intention inference apparatus 100 to detect motion information.
- The multi-modal sensing unit 130 may include at least one multi-modal sensor 132, 134, 136, or 138.
- The acoustic sensor 132 detects acoustic signals, the image sensor 134 detects image information, the biometric information sensor 136 detects biometric information such as body temperature, and the touch sensor 138 senses touch gestures on a touch pad; various other types of multi-modal sensors may also be included.
- Although FIG. 1 illustrates four sensors in the multi-modal sensing unit 130, the number of sensors is not limited thereto.
- The types and range of sensors included in the multi-modal sensing unit 130 may be broader than those included in the motion sensor 110, whose purpose is motion detection.
- Although the motion sensor 110 and the multi-modal sensing unit 130 are illustrated in FIG. 1 as separate, they may be integrated.
- The same kinds of sensors, for example an image sensor and an acoustic sensor, may be included in both the motion sensor 110 and the multi-modal sensing unit 130.
- The multi-modal sensing unit 130 may include modules that extract feature values from the multi-modal information detected by each of the multi-modal sensors 132, 134, 136, and 138, according to its type, in order to analyze its meaning.
- Alternatively, the components for analyzing the multi-modal information may be included in the controller 120.
- The controller 120 may include an application, data, and an operating system for controlling the operation of each component of the user intention inference apparatus 100.
- In this embodiment, the controller 120 includes a user intention predictor 122 and a user intention applying unit 124.
- The user intention predictor 122 receives at least one piece of motion information detected by the motion sensor 110 and primarily predicts a part of the user intention using the received motion information.
- The user intention predictor 122 may then secondarily predict the user intention using the predicted part of the user intention and the multi-modal information input from the at least one multi-modal sensor. That is, in the secondary prediction, the user intention predictor 122 finally predicts the user intention using both the motion information detected by the motion sensor 110 and the multi-modal information input from the multi-modal sensing unit 130.
- The user intention predictor 122 may use various known inference models to infer the user intention.
- In addition, the user intention predictor 122 may use the primarily predicted part of the user intention to generate a control signal for executing an operation performed in the course of the secondary prediction.
- That control signal may be a control signal for controlling the operation of the multi-modal sensing unit 130, which is controlled by the user intention inference apparatus 100.
- For example, based on the primarily predicted part of the user intention, only those sensors of the multi-modal sensing unit 130 that are associated with that partial intention may be activated.
- Then, power consumption for sensor operation is reduced compared with activating all the sensors of the multi-modal sensing unit 130.
- Moreover, because the multi-modal input information is interpreted only within the predicted scope, the prediction process stays simple while the user intention is still inferred accurately.
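- The selective activation described above can be pictured as a small lookup from partial intention to sensor subset, as in the following sketch; the mapping and the sensor names are assumptions for illustration.

```python
# Sketch of the control signal that activates only the multi-modal sensors
# associated with the first-stage prediction (mapping and names are assumed).

SENSORS_FOR_PARTIAL_INTENTION = {
    "bring_mic_to_mouth": {"acoustic", "image"},    # listen and watch the lips
    "select_object":      {"image", "ultrasonic"},  # track the follow-up gesture
}

def control_signal(partial_intention, all_sensors):
    """Return which sensors to power on; the rest stay off to save power."""
    wanted = SENSORS_FOR_PARTIAL_INTENTION.get(partial_intention, set())
    return {name: (name in wanted) for name in all_sensors}

# Example: only the acoustic and image sensors are activated.
print(control_signal("bring_mic_to_mouth",
                     ["acoustic", "image", "biometric", "touch", "ultrasonic"]))
```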
- The user intention predictor 122 may include a module (not shown) that extracts and analyzes features according to the type of multi-modal information in order to perform the secondary prediction.
- The user intention predictor 122 may interpret the multi-modal information input from the multi-modal sensing unit 130 in association with the primarily predicted part of the user intention.
- For example, when the primarily predicted part of the user intention is the selection of an object displayed on the display screen and a voice is input from the multi-modal sensing unit 130, the user intention predictor 122 may secondarily predict the user intention by interpreting the input voice in conjunction with the object selection.
- The input voice may, for instance, be interpreted to mean "arrange the objects selected on the display screen in date order."
- In this way, the user intention predictor 122 may secondarily predict the user intention as at least one of deleting, classifying, and sorting the selected objects using the multi-modal information.
- The user intention applying unit 124 may control software or hardware controlled by the user intention inference apparatus using the user intention prediction result.
- The user intention applying unit 124 may provide a multi-modal interface that responds to the predicted user intention. For example, if the user intention is predicted to be a voice command, a voice recognition application may be run to understand the meaning of the command, or a search application may be run, and a call may automatically be placed to a specific person based on the recognition result. If the intention is to transfer an object selected by the user, an email application may be executed. As another example, when the user intention is predicted to be humming, an application that searches for music similar to the hummed tune may be driven. As yet another example, when the user intention is predicted to be blowing, it may serve as a command that makes an avatar perform a specific action in a game application.
- As described above, because the multi-modal information is interpreted in relation to the primarily predicted part of the user intention, the user intention can be inferred accurately.
- FIG. 2 is a diagram illustrating an example of a configuration of a user intention predictor of FIG. 1.
- Referring to FIG. 2, the user intention predictor 122 may include a motion information analyzer 210, a first predictor 220, and a second predictor 230.
- The motion information analyzer 210 analyzes one or more pieces of motion information received from the motion sensor 110.
- The motion information analyzer 210 may measure the location and angle of each body part to which a motion sensor 110 is attached and, using the measured location and angle information, may also calculate the location and angle of body parts to which no sensor is attached.
- For example, when motion sensors are attached to the wrist and the head, the distance between the sensors can be measured, and each sensor can obtain three-dimensional rotation angle information relative to a reference coordinate system. The distance between the wrist and the mouth and the rotation angle of the wrist can therefore be calculated from the motion information. Assuming that the user is holding in that hand a microphone corresponding to the acoustic sensor 132 of the user intention inference apparatus 100, the distance between the microphone and the mouth and the direction of the microphone can be calculated.
- That is, the motion information analyzer 210 may calculate the distance between the mouth and the microphone and the microphone's orientation.
- As another example, an image sensor may be included in the motion sensor 110 and supply image information to the motion information analyzer 210.
- In that case, the motion information analyzer 210 may recognize objects such as a face or hands in the image and calculate the positional relationships between them, for example the distance and angle between the face and the two hands, or between the two hands.
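- A rough Python sketch of this geometric analysis follows; the coordinate conventions, the assumed mouth offset from the head sensor, and the facing threshold are illustrative assumptions, not values from the patent.

```python
import numpy as np

# Sketch: estimate mouth-to-microphone distance and whether the microphone
# points at the mouth, from head/wrist sensor positions (in cm) and the
# wrist's 3x3 rotation matrix relative to the reference coordinate system.

def analyze_motion(head_pos, wrist_pos, wrist_rotation):
    mouth_pos = head_pos + np.array([0.0, -8.0, 5.0])      # assumed mouth offset
    mic_axis = wrist_rotation @ np.array([0.0, 0.0, 1.0])  # assumed mic axis
    to_mouth = mouth_pos - wrist_pos
    distance_cm = float(np.linalg.norm(to_mouth))
    # cosine of the angle between the mic axis and the direction to the mouth
    facing = float(mic_axis @ to_mouth) / max(distance_cm, 1e-9)
    return {"mic_to_mouth_cm": distance_cm, "mic_faces_mouth": facing > 0.8}

# Example with toy coordinates and an identity wrist rotation.
print(analyze_motion(np.array([0.0, 170.0, 0.0]),
                     np.array([5.0, 155.0, 10.0]),
                     np.eye(3)))
```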
- The first predictor 220 predicts a part of the user intention from the result of the motion information analysis. For example, through analysis of motion information including an image, it may primarily predict whether the motion is an attempt to select an object on the screen.
- The second predictor 230 predicts the user intention using the part of the user intention predicted by the first predictor 220 and the multi-modal information input from the multi-modal sensing unit 130.
- To do so, the second predictor 230 may interpret the multi-modal information input from the multi-modal sensor in association with the primarily predicted part of the user intention. For example, when the primarily predicted part of the user intention is the selection of an object displayed on the display screen and the second predictor 230 receives a voice from the multi-modal sensing unit 130, it may secondarily predict the user intention by correlating the input voice with the object selection and interpreting them together.
- As another example, when the first predictor 220 primarily predicts that the partial user intention is bringing the microphone to the mouth, and the multi-modal sensing unit 130 detects lip movement through an image sensor 134 such as a camera while a voice is detected in the acoustic signal, the second predictor 230 may predict the user intention to be voice command input.
- The second predictor 230 may then detect a voice section in the acoustic signal and extract and analyze features of the detected section so that semantic analysis can be performed by the user intention applying unit 124.
- Similarly, when the first predictor 220 primarily predicts bringing the microphone to the mouth as the partial intention, and the multi-modal sensing unit 130 detects through the image sensor 134 that the lips are protruded forward, the second predictor 230 may predict the user intention to be blowing.
- Here, "bring the microphone to the mouth and input a voice command" and "bring the microphone to the mouth and blow" are different user intentions.
- However, the two intentions share the common part "bring the microphone to the mouth," so the first predictor 220 can predict this part first to narrow the scope of the user intention.
- Within that narrowed scope, the second predictor 230 may then predict the user intention in consideration of the multi-modal information.
- That is, the second predictor 230 may determine whether the user intention is "voice command input" or "blowing" in consideration of the sensed multi-modal information.
- FIG. 3 is a diagram illustrating an exemplary operation of the user intention predictor of FIG. 2.
- Referring to FIG. 3, the first predictor 220 may predict a part of the user intention using the motion information analyzed by the motion information analyzer 210.
- The second predictor 230 receives multi-modal signals, such as an image detected by the image sensor 134 of the multi-modal sensing unit 130 or an acoustic signal detected by the acoustic sensor 132, and generates information about whether a voice is detected, in order to predict the user intention.
- First, the motion information analyzer 210 calculates the distance between the user's mouth and the hand holding the microphone, using the motion information detected by motion sensors mounted on the user's head and wrist (310).
- The motion information analyzer 210 also calculates the direction of the microphone from the rotation angle of the wrist (320).
- The first predictor 220 predicts a part of the user intention by determining whether the user is moving the microphone toward the mouth, using the distance and direction information calculated by the motion information analyzer 210 (330). For example, when the hand holding the microphone is within a 20 cm radius of the mouth and the microphone points toward the mouth, the first predictor 220 may predict that the user intends to bring the microphone to the mouth.
- The second predictor 230 then analyzes the multi-modal signals input from the acoustic sensor 132, such as a microphone, and the image sensor 134, such as a camera, and predicts whether the user intention is a voice command or an intention such as humming or blowing.
- When the primary prediction is bringing the microphone to the mouth, lip movement is detected by the camera, and a voice is detected in the acoustic signal sensed by the microphone, the second predictor 230 may determine the user intention to be a voice command (340).
- When the primary prediction is bringing the microphone to the mouth, an image of the lips protruding forward is detected by the camera, and a breath sound is detected in the sound signal input from the microphone, the second predictor 230 may determine the user intention to be blowing (350).
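- The FIG. 3 decision flow can be summarized in code as below; the feature flags stand in for real detectors, and only the 20 cm radius and the stated rules come from the description.

```python
# Sketch of the FIG. 3 flow: primary prediction (steps 310-330), then
# secondary prediction (steps 340-350). Detector outputs are assumed inputs.

def primary_predict(mic_to_mouth_cm, mic_faces_mouth):
    if mic_to_mouth_cm <= 20 and mic_faces_mouth:   # 20 cm radius rule (330)
        return "bring_mic_to_mouth"
    return "unknown"

def secondary_predict(partial, lips_moving, lips_protruded,
                      voice_detected, breath_sound):
    if partial != "bring_mic_to_mouth":
        return partial
    if lips_moving and voice_detected:              # step 340
        return "voice_command"
    if lips_protruded and breath_sound:             # step 350
        return "blow"
    return "undecided"
```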
- FIG. 4 is a diagram illustrating an example of an operation of predicting a user's intention by receiving an additional multimodal input after a part of the user's intention is predicted.
- Referring to FIG. 4, when the primarily predicted part of the user intention received from the first predictor 220 is bringing the microphone to the mouth, the second predictor 230 activates sensors included in the multi-modal sensing unit 130, such as the microphone and a camera, and receives multi-modal signals (420).
- The second predictor 230 extracts features from the acoustic signal input from the microphone and the image signal input from the camera, and classifies and analyzes them (430).
- As acoustic features, time-domain features such as time energy, frequency energy, zero-crossing rate, linear predictive coding (LPC) coefficients, cepstral coefficients, and pitch, or statistical features such as the frequency spectrum, may be extracted.
- The extractable features are not limited to these; other feature algorithms may be used.
- The extracted features may be classified into a speech activity class or a non-speech activity class using classification and learning algorithms such as a decision tree, a support vector machine, a Bayesian network, or a neural network, but the algorithms are not limited thereto.
- When a voice section is detected as a result of the feature analysis, the second predictor 230 may predict the user intention to be voice command input (440). When no voice section is detected but a breath sound is detected, it may predict the user intention to be blowing (450). Likewise, as other types of features are detected, the user intention may be determined in various ways, such as humming. In each case, the second predictor 230 predicts the user intention within the range narrowed by the primary prediction.
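- As a simple illustration of frame-level feature extraction and classification, the sketch below computes only time energy and zero-crossing rate and applies toy thresholds; a real system would add the LPC, cepstral, and pitch features above and use a trained classifier such as an SVM, as the description notes.

```python
import numpy as np

def frame_features(frame):
    """Time energy and zero-crossing rate of one audio frame (1-D array)."""
    energy = float(np.mean(frame ** 2))
    zcr = float(np.mean(np.abs(np.diff(np.sign(frame)))) / 2.0)
    return energy, zcr

def classify_frame(frame, energy_thresh=1e-3, zcr_thresh=0.2):
    """Toy stand-in for the learned speech / non-speech classifier."""
    energy, zcr = frame_features(frame)
    if energy > energy_thresh and zcr < zcr_thresh:
        return "speech"          # energetic, mostly voiced: low crossing rate
    if energy > energy_thresh:
        return "breath_or_blow"  # energetic but noise-like: high crossing rate
    return "silence"

# Example: a 100 Hz tone frame (20 ms at 16 kHz) classifies as "speech".
t = np.linspace(0.0, 0.02, 320, endpoint=False)
print(classify_frame(0.5 * np.sin(2 * np.pi * 100 * t)))
```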
- As described above, the user intention may be predicted using the user's multi-modal information, and the voice detection operation may be controlled according to the prediction result.
- Thus, voice can be input intuitively, without learning a separate operation method for voice input such as pressing a button or touching the screen.
- The second predictor 230 may use, together with the feature information extracted from the acoustic signal, at least one other piece of sensed information, such as image information input from the image sensor 134 (e.g., a camera) or information on whether the person is speaking input from the biometric information sensor 136 (e.g., a throat microphone), to detect a voice section and process the voice of the detected section.
- The sensed information may include at least one of image information indicating a change in the shape of the user's mouth, temperature information that changes due to breathing during utterance, vibration information from a body part such as the throat or jaw that vibrates during utterance, and infrared information detected from the face or mouth during utterance.
- The user intention applying unit 124 may perform voice recognition by processing the voice signal of the detected voice section and switch application modules using the recognition result. For example, when a name is recognized, intelligent switching of voice input start and end can be performed, such as searching for the phone number of the recognized name or placing a call to the retrieved number.
- Also, because the intention to start or end a voice call is grasped automatically from the multi-modal information, the operation mode can be switched to call mode even if the user does not perform a separate operation such as pressing a call button.
- FIG. 5 illustrates another example of an operation of predicting a user's intention by receiving an additional multimodal input after a part of the user's intention is predicted.
- Referring to FIG. 5, when the primarily predicted part of the user intention received from the first predictor 220 is the selection of a specific object (460), the second predictor 230 activates sensors such as a camera and an ultrasonic sensor and receives multi-modal input (470).
- The second predictor 230 then analyzes the input multi-modal signals (480) to predict the user intention.
- Here, the predicted user intention falls within the range narrowed by the primary prediction.
- For example, as a result of the multi-modal signal analysis, the second predictor 230 may determine that the user is waving a hand.
- In that case, the second predictor 230 may interpret the waving motion, according to the application being executed by the user intention applying unit 124, as an intention to delete a specific item or file shown on the screen, and the user intention applying unit 124 can be controlled to delete that item or file.
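- The scope-restricted interpretation in this example can be expressed as a small mapping, as in the sketch below; the gesture and action vocabulary is an assumption used only for illustration.

```python
# Sketch: interpret a sensed input only within the scope of the partial
# intention "select_object" (gesture/action names are assumed).

ACTIONS_WHEN_OBJECT_SELECTED = {
    "wave_hand": "delete",  # waving deletes the selected item, per the example
    "voice":     "interpret_voice_command",  # e.g. "sort by date" above
}

def interpret(partial_intention, observed_input):
    if partial_intention == "select_object":
        return ACTIONS_WHEN_OBJECT_SELECTED.get(observed_input, "ignore")
    return "ignore"

print(interpret("select_object", "wave_hand"))  # -> delete
```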
- FIG. 6 is a diagram illustrating an example of feature-based signal classification in which the second predictor 230 performs an integrated analysis using an acoustic signal and an image signal together.
- The second predictor 230 may include an acoustic feature extractor 510, an acoustic feature analyzer 520, an image feature extractor 530, an image feature analyzer 540, and an integrated analyzer 550.
- The acoustic feature extractor 510 extracts acoustic features from the acoustic signal.
- The acoustic feature analyzer 520 extracts a voice section by applying a classification and learning algorithm to the acoustic features.
- The image feature extractor 530 extracts image features from a series of image signals.
- The image feature analyzer 540 extracts a voice section by applying a classification and learning algorithm to the extracted image features.
- The integrated analyzer 550 fuses the results classified from the audio signal and the image signal, respectively, and finally detects the voice section.
- The acoustic features and the image features may be applied individually, or the two kinds of features may be fused before classification; in either case, the integrated analyzer 550 detects the voice section by fusing the detection information extracted from the audio signal and the image signal.
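- A late-fusion version of the integrated analyzer 550 might look like the following sketch; the averaging rule and the threshold are assumptions, since the description leaves the fusion method open.

```python
# Sketch of late fusion for voice-section detection: combine the per-frame
# speech probabilities produced by the acoustic and image analyzers.

def fuse_frame(audio_speech_prob, image_speech_prob, threshold=0.5):
    """Equal-weight score fusion of the two analyzers' outputs in [0, 1]."""
    return 0.5 * audio_speech_prob + 0.5 * image_speech_prob >= threshold

def detect_voice_sections(audio_probs, image_probs):
    """Frame-wise fused decisions over aligned audio/image analysis results."""
    return [fuse_frame(a, v) for a, v in zip(audio_probs, image_probs)]

# Example: only the middle two frames fuse to speech.
print(detect_voice_sections([0.2, 0.7, 0.9, 0.3], [0.1, 0.8, 0.6, 0.2]))
```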
- As described above, when using the voice interface, the user may input voice intuitively without separately learning a voice input method; for example, the user does not need to press a separate button or touch the screen for voice input.
- In addition, the user's voice can be detected accurately even in environments with various kinds of noise, such as household noise, vehicle noise, or noise from speakers other than the user.
- Furthermore, since voice can be detected using other sensed information in addition to the image, the user's voice section can be detected accurately even when the lighting is too bright or too dark, or when the user's mouth is covered.
- FIG. 7 is a diagram illustrating a user intention reasoning method using multi-modal information according to an exemplary embodiment.
- Referring to FIG. 7, the user intention inference apparatus 100 receives detected motion information from at least one motion sensor (610).
- The user intention inference apparatus 100 primarily predicts a part of the user intention using the received motion information (620).
- The user intention inference apparatus 100 then predicts the user intention using the primarily predicted part of the user intention and multi-modal information input from at least one multi-modal sensor (640). In this secondary prediction step, the multi-modal information input from the multi-modal sensor may be interpreted in association with the primarily predicted part of the user intention.
- The primarily predicted part of the user intention may be used to generate a control signal for executing an operation performed in the secondary prediction process.
- The control signal for executing that operation may be a control signal for controlling the operation of a multi-modal sensor controlled by the user intention inference apparatus 100.
- In the secondary prediction, the user intention may be determined using the multi-modal information input from the at least one multi-modal sensor, within the range of the primarily predicted part of the user intention.
- One aspect of the invention may be embodied as computer-readable code on a computer-readable recording medium. The codes and code segments that implement the program can be easily inferred by programmers skilled in the art.
- Computer-readable recording media include all kinds of recording devices that store data readable by a computer system, for example ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical disks.
- The computer-readable recording medium can also be distributed over network-coupled computer systems so that the computer-readable code is stored and executed in a distributed fashion.
- The invention is industrially applicable in the fields of computers, electronics, computer software, and information technology.
Claims (17)
- 1. A user intention inference apparatus comprising: a first predictor configured to predict a part of a user intention using at least one piece of motion information; and a second predictor configured to predict the user intention using the predicted part of the user intention and multi-modal information input from at least one multi-modal sensor.
- 2. The apparatus of claim 1, wherein the first predictor generates, using the predicted part of the user intention, a control signal for executing an operation performed in the course of predicting the user intention.
- 3. The apparatus of claim 2, wherein the control signal for executing the operation performed in the course of predicting the user intention is a control signal for controlling the operation of a multi-modal sensor controlled by the user intention inference apparatus.
- 4. The apparatus of claim 1, wherein the second predictor interprets the multi-modal information input from the multi-modal sensor in association with the predicted part of the user intention in order to predict the user intention.
- 5. The apparatus of claim 4, wherein, when the predicted part of the user intention is a selection of an object displayed on a display screen and a voice is input from the multi-modal sensor, the second predictor predicts the user intention by interpreting the input voice in association with the object selection.
- 6. The apparatus of claim 1, wherein the second predictor predicts the user intention using the multi-modal information input from the at least one multi-modal sensor within the range of the predicted part of the user intention.
- 7. The apparatus of claim 6, wherein, when the predicted part of the user intention is a motion of bringing a microphone to the mouth, the second predictor senses an acoustic signal and extracts and analyzes features of the sensed acoustic signal to predict the user intention.
- 8. The apparatus of claim 7, wherein the second predictor determines whether a voice section is detected in the acoustic signal and, when a voice section is detected, predicts the user intention to be a voice command intention.
- 9. The apparatus of claim 8, wherein, when a breath sound is detected in the acoustic signal, the second predictor predicts the user intention to be blowing.
- 10. The apparatus of claim 1, wherein, when the predicted part of the user intention is a selection of an object displayed on a display screen, the second predictor predicts, using the multi-modal information, the user intention to be at least one of deleting, classifying, and sorting the selected object.
- 11. The apparatus of claim 1, further comprising a user intention applying unit configured to control software or hardware controlled by the user intention inference apparatus using the user intention prediction result.
- 12. A user intention inference method comprising: receiving at least one piece of motion information; predicting a part of a user intention using the received motion information; receiving multi-modal information input from at least one multi-modal sensor; and predicting the user intention using the predicted part of the user intention and the multi-modal information.
- 13. The method of claim 12, further comprising generating, using the predicted part of the user intention, a control signal for executing an operation performed in the course of predicting the user intention.
- 14. The method of claim 13, wherein the control signal for executing the operation performed in the course of predicting the user intention is a control signal for controlling the operation of a multi-modal sensor controlled by the user intention inference apparatus.
- 15. The method of claim 12, wherein predicting the user intention comprises interpreting the multi-modal information input from the multi-modal sensor in association with the predicted part of the user intention.
- 16. The method of claim 12, wherein, in predicting the user intention, the user intention is predicted using the multi-modal information input from the at least one multi-modal sensor within the range of the predicted part of the user intention.
- 17. The method of claim 12, further comprising controlling software or hardware controlled by the user intention inference apparatus using the user intention prediction result.
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201080017476.6A CN102405463B (zh) | 2009-04-30 | 2010-04-29 | 利用多模态信息的用户意图推理装置及方法 |
EP10769966.2A EP2426598B1 (en) | 2009-04-30 | 2010-04-29 | Apparatus and method for user intention inference using multimodal information |
JP2012508401A JP5911796B2 (ja) | 2009-04-30 | 2010-04-29 | マルチモーダル情報を用いるユーザ意図推論装置及び方法 |
Applications Claiming Priority (6)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR10-2009-0038267 | 2009-04-30 | ||
KR1020090038267A KR101581883B1 (ko) | 2009-04-30 | 2009-04-30 | 모션 정보를 이용하는 음성 검출 장치 및 방법 |
KR20090067034 | 2009-07-22 | ||
KR10-2009-0067034 | 2009-07-22 | ||
KR1020100036031A KR101652705B1 (ko) | 2009-07-22 | 2010-04-19 | 멀티 모달 정보를 이용하는 사용자 의도 추론 장치 및 방법 |
KR10-2010-0036031 | 2010-04-19 |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2010126321A2 true WO2010126321A2 (ko) | 2010-11-04 |
WO2010126321A3 WO2010126321A3 (ko) | 2011-03-24 |
Family
ID=45541557
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/KR2010/002723 WO2010126321A2 (ko) | 2009-04-30 | 2010-04-29 | 멀티 모달 정보를 이용하는 사용자 의도 추론 장치 및 방법 |
Country Status (5)
Country | Link |
---|---|
US (1) | US8606735B2 (ko) |
EP (1) | EP2426598B1 (ko) |
JP (1) | JP5911796B2 (ko) |
CN (1) | CN102405463B (ko) |
WO (1) | WO2010126321A2 (ko) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2016148398A1 (ko) * | 2015-03-16 | 2016-09-22 | 주식회사 스마트올웨이즈온 | 멀티모달 정보를 기반으로 상황 인지 기능을 수행하여 사용자 인터페이스와 사용자 경험을 스스로 학습하고 개선하는 셋톱박스 및 촬영 장치 |
Families Citing this family (323)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
AU6630800A (en) | 1999-08-13 | 2001-03-13 | Pixo, Inc. | Methods and apparatuses for display and traversing of links in page character array |
US8645137B2 (en) | 2000-03-16 | 2014-02-04 | Apple Inc. | Fast, language-independent method for user authentication by voice |
ITFI20010199A1 (it) | 2001-10-22 | 2003-04-22 | Riccardo Vieri | Sistema e metodo per trasformare in voce comunicazioni testuali ed inviarle con una connessione internet a qualsiasi apparato telefonico |
US7669134B1 (en) | 2003-05-02 | 2010-02-23 | Apple Inc. | Method and apparatus for displaying information during an instant messaging session |
US8677377B2 (en) | 2005-09-08 | 2014-03-18 | Apple Inc. | Method and apparatus for building an intelligent automated assistant |
US7633076B2 (en) | 2005-09-30 | 2009-12-15 | Apple Inc. | Automated response to and sensing of user activity in portable devices |
US9318108B2 (en) | 2010-01-18 | 2016-04-19 | Apple Inc. | Intelligent automated assistant |
US8977255B2 (en) | 2007-04-03 | 2015-03-10 | Apple Inc. | Method and system for operating a multi-function portable electronic device using voice-activation |
ITFI20070177A1 (it) | 2007-07-26 | 2009-01-27 | Riccardo Vieri | Sistema per la creazione e impostazione di una campagna pubblicitaria derivante dall'inserimento di messaggi pubblicitari all'interno di uno scambio di messaggi e metodo per il suo funzionamento. |
US9053089B2 (en) | 2007-10-02 | 2015-06-09 | Apple Inc. | Part-of-speech tagging using latent analogy |
US8364694B2 (en) | 2007-10-26 | 2013-01-29 | Apple Inc. | Search assistant for digital media assets |
US8620662B2 (en) | 2007-11-20 | 2013-12-31 | Apple Inc. | Context-aware unit selection |
US10002189B2 (en) | 2007-12-20 | 2018-06-19 | Apple Inc. | Method and apparatus for searching using an active ontology |
US9330720B2 (en) | 2008-01-03 | 2016-05-03 | Apple Inc. | Methods and apparatus for altering audio output signals |
US8327272B2 (en) | 2008-01-06 | 2012-12-04 | Apple Inc. | Portable multifunction device, method, and graphical user interface for viewing and managing electronic calendars |
US8065143B2 (en) | 2008-02-22 | 2011-11-22 | Apple Inc. | Providing text input using speech data and non-speech data |
US8289283B2 (en) | 2008-03-04 | 2012-10-16 | Apple Inc. | Language input interface on a device |
US8996376B2 (en) | 2008-04-05 | 2015-03-31 | Apple Inc. | Intelligent text-to-speech conversion |
US10496753B2 (en) | 2010-01-18 | 2019-12-03 | Apple Inc. | Automatically adapting user interfaces for hands-free interaction |
US8464150B2 (en) | 2008-06-07 | 2013-06-11 | Apple Inc. | Automatic language identification for dynamic text processing |
US20100030549A1 (en) | 2008-07-31 | 2010-02-04 | Lee Michael M | Mobile device having human language translation capability with positional feedback |
US8768702B2 (en) | 2008-09-05 | 2014-07-01 | Apple Inc. | Multi-tiered voice feedback in an electronic device |
US8898568B2 (en) | 2008-09-09 | 2014-11-25 | Apple Inc. | Audio user interface |
US8355919B2 (en) | 2008-09-29 | 2013-01-15 | Apple Inc. | Systems and methods for text normalization for text to speech synthesis |
US8352272B2 (en) | 2008-09-29 | 2013-01-08 | Apple Inc. | Systems and methods for text to speech synthesis |
US8352268B2 (en) | 2008-09-29 | 2013-01-08 | Apple Inc. | Systems and methods for selective rate of speech and speech preferences for text to speech synthesis |
US8712776B2 (en) | 2008-09-29 | 2014-04-29 | Apple Inc. | Systems and methods for selective text to speech synthesis |
US8583418B2 (en) | 2008-09-29 | 2013-11-12 | Apple Inc. | Systems and methods of detecting language and natural language strings for text to speech synthesis |
US8396714B2 (en) | 2008-09-29 | 2013-03-12 | Apple Inc. | Systems and methods for concatenation of words in text to speech synthesis |
US8676904B2 (en) | 2008-10-02 | 2014-03-18 | Apple Inc. | Electronic devices with voice command and contextual data processing capabilities |
WO2010067118A1 (en) | 2008-12-11 | 2010-06-17 | Novauris Technologies Limited | Speech recognition involving a mobile device |
US8862252B2 (en) | 2009-01-30 | 2014-10-14 | Apple Inc. | Audio user interface for displayless electronic device |
US8380507B2 (en) | 2009-03-09 | 2013-02-19 | Apple Inc. | Systems and methods for determining the language to use for speech generated by a text to speech engine |
US9858925B2 (en) | 2009-06-05 | 2018-01-02 | Apple Inc. | Using context information to facilitate processing of commands in a virtual assistant |
US10241752B2 (en) | 2011-09-30 | 2019-03-26 | Apple Inc. | Interface for a virtual digital assistant |
US10540976B2 (en) | 2009-06-05 | 2020-01-21 | Apple Inc. | Contextual voice commands |
US10241644B2 (en) | 2011-06-03 | 2019-03-26 | Apple Inc. | Actionable reminder entries |
US10706373B2 (en) | 2011-06-03 | 2020-07-07 | Apple Inc. | Performing actions associated with task items that represent tasks to perform |
US9431006B2 (en) | 2009-07-02 | 2016-08-30 | Apple Inc. | Methods and apparatuses for automatic speech recognition |
US8682649B2 (en) | 2009-11-12 | 2014-03-25 | Apple Inc. | Sentiment prediction from textual data |
US8600743B2 (en) | 2010-01-06 | 2013-12-03 | Apple Inc. | Noise profile determination for voice-related feature |
US8311838B2 (en) | 2010-01-13 | 2012-11-13 | Apple Inc. | Devices and methods for identifying a prompt corresponding to a voice input in a sequence of prompts |
US8381107B2 (en) | 2010-01-13 | 2013-02-19 | Apple Inc. | Adaptive audio feedback system and method |
US10276170B2 (en) | 2010-01-18 | 2019-04-30 | Apple Inc. | Intelligent automated assistant |
US10705794B2 (en) | 2010-01-18 | 2020-07-07 | Apple Inc. | Automatically adapting user interfaces for hands-free interaction |
US10679605B2 (en) | 2010-01-18 | 2020-06-09 | Apple Inc. | Hands-free list-reading by intelligent automated assistant |
US10553209B2 (en) | 2010-01-18 | 2020-02-04 | Apple Inc. | Systems and methods for hands-free notification summaries |
US8682667B2 (en) | 2010-02-25 | 2014-03-25 | Apple Inc. | User profiling for selecting user specific voice input processing information |
US8639516B2 (en) | 2010-06-04 | 2014-01-28 | Apple Inc. | User-specific noise suppression for voice quality improvements |
US9552422B2 (en) | 2010-06-11 | 2017-01-24 | Doat Media Ltd. | System and method for detecting a search intent |
US10713312B2 (en) * | 2010-06-11 | 2020-07-14 | Doat Media Ltd. | System and method for context-launching of applications |
US9323844B2 (en) | 2010-06-11 | 2016-04-26 | Doat Media Ltd. | System and methods thereof for enhancing a user's search experience |
US9141702B2 (en) | 2010-06-11 | 2015-09-22 | Doat Media Ltd. | Method for dynamically displaying a personalized home screen on a device |
US20160300138A1 (en) * | 2010-06-11 | 2016-10-13 | Doat Media Ltd. | Method and system for context-based intent verification |
US20140365474A1 (en) * | 2010-06-11 | 2014-12-11 | Doat Media Ltd. | System and method for sharing content over the web |
US9529918B2 (en) | 2010-06-11 | 2016-12-27 | Doat Media Ltd. | System and methods thereof for downloading applications via a communication network |
US9069443B2 (en) | 2010-06-11 | 2015-06-30 | Doat Media Ltd. | Method for dynamically displaying a personalized home screen on a user device |
US8713021B2 (en) | 2010-07-07 | 2014-04-29 | Apple Inc. | Unsupervised document clustering using latent semantic density analysis |
US9104670B2 (en) | 2010-07-21 | 2015-08-11 | Apple Inc. | Customized search or acquisition of digital media assets |
US20120038555A1 (en) * | 2010-08-12 | 2012-02-16 | Research In Motion Limited | Method and Electronic Device With Motion Compensation |
US8719006B2 (en) | 2010-08-27 | 2014-05-06 | Apple Inc. | Combined statistical and rule-based part-of-speech tagging for text-to-speech synthesis |
US9274744B2 (en) | 2010-09-10 | 2016-03-01 | Amazon Technologies, Inc. | Relative position-inclusive device interfaces |
US8700392B1 (en) * | 2010-09-10 | 2014-04-15 | Amazon Technologies, Inc. | Speech-inclusive device interfaces |
US8719014B2 (en) | 2010-09-27 | 2014-05-06 | Apple Inc. | Electronic device with text error correction based on voice recognition data |
US9348417B2 (en) * | 2010-11-01 | 2016-05-24 | Microsoft Technology Licensing, Llc | Multimodal input system |
US20120159341A1 (en) | 2010-12-21 | 2012-06-21 | Microsoft Corporation | Interactions with contextual and task-based computing environments |
US10762293B2 (en) | 2010-12-22 | 2020-09-01 | Apple Inc. | Using parts-of-speech tagging and named entity recognition for spelling correction |
US10515147B2 (en) | 2010-12-22 | 2019-12-24 | Apple Inc. | Using statistical language models for contextual lookup |
US20120166522A1 (en) * | 2010-12-27 | 2012-06-28 | Microsoft Corporation | Supporting intelligent user interface interactions |
US8781836B2 (en) | 2011-02-22 | 2014-07-15 | Apple Inc. | Hearing assistance system for providing consistent human speech |
US9262612B2 (en) | 2011-03-21 | 2016-02-16 | Apple Inc. | Device access using voice authentication |
US9263045B2 (en) | 2011-05-17 | 2016-02-16 | Microsoft Technology Licensing, Llc | Multi-mode text input |
US20120304067A1 (en) * | 2011-05-25 | 2012-11-29 | Samsung Electronics Co., Ltd. | Apparatus and method for controlling user interface using sound recognition |
US10672399B2 (en) | 2011-06-03 | 2020-06-02 | Apple Inc. | Switching between text data and audio data based on a mapping |
US10057736B2 (en) | 2011-06-03 | 2018-08-21 | Apple Inc. | Active transport based notifications |
TWI447066B (zh) * | 2011-06-08 | 2014-08-01 | Sitronix Technology Corp | Distance sensing circuit and touch electronic device |
US8928336B2 (en) | 2011-06-09 | 2015-01-06 | Ford Global Technologies, Llc | Proximity switch having sensitivity control and method therefor |
US8975903B2 (en) | 2011-06-09 | 2015-03-10 | Ford Global Technologies, Llc | Proximity switch having learned sensitivity and method therefor |
US8812294B2 (en) | 2011-06-21 | 2014-08-19 | Apple Inc. | Translating phrases from one language into another using an order-based set of declarative rules |
US10004286B2 (en) | 2011-08-08 | 2018-06-26 | Ford Global Technologies, Llc | Glove having conductive ink and method of interacting with proximity sensor |
US8706472B2 (en) | 2011-08-11 | 2014-04-22 | Apple Inc. | Method for disambiguating multiple readings in language conversion |
US8994660B2 (en) | 2011-08-29 | 2015-03-31 | Apple Inc. | Text correction processing |
US9143126B2 (en) | 2011-09-22 | 2015-09-22 | Ford Global Technologies, Llc | Proximity switch having lockout control for controlling movable panel |
US8762156B2 (en) | 2011-09-28 | 2014-06-24 | Apple Inc. | Speech recognition repair using contextual information |
US8994228B2 (en) | 2011-11-03 | 2015-03-31 | Ford Global Technologies, Llc | Proximity switch having wrong touch feedback |
US10112556B2 (en) | 2011-11-03 | 2018-10-30 | Ford Global Technologies, Llc | Proximity switch having wrong touch adaptive learning and method |
US8878438B2 (en) | 2011-11-04 | 2014-11-04 | Ford Global Technologies, Llc | Lamp and proximity switch assembly and method |
US9223415B1 (en) | 2012-01-17 | 2015-12-29 | Amazon Technologies, Inc. | Managing resource usage for task performance |
US10134385B2 (en) | 2012-03-02 | 2018-11-20 | Apple Inc. | Systems and methods for name pronunciation |
US9483461B2 (en) | 2012-03-06 | 2016-11-01 | Apple Inc. | Handling speech synthesis of content for multiple languages |
US8933708B2 (en) | 2012-04-11 | 2015-01-13 | Ford Global Technologies, Llc | Proximity switch assembly and activation method with exploration mode |
US9065447B2 (en) | 2012-04-11 | 2015-06-23 | Ford Global Technologies, Llc | Proximity switch assembly and method having adaptive time delay |
US9660644B2 (en) | 2012-04-11 | 2017-05-23 | Ford Global Technologies, Llc | Proximity switch assembly and activation method |
US9531379B2 (en) | 2012-04-11 | 2016-12-27 | Ford Global Technologies, Llc | Proximity switch assembly having groove between adjacent proximity sensors |
US9831870B2 (en) | 2012-04-11 | 2017-11-28 | Ford Global Technologies, Llc | Proximity switch assembly and method of tuning same |
US9568527B2 (en) | 2012-04-11 | 2017-02-14 | Ford Global Technologies, Llc | Proximity switch assembly and activation method having virtual button mode |
US9520875B2 (en) | 2012-04-11 | 2016-12-13 | Ford Global Technologies, Llc | Pliable proximity switch assembly and activation method |
US9944237B2 (en) | 2012-04-11 | 2018-04-17 | Ford Global Technologies, Llc | Proximity switch assembly with signal drift rejection and method |
US9559688B2 (en) | 2012-04-11 | 2017-01-31 | Ford Global Technologies, Llc | Proximity switch assembly having pliable surface and depression |
US9197206B2 (en) | 2012-04-11 | 2015-11-24 | Ford Global Technologies, Llc | Proximity switch having differential contact surface |
US9219472B2 (en) | 2012-04-11 | 2015-12-22 | Ford Global Technologies, Llc | Proximity switch assembly and activation method using rate monitoring |
US9287864B2 (en) | 2012-04-11 | 2016-03-15 | Ford Global Technologies, Llc | Proximity switch assembly and calibration method therefor |
US9184745B2 (en) | 2012-04-11 | 2015-11-10 | Ford Global Technologies, Llc | Proximity switch assembly and method of sensing user input based on signal rate of change |
US9280610B2 (en) | 2012-05-14 | 2016-03-08 | Apple Inc. | Crowd sourcing information to fulfill user requests |
US8775442B2 (en) | 2012-05-15 | 2014-07-08 | Apple Inc. | Semantic search using a single-source semantic model |
US10417037B2 (en) | 2012-05-15 | 2019-09-17 | Apple Inc. | Systems and methods for integrating third party services with a digital assistant |
US9136840B2 (en) | 2012-05-17 | 2015-09-15 | Ford Global Technologies, Llc | Proximity switch assembly having dynamic tuned threshold |
US8981602B2 (en) | 2012-05-29 | 2015-03-17 | Ford Global Technologies, Llc | Proximity switch assembly having non-switch contact and method |
US9337832B2 (en) | 2012-06-06 | 2016-05-10 | Ford Global Technologies, Llc | Proximity switch and method of adjusting sensitivity therefor |
US9721563B2 (en) | 2012-06-08 | 2017-08-01 | Apple Inc. | Name recognition system |
US10019994B2 (en) | 2012-06-08 | 2018-07-10 | Apple Inc. | Systems and methods for recognizing textual identifiers within a plurality of words |
US9641172B2 (en) | 2012-06-27 | 2017-05-02 | Ford Global Technologies, Llc | Proximity switch assembly having varying size electrode fingers |
US9495129B2 (en) | 2012-06-29 | 2016-11-15 | Apple Inc. | Device, method, and user interface for voice-activated navigation and browsing of a document |
US9576574B2 (en) | 2012-09-10 | 2017-02-21 | Apple Inc. | Context-sensitive handling of interruptions by intelligent digital assistant |
US8922340B2 (en) | 2012-09-11 | 2014-12-30 | Ford Global Technologies, Llc | Proximity switch based door latch release |
US9547647B2 (en) | 2012-09-19 | 2017-01-17 | Apple Inc. | Voice-based media searching |
US8935167B2 (en) | 2012-09-25 | 2015-01-13 | Apple Inc. | Exemplar-based latent perceptual modeling for automatic speech recognition |
US8796575B2 (en) | 2012-10-31 | 2014-08-05 | Ford Global Technologies, Llc | Proximity switch assembly having ground layer |
US9081413B2 (en) * | 2012-11-20 | 2015-07-14 | 3M Innovative Properties Company | Human interaction system based upon real-time intention detection |
CN103841137A (zh) * | 2012-11-22 | 2014-06-04 | 腾讯科技(深圳)有限公司 | 智能终端控制网页应用的方法及智能终端 |
US9147398B2 (en) | 2013-01-23 | 2015-09-29 | Nokia Technologies Oy | Hybrid input device for touchless user interface |
KR20240132105A (ko) | 2013-02-07 | 2024-09-02 | 애플 인크. | 디지털 어시스턴트를 위한 음성 트리거 |
US9311204B2 (en) | 2013-03-13 | 2016-04-12 | Ford Global Technologies, Llc | Proximity interface development system having replicator and method |
US10652394B2 (en) | 2013-03-14 | 2020-05-12 | Apple Inc. | System and method for processing voicemail |
US9368114B2 (en) | 2013-03-14 | 2016-06-14 | Apple Inc. | Context-sensitive handling of interruptions |
US9977779B2 (en) | 2013-03-14 | 2018-05-22 | Apple Inc. | Automatic supplementation of word correction dictionaries |
US9733821B2 (en) | 2013-03-14 | 2017-08-15 | Apple Inc. | Voice control to diagnose inadvertent activation of accessibility features |
US10572476B2 (en) | 2013-03-14 | 2020-02-25 | Apple Inc. | Refining a search based on schedule items |
US10642574B2 (en) | 2013-03-14 | 2020-05-05 | Apple Inc. | Device, method, and graphical user interface for outputting captions |
CN105190607B (zh) | 2013-03-15 | 2018-11-30 | 苹果公司 | 通过智能数字助理的用户培训 |
CN112230878B (zh) | 2013-03-15 | 2024-09-27 | 苹果公司 | 对中断进行上下文相关处理 |
WO2014144579A1 (en) | 2013-03-15 | 2014-09-18 | Apple Inc. | System and method for updating an adaptive speech recognition model |
US10748529B1 (en) | 2013-03-15 | 2020-08-18 | Apple Inc. | Voice activated device for use with a voice-based digital assistant |
AU2014233517B2 (en) | 2013-03-15 | 2017-05-25 | Apple Inc. | Training an at least partial voice command system |
JP6032350B2 (ja) * | 2013-03-21 | 2016-11-24 | 富士通株式会社 | 動作検知装置及び動作検知方法 |
CN103200330A (zh) * | 2013-04-16 | 2013-07-10 | 上海斐讯数据通信技术有限公司 | 一种触发手电筒的实现方法及移动终端 |
WO2014197334A2 (en) | 2013-06-07 | 2014-12-11 | Apple Inc. | System and method for user-specified pronunciation of words for speech synthesis and recognition |
WO2014197336A1 (en) | 2013-06-07 | 2014-12-11 | Apple Inc. | System and method for detecting errors in interactions with a voice-based digital assistant |
US9582608B2 (en) | 2013-06-07 | 2017-02-28 | Apple Inc. | Unified ranking with entropy-weighted information for phrase-based semantic auto-completion |
WO2014197335A1 (en) | 2013-06-08 | 2014-12-11 | Apple Inc. | Interpreting and acting upon commands that involve sharing information with remote devices |
US10176167B2 (en) | 2013-06-09 | 2019-01-08 | Apple Inc. | System and method for inferring user intent from speech inputs |
KR101772152B1 (ko) | 2013-06-09 | 2017-08-28 | Apple Inc. | Device, method, and graphical user interface for enabling conversation persistence across two or more instances of a digital assistant |
EP3008964B1 (en) | 2013-06-13 | 2019-09-25 | Apple Inc. | System and method for emergency calls initiated by voice command |
US10474961B2 (en) | 2013-06-20 | 2019-11-12 | Viv Labs, Inc. | Dynamically evolving cognitive architecture system based on prompting for additional user input |
US9594542B2 (en) | 2013-06-20 | 2017-03-14 | Viv Labs, Inc. | Dynamically evolving cognitive architecture system based on training by third-party developers |
US10083009B2 (en) | 2013-06-20 | 2018-09-25 | Viv Labs, Inc. | Dynamically evolving cognitive architecture system planning |
US9633317B2 (en) | 2013-06-20 | 2017-04-25 | Viv Labs, Inc. | Dynamically evolving cognitive architecture system based on a natural language intent interpreter |
DE112014003653B4 (de) | 2013-08-06 | 2024-04-18 | Apple Inc. | Automatically activating intelligent responses based on activities of remote devices |
US11199906B1 (en) | 2013-09-04 | 2021-12-14 | Amazon Technologies, Inc. | Global user input management |
US9367203B1 (en) | 2013-10-04 | 2016-06-14 | Amazon Technologies, Inc. | User interface techniques for simulating three-dimensional depth |
US20160163314A1 (en) * | 2013-11-25 | 2016-06-09 | Mitsubishi Electric Corporation | Dialog management system and dialog management method |
US10296160B2 (en) | 2013-12-06 | 2019-05-21 | Apple Inc. | Method for extracting salient dialog usage from live data |
EP2887205A1 (en) * | 2013-12-17 | 2015-06-24 | Sony Corporation | Voice activated device, method & computer program product |
US10741182B2 (en) * | 2014-02-18 | 2020-08-11 | Lenovo (Singapore) Pte. Ltd. | Voice input correction using non-audio based input |
US9620105B2 (en) | 2014-05-15 | 2017-04-11 | Apple Inc. | Analyzing audio input for efficient speech and music recognition |
US10592095B2 (en) | 2014-05-23 | 2020-03-17 | Apple Inc. | Instantaneous speaking of content on touch devices |
US9502031B2 (en) | 2014-05-27 | 2016-11-22 | Apple Inc. | Method for supporting dynamic grammars in WFST-based ASR |
US9734193B2 (en) | 2014-05-30 | 2017-08-15 | Apple Inc. | Determining domain salience ranking from ambiguous words in natural speech |
US9430463B2 (en) | 2014-05-30 | 2016-08-30 | Apple Inc. | Exemplar-based natural language processing |
US9785630B2 (en) | 2014-05-30 | 2017-10-10 | Apple Inc. | Text prediction using combined word N-gram and unigram language models |
CN110797019B (zh) | 2014-05-30 | 2023-08-29 | Apple Inc. | Multi-command single utterance input method |
US10170123B2 (en) | 2014-05-30 | 2019-01-01 | Apple Inc. | Intelligent assistant for home automation |
US9715875B2 (en) | 2014-05-30 | 2017-07-25 | Apple Inc. | Reducing the need for manual start/end-pointing and trigger phrases |
US10289433B2 (en) | 2014-05-30 | 2019-05-14 | Apple Inc. | Domain specific language for encoding assistant dialog |
US9842101B2 (en) | 2014-05-30 | 2017-12-12 | Apple Inc. | Predictive conversion of language input |
US10078631B2 (en) | 2014-05-30 | 2018-09-18 | Apple Inc. | Entropy-guided text prediction using combined word and character n-gram language models |
US9760559B2 (en) | 2014-05-30 | 2017-09-12 | Apple Inc. | Predictive text input |
US9633004B2 (en) | 2014-05-30 | 2017-04-25 | Apple Inc. | Better resolution when referencing to concepts |
US9338493B2 (en) | 2014-06-30 | 2016-05-10 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US10659851B2 (en) | 2014-06-30 | 2020-05-19 | Apple Inc. | Real-time digital assistant knowledge updates |
US9582482B1 (en) | 2014-07-11 | 2017-02-28 | Google Inc. | Providing an annotation linking related entities in onscreen content |
US10446141B2 (en) | 2014-08-28 | 2019-10-15 | Apple Inc. | Automatic speech recognition based on user feedback |
US9818400B2 (en) | 2014-09-11 | 2017-11-14 | Apple Inc. | Method and apparatus for discovering trending terms in speech requests |
US10789041B2 (en) | 2014-09-12 | 2020-09-29 | Apple Inc. | Dynamic thresholds for always listening speech trigger |
US9792334B2 (en) * | 2014-09-25 | 2017-10-17 | Sap Se | Large-scale processing and querying for real-time surveillance |
US9646609B2 (en) | 2014-09-30 | 2017-05-09 | Apple Inc. | Caching apparatus for serving phonetic pronunciations |
US10074360B2 (en) | 2014-09-30 | 2018-09-11 | Apple Inc. | Providing an indication of the suitability of speech recognition |
US10127911B2 (en) | 2014-09-30 | 2018-11-13 | Apple Inc. | Speaker identification and unsupervised speaker adaptation techniques |
US9886432B2 (en) | 2014-09-30 | 2018-02-06 | Apple Inc. | Parsimonious handling of word inflection via categorical stem + suffix N-gram language models |
US9668121B2 (en) | 2014-09-30 | 2017-05-30 | Apple Inc. | Social reminders |
US10038443B2 (en) | 2014-10-20 | 2018-07-31 | Ford Global Technologies, Llc | Directional proximity switch assembly |
US10552013B2 (en) | 2014-12-02 | 2020-02-04 | Apple Inc. | Data detection |
US9711141B2 (en) | 2014-12-09 | 2017-07-18 | Apple Inc. | Disambiguating heteronyms in speech synthesis |
JP5784211B1 (ja) * | 2014-12-19 | 2015-09-24 | Cygames, Inc. | Information processing program and information processing method |
CN105812506A (zh) * | 2014-12-27 | 2016-07-27 | Shenzhen Futaihong Precision Industry Co., Ltd. | Operation mode control system and method |
US10152299B2 (en) | 2015-03-06 | 2018-12-11 | Apple Inc. | Reducing response latency of intelligent automated assistants |
US9865280B2 (en) | 2015-03-06 | 2018-01-09 | Apple Inc. | Structured dictation using intelligent automated assistants |
US9721566B2 (en) | 2015-03-08 | 2017-08-01 | Apple Inc. | Competing devices responding to voice triggers |
US9886953B2 (en) | 2015-03-08 | 2018-02-06 | Apple Inc. | Virtual assistant activation |
US10567477B2 (en) | 2015-03-08 | 2020-02-18 | Apple Inc. | Virtual assistant continuity |
US9899019B2 (en) | 2015-03-18 | 2018-02-20 | Apple Inc. | Systems and methods for structured stem and suffix language models |
US9654103B2 (en) | 2015-03-18 | 2017-05-16 | Ford Global Technologies, Llc | Proximity switch assembly having haptic feedback and method |
US10923126B2 (en) | 2015-03-19 | 2021-02-16 | Samsung Electronics Co., Ltd. | Method and device for detecting voice activity based on image information |
US9842105B2 (en) | 2015-04-16 | 2017-12-12 | Apple Inc. | Parsimonious continuous-space phrase representations for natural language processing |
US10460227B2 (en) | 2015-05-15 | 2019-10-29 | Apple Inc. | Virtual assistant in a communication session |
US9548733B2 (en) | 2015-05-20 | 2017-01-17 | Ford Global Technologies, Llc | Proximity sensor assembly having interleaved electrode configuration |
US10200824B2 (en) | 2015-05-27 | 2019-02-05 | Apple Inc. | Systems and methods for proactively identifying and surfacing relevant content on a touch-sensitive device |
US10083688B2 (en) | 2015-05-27 | 2018-09-25 | Apple Inc. | Device voice control for selecting a displayed affordance |
US10127220B2 (en) | 2015-06-04 | 2018-11-13 | Apple Inc. | Language identification from short strings |
US10101822B2 (en) | 2015-06-05 | 2018-10-16 | Apple Inc. | Language input correction |
US9578173B2 (en) | 2015-06-05 | 2017-02-21 | Apple Inc. | Virtual assistant aided communication with 3rd party service in a communication session |
US11025565B2 (en) | 2015-06-07 | 2021-06-01 | Apple Inc. | Personalized prediction of responses for instant messaging |
US10186254B2 (en) | 2015-06-07 | 2019-01-22 | Apple Inc. | Context-based endpoint detection |
US10255907B2 (en) | 2015-06-07 | 2019-04-09 | Apple Inc. | Automatic accent detection using acoustic models |
US20160378747A1 (en) | 2015-06-29 | 2016-12-29 | Apple Inc. | Virtual assistant for media playback |
CN105159111B (zh) * | 2015-08-24 | 2019-01-25 | Baidu Online Network Technology (Beijing) Co., Ltd. | Artificial intelligence-based intelligent interactive device control method and system |
US10747498B2 (en) | 2015-09-08 | 2020-08-18 | Apple Inc. | Zero latency digital assistant |
US10740384B2 (en) | 2015-09-08 | 2020-08-11 | Apple Inc. | Intelligent automated assistant for media search and playback |
US10331312B2 (en) | 2015-09-08 | 2019-06-25 | Apple Inc. | Intelligent automated assistant in a media environment |
US10671428B2 (en) | 2015-09-08 | 2020-06-02 | Apple Inc. | Distributed personal assistant |
US9697820B2 (en) | 2015-09-24 | 2017-07-04 | Apple Inc. | Unit-selection text-to-speech synthesis using concatenation-sensitive neural networks |
US10366158B2 (en) | 2015-09-29 | 2019-07-30 | Apple Inc. | Efficient word encoding for recurrent neural network language models |
US11010550B2 (en) | 2015-09-29 | 2021-05-18 | Apple Inc. | Unified language modeling framework for word prediction, auto-completion and auto-correction |
US11587559B2 (en) | 2015-09-30 | 2023-02-21 | Apple Inc. | Intelligent device identification |
US10970646B2 (en) | 2015-10-01 | 2021-04-06 | Google Llc | Action suggestions for user-selected content |
CN105389461A (zh) * | 2015-10-21 | 2016-03-09 | Hu Xi | Interactive children's autonomous management system and management method therefor |
US10691473B2 (en) | 2015-11-06 | 2020-06-23 | Apple Inc. | Intelligent automated assistant in a messaging environment |
US10956666B2 (en) | 2015-11-09 | 2021-03-23 | Apple Inc. | Unconventional virtual assistant interactions |
US10049668B2 (en) | 2015-12-02 | 2018-08-14 | Apple Inc. | Applying neural network language models to weighted finite state transducers for automatic speech recognition |
US10223066B2 (en) | 2015-12-23 | 2019-03-05 | Apple Inc. | Proactive assistance based on dialog communication between devices |
US10764226B2 (en) * | 2016-01-15 | 2020-09-01 | Staton Techiya, Llc | Message delivery and presentation methods, systems and devices using receptivity |
US10446143B2 (en) | 2016-03-14 | 2019-10-15 | Apple Inc. | Identification of voice inputs providing credentials |
US9934775B2 (en) | 2016-05-26 | 2018-04-03 | Apple Inc. | Unit-selection text-to-speech synthesis based on predicted concatenation parameters |
US9972304B2 (en) | 2016-06-03 | 2018-05-15 | Apple Inc. | Privacy preserving distributed evaluation framework for embedded personalized systems |
US10249300B2 (en) | 2016-06-06 | 2019-04-02 | Apple Inc. | Intelligent list reading |
US11227589B2 (en) | 2016-06-06 | 2022-01-18 | Apple Inc. | Intelligent list reading |
US10049663B2 (en) | 2016-06-08 | 2018-08-14 | Apple, Inc. | Intelligent automated assistant for media exploration |
CN107490971B (zh) * | 2016-06-09 | 2019-06-11 | Apple Inc. | Intelligent automated assistant in a home environment |
DK179588B1 (en) | 2016-06-09 | 2019-02-22 | Apple Inc. | INTELLIGENT AUTOMATED ASSISTANT IN A HOME ENVIRONMENT |
US10490187B2 (en) | 2016-06-10 | 2019-11-26 | Apple Inc. | Digital assistant providing automated status report |
US10192552B2 (en) | 2016-06-10 | 2019-01-29 | Apple Inc. | Digital assistant providing whispered speech |
US10067938B2 (en) | 2016-06-10 | 2018-09-04 | Apple Inc. | Multilingual word prediction |
US10586535B2 (en) | 2016-06-10 | 2020-03-10 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
US10509862B2 (en) | 2016-06-10 | 2019-12-17 | Apple Inc. | Dynamic phrase expansion of language input |
DK201670540A1 (en) | 2016-06-11 | 2018-01-08 | Apple Inc | Application integration with a digital assistant |
DK179049B1 (en) | 2016-06-11 | 2017-09-18 | Apple Inc | Data driven natural language event detection and classification |
DK179415B1 (en) | 2016-06-11 | 2018-06-14 | Apple Inc | Intelligent device arbitration and control |
DK179343B1 (en) | 2016-06-11 | 2018-05-14 | Apple Inc | Intelligent task discovery |
US10621992B2 (en) * | 2016-07-22 | 2020-04-14 | Lenovo (Singapore) Pte. Ltd. | Activating voice assistant based on at least one of user proximity and context |
CN106446524A (zh) * | 2016-08-31 | 2017-02-22 | Beijing Intelligent Housekeeper Technology Co., Ltd. | Multi-modal cascade modeling method and apparatus for intelligent hardware |
US10474753B2 (en) | 2016-09-07 | 2019-11-12 | Apple Inc. | Language identification using recurrent neural networks |
US10043516B2 (en) | 2016-09-23 | 2018-08-07 | Apple Inc. | Intelligent automated assistant |
US10535005B1 (en) | 2016-10-26 | 2020-01-14 | Google Llc | Providing contextual actions for mobile onscreen content |
US11281993B2 (en) | 2016-12-05 | 2022-03-22 | Apple Inc. | Model and ensemble compression for metric learning |
US10593346B2 (en) | 2016-12-22 | 2020-03-17 | Apple Inc. | Rank-reduced token representation for automatic speech recognition |
US10229680B1 (en) * | 2016-12-29 | 2019-03-12 | Amazon Technologies, Inc. | Contextual entity resolution |
US11204787B2 (en) | 2017-01-09 | 2021-12-21 | Apple Inc. | Application integration with a digital assistant |
DK201770383A1 (en) | 2017-05-09 | 2018-12-14 | Apple Inc. | USER INTERFACE FOR CORRECTING RECOGNITION ERRORS |
US10417266B2 (en) | 2017-05-09 | 2019-09-17 | Apple Inc. | Context-aware ranking of intelligent response suggestions |
US10726832B2 (en) | 2017-05-11 | 2020-07-28 | Apple Inc. | Maintaining privacy of personal information |
US10395654B2 (en) | 2017-05-11 | 2019-08-27 | Apple Inc. | Text normalization based on a data-driven learning network |
DK201770439A1 (en) | 2017-05-11 | 2018-12-13 | Apple Inc. | Offline personal assistant |
DK180048B1 (en) | 2017-05-11 | 2020-02-04 | Apple Inc. | MAINTAINING THE DATA PROTECTION OF PERSONAL INFORMATION |
US11301477B2 (en) | 2017-05-12 | 2022-04-12 | Apple Inc. | Feedback analysis of a digital assistant |
DK179745B1 (en) | 2017-05-12 | 2019-05-01 | Apple Inc. | SYNCHRONIZATION AND TASK DELEGATION OF A DIGITAL ASSISTANT |
DK179496B1 (en) | 2017-05-12 | 2019-01-15 | Apple Inc. | USER-SPECIFIC Acoustic Models |
DK201770428A1 (en) | 2017-05-12 | 2019-02-18 | Apple Inc. | LOW-LATENCY INTELLIGENT AUTOMATED ASSISTANT |
DK201770411A1 (en) | 2017-05-15 | 2018-12-20 | Apple Inc. | MULTI-MODAL INTERFACES |
DK201770432A1 (en) | 2017-05-15 | 2018-12-21 | Apple Inc. | Hierarchical belief states for digital assistants |
DK201770431A1 (en) | 2017-05-15 | 2018-12-20 | Apple Inc. | Optimizing dialogue policy decisions for digital assistants using implicit feedback |
US20180336892A1 (en) | 2017-05-16 | 2018-11-22 | Apple Inc. | Detecting a trigger of a digital assistant |
US10311144B2 (en) | 2017-05-16 | 2019-06-04 | Apple Inc. | Emoji word sense disambiguation |
DK179549B1 (en) | 2017-05-16 | 2019-02-12 | Apple Inc. | FAR-FIELD EXTENSION FOR DIGITAL ASSISTANT SERVICES |
US10403278B2 (en) | 2017-05-16 | 2019-09-03 | Apple Inc. | Methods and systems for phonetic matching in digital assistant services |
US20180336275A1 (en) | 2017-05-16 | 2018-11-22 | Apple Inc. | Intelligent automated assistant for media exploration |
US10664533B2 (en) | 2017-05-24 | 2020-05-26 | Lenovo (Singapore) Pte. Ltd. | Systems and methods to determine response cue for digital assistant based on context |
US10657328B2 (en) | 2017-06-02 | 2020-05-19 | Apple Inc. | Multi-task recurrent neural network architecture for efficient morphology handling in neural language modeling |
US10445429B2 (en) | 2017-09-21 | 2019-10-15 | Apple Inc. | Natural language understanding using vocabularies with compressed serialized tries |
US10755051B2 (en) | 2017-09-29 | 2020-08-25 | Apple Inc. | Rule-based natural language processing |
US10636424B2 (en) | 2017-11-30 | 2020-04-28 | Apple Inc. | Multi-turn canned dialog |
EP3724855A4 (en) | 2017-12-14 | 2022-01-12 | Magic Leap, Inc. | CONTEXT-BASED REPRESENTATION OF VIRTUAL AVATARS |
FR3076016B1 (fr) * | 2017-12-26 | 2021-10-22 | Thales Sa | Electronic interface device between at least one avionics system and a set of sensors, avionics installation, and associated communication method and computer program |
CN108563321A (zh) * | 2018-01-02 | 2018-09-21 | Lenovo (Beijing) Co., Ltd. | Information processing method and electronic device |
US10733982B2 (en) | 2018-01-08 | 2020-08-04 | Apple Inc. | Multi-directional dialog |
US10733375B2 (en) | 2018-01-31 | 2020-08-04 | Apple Inc. | Knowledge-based framework for improving natural language understanding |
US10789959B2 (en) | 2018-03-02 | 2020-09-29 | Apple Inc. | Training speaker recognition models for digital assistants |
US10592604B2 (en) | 2018-03-12 | 2020-03-17 | Apple Inc. | Inverse text normalization for automatic speech recognition |
US10818288B2 (en) | 2018-03-26 | 2020-10-27 | Apple Inc. | Natural assistant interaction |
US10909331B2 (en) | 2018-03-30 | 2021-02-02 | Apple Inc. | Implicit identification of translation payload with neural machine translation |
TWI691923B (zh) * | 2018-04-02 | 2020-04-21 | Hua Nan Commercial Bank, Ltd. | Financial transaction fraud detection and prevention system and method thereof |
US11145294B2 (en) | 2018-05-07 | 2021-10-12 | Apple Inc. | Intelligent automated assistant for delivering content from user experiences |
US10928918B2 (en) | 2018-05-07 | 2021-02-23 | Apple Inc. | Raise to speak |
US10984780B2 (en) | 2018-05-21 | 2021-04-20 | Apple Inc. | Global semantic word embeddings using bi-directional recurrent neural networks |
DK179822B1 (da) | 2018-06-01 | 2019-07-12 | Apple Inc. | Voice interaction at a primary device to access call functionality of a companion device |
US11386266B2 (en) | 2018-06-01 | 2022-07-12 | Apple Inc. | Text correction |
DK201870355A1 (en) | 2018-06-01 | 2019-12-16 | Apple Inc. | VIRTUAL ASSISTANT OPERATION IN MULTI-DEVICE ENVIRONMENTS |
DK180639B1 (en) | 2018-06-01 | 2021-11-04 | Apple Inc | DISABILITY OF ATTENTION-ATTENTIVE VIRTUAL ASSISTANT |
US10892996B2 (en) | 2018-06-01 | 2021-01-12 | Apple Inc. | Variable latency device coordination |
US11076039B2 (en) | 2018-06-03 | 2021-07-27 | Apple Inc. | Accelerated task performance |
US11588902B2 (en) * | 2018-07-24 | 2023-02-21 | Newton Howard | Intelligent reasoning framework for user intent extraction |
US11010561B2 (en) | 2018-09-27 | 2021-05-18 | Apple Inc. | Sentiment prediction from textual data |
US11170166B2 (en) | 2018-09-28 | 2021-11-09 | Apple Inc. | Neural typographical error modeling via generative adversarial networks |
US11462215B2 (en) | 2018-09-28 | 2022-10-04 | Apple Inc. | Multi-modal inputs for voice commands |
US10839159B2 (en) | 2018-09-28 | 2020-11-17 | Apple Inc. | Named entity normalization in a spoken dialog system |
US10831442B2 (en) * | 2018-10-19 | 2020-11-10 | International Business Machines Corporation | Digital assistant user interface amalgamation |
CN109192209A (zh) * | 2018-10-23 | 2019-01-11 | Gree Electric Appliances, Inc. of Zhuhai | Speech recognition method and device |
US11475898B2 (en) | 2018-10-26 | 2022-10-18 | Apple Inc. | Low-latency multi-speaker speech recognition |
US11638059B2 (en) | 2019-01-04 | 2023-04-25 | Apple Inc. | Content playback on multiple devices |
US11348573B2 (en) | 2019-03-18 | 2022-05-31 | Apple Inc. | Multimodality in digital assistant systems |
CN111737670B (zh) * | 2019-03-25 | 2023-08-18 | Guangzhou Automobile Group Co., Ltd. | Method and system for multi-modal data collaborative human-machine interaction, and in-vehicle multimedia device |
DK201970509A1 (en) | 2019-05-06 | 2021-01-15 | Apple Inc | Spoken notifications |
US11475884B2 (en) | 2019-05-06 | 2022-10-18 | Apple Inc. | Reducing digital assistant latency when a language is incorrectly determined |
US11307752B2 (en) | 2019-05-06 | 2022-04-19 | Apple Inc. | User configurable task triggers |
US11423908B2 (en) | 2019-05-06 | 2022-08-23 | Apple Inc. | Interpreting spoken requests |
US11140099B2 (en) | 2019-05-21 | 2021-10-05 | Apple Inc. | Providing message response suggestions |
DK180129B1 (en) | 2019-05-31 | 2020-06-02 | Apple Inc. | USER ACTIVITY SHORTCUT SUGGESTIONS |
DK201970511A1 (en) | 2019-05-31 | 2021-02-15 | Apple Inc | Voice identification in digital assistant systems |
US11496600B2 (en) | 2019-05-31 | 2022-11-08 | Apple Inc. | Remote execution of machine-learned models |
US11289073B2 (en) | 2019-05-31 | 2022-03-29 | Apple Inc. | Device text to speech |
US11227599B2 (en) | 2019-06-01 | 2022-01-18 | Apple Inc. | Methods and user interfaces for voice-based control of electronic devices |
US11360641B2 (en) | 2019-06-01 | 2022-06-14 | Apple Inc. | Increasing the relevance of new available information |
CN110196642B (zh) * | 2019-06-21 | 2022-05-17 | University of Jinan | Navigational virtual microscope based on an intention understanding model |
WO2021056255A1 (en) | 2019-09-25 | 2021-04-01 | Apple Inc. | Text detection using global geometry estimators |
US11887600B2 (en) * | 2019-10-04 | 2024-01-30 | Disney Enterprises, Inc. | Techniques for interpreting spoken input using non-verbal cues |
EP3832435A1 (en) * | 2019-12-06 | 2021-06-09 | XRSpace CO., LTD. | Motion tracking system and method |
US11869213B2 (en) * | 2020-01-17 | 2024-01-09 | Samsung Electronics Co., Ltd. | Electronic device for analyzing skin image and method for controlling the same |
US11038934B1 (en) | 2020-05-11 | 2021-06-15 | Apple Inc. | Digital assistant hardware abstraction |
US11061543B1 (en) | 2020-05-11 | 2021-07-13 | Apple Inc. | Providing relevant data items based on context |
US11755276B2 (en) | 2020-05-12 | 2023-09-12 | Apple Inc. | Reducing description length based on confidence |
CN111968631B (zh) * | 2020-06-29 | 2023-10-10 | Baidu Online Network Technology (Beijing) Co., Ltd. | Interaction method, apparatus, device, and storage medium for an intelligent device |
US11490204B2 (en) | 2020-07-20 | 2022-11-01 | Apple Inc. | Multi-device audio adjustment coordination |
US11438683B2 (en) | 2020-07-21 | 2022-09-06 | Apple Inc. | User identification using headphones |
US11804215B1 (en) | 2022-04-29 | 2023-10-31 | Apple Inc. | Sonic responses |
Family Cites Families (61)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH0375860A (ja) * | 1989-08-18 | 1991-03-29 | Hitachi Ltd | Personalized terminal |
US5621858A (en) * | 1992-05-26 | 1997-04-15 | Ricoh Corporation | Neural network acoustic and visual speech recognition system training method and apparatus |
US5473726A (en) * | 1993-07-06 | 1995-12-05 | The United States Of America As Represented By The Secretary Of The Air Force | Audio and amplitude modulated photo data collection for speech recognition |
JP3375449B2 (ja) | 1995-02-27 | 2003-02-10 | Sharp Corp | Integrated recognition dialogue device |
US5806036A (en) * | 1995-08-17 | 1998-09-08 | Ricoh Company, Ltd. | Speechreading using facial feature parameters from a non-direct frontal view of the speaker |
JP3702978B2 (ja) * | 1996-12-26 | 2005-10-05 | Sony Corp | Recognition apparatus, recognition method, learning apparatus, and learning method |
JPH11164186A (ja) * | 1997-11-27 | 1999-06-18 | Fuji Photo Film Co Ltd | Image recording device |
US6629065B1 (en) * | 1998-09-30 | 2003-09-30 | Wisconsin Alumni Research Foundation | Methods and apparata for rapid computer-aided design of objects in virtual reality and other environments |
JP2000132305A (ja) * | 1998-10-23 | 2000-05-12 | Olympus Optical Co Ltd | Operation input device |
US6842877B2 (en) * | 1998-12-18 | 2005-01-11 | Tangis Corporation | Contextual responses based on automated learning techniques |
US6825875B1 (en) * | 1999-01-05 | 2004-11-30 | Interval Research Corporation | Hybrid recording unit including portable video recorder and auxillary device |
JP2000276190A (ja) | 1999-03-26 | 2000-10-06 | Yasuto Takeuchi | Voice call device requiring no vocalization |
SE9902229L (sv) * | 1999-06-07 | 2001-02-05 | Ericsson Telefon Ab L M | Apparatus and method of controlling a voice controlled operation |
US6904405B2 (en) | 1999-07-17 | 2005-06-07 | Edwin A. Suominen | Message recognition using shared language model |
JP2001100878A (ja) | 1999-09-29 | 2001-04-13 | Toshiba Corp | Multimodal input/output device |
US7028269B1 (en) * | 2000-01-20 | 2006-04-11 | Koninklijke Philips Electronics N.V. | Multi-modal video target acquisition and re-direction system and method |
JP2001216069A (ja) * | 2000-02-01 | 2001-08-10 | Toshiba Corp | Operation input device and direction detection method |
JP2005174356A (ja) * | 2000-02-01 | 2005-06-30 | Toshiba Corp | Direction detection method |
NZ503882A (en) * | 2000-04-10 | 2002-11-26 | Univ Otago | Artificial intelligence system comprising a neural network with an adaptive component arranged to aggregate rule nodes |
US6754373B1 (en) * | 2000-07-14 | 2004-06-22 | International Business Machines Corporation | System and method for microphone activation using visual speech cues |
US6894714B2 (en) * | 2000-12-05 | 2005-05-17 | Koninklijke Philips Electronics N.V. | Method and apparatus for predicting events in video conferencing and other applications |
US6964023B2 (en) * | 2001-02-05 | 2005-11-08 | International Business Machines Corporation | System and method for multi-modal focus detection, referential ambiguity resolution and mood classification using multi-modal input |
KR20020068235A (ko) | 2001-02-20 | 2002-08-27 | Yu Jae-cheon | Speech recognition apparatus and method using teeth and lip images |
US7171357B2 (en) | 2001-03-21 | 2007-01-30 | Avaya Technology Corp. | Voice-activity detection using energy ratios and periodicity |
US7102485B2 (en) * | 2001-05-08 | 2006-09-05 | Gene Williams | Motion activated communication device |
US7203643B2 (en) | 2001-06-14 | 2007-04-10 | Qualcomm Incorporated | Method and apparatus for transmitting speech activity in distributed voice recognition systems |
US20030055644A1 (en) | 2001-08-17 | 2003-03-20 | At&T Corp. | Systems and methods for aggregating related inputs using finite-state devices and extracting meaning from multimodal inputs using aggregation |
US6990639B2 (en) * | 2002-02-07 | 2006-01-24 | Microsoft Corporation | System and process for controlling electronic components in a ubiquitous computing environment using multimodal integration |
DE10208469A1 (de) * | 2002-02-27 | 2003-09-04 | Bsh Bosch Siemens Hausgeraete | Electrical appliance, in particular an extractor hood |
US7230955B1 (en) | 2002-12-27 | 2007-06-12 | At & T Corp. | System and method for improved use of voice activity detection |
KR100515798B1 (ko) | 2003-02-10 | 2005-09-21 | Korea Advanced Institute of Science and Technology | Method for recognizing degree of mouth opening and face direction, and method for driving a robot using facial gestures |
CA2420129A1 (en) | 2003-02-17 | 2004-08-17 | Catena Networks, Canada, Inc. | A method for robustly detecting voice activity |
US8745541B2 (en) * | 2003-03-25 | 2014-06-03 | Microsoft Corporation | Architecture for controlling a computer using hand gestures |
US20040243416A1 (en) * | 2003-06-02 | 2004-12-02 | Gardos Thomas R. | Speech recognition |
US7343289B2 (en) | 2003-06-25 | 2008-03-11 | Microsoft Corp. | System and method for audio/video speaker detection |
US7383181B2 (en) | 2003-07-29 | 2008-06-03 | Microsoft Corporation | Multi-sensory speech detection system |
US7318030B2 (en) | 2003-09-17 | 2008-01-08 | Intel Corporation | Method and apparatus to perform voice activity detection |
JP4311190B2 (ja) * | 2003-12-17 | 2009-08-12 | Denso Corp | Interface for in-vehicle equipment |
US20050228673A1 (en) | 2004-03-30 | 2005-10-13 | Nefian Ara V | Techniques for separating and evaluating audio and video source data |
US8788265B2 (en) | 2004-05-25 | 2014-07-22 | Nokia Solutions And Networks Oy | System and method for babble noise detection |
US7624355B2 (en) * | 2004-05-27 | 2009-11-24 | Baneth Robin C | System and method for controlling a user interface |
FI20045315A (fi) | 2004-08-30 | 2006-03-01 | Nokia Corp | Detection of voice activity in an audio signal |
JP4630646B2 (ja) | 2004-11-19 | 2011-02-09 | Nintendo Co Ltd | Blowing determination program, blowing determination device, game program, and game device |
EP1686804A1 (en) * | 2005-01-26 | 2006-08-02 | Alcatel | Predictor of multimedia system user behavior |
WO2006104576A2 (en) | 2005-03-24 | 2006-10-05 | Mindspeed Technologies, Inc. | Adaptive voice mode extension for a voice activity detector |
GB2426166B (en) | 2005-05-09 | 2007-10-17 | Toshiba Res Europ Ltd | Voice activity detection apparatus and method |
US7346504B2 (en) | 2005-06-20 | 2008-03-18 | Microsoft Corporation | Multi-sensory speech enhancement using a clean speech prior |
US20070005363A1 (en) * | 2005-06-29 | 2007-01-04 | Microsoft Corporation | Location aware multi-modal multi-lingual device |
US8175874B2 (en) | 2005-11-17 | 2012-05-08 | Shaul Shimhi | Personalized voice activity detection |
KR100820141B1 (ko) | 2005-12-08 | 2008-04-08 | Electronics and Telecommunications Research Institute | Apparatus and method for detecting speech segments, and speech recognition system |
US7860718B2 (en) | 2005-12-08 | 2010-12-28 | Electronics And Telecommunications Research Institute | Apparatus and method for speech segment detection and system for speech recognition |
DE102006037156A1 (de) * | 2006-03-22 | 2007-09-27 | Volkswagen Ag | Interactive operating device and method for operating the interactive operating device |
KR20080002187A (ko) | 2006-06-30 | 2008-01-04 | KT Corp | Customized emotional service system and method according to changes in personal emotion and situation |
US8775168B2 (en) | 2006-08-10 | 2014-07-08 | Stmicroelectronics Asia Pacific Pte, Ltd. | Yule walker based low-complexity voice activity detector in noise suppression systems |
WO2008069519A1 (en) * | 2006-12-04 | 2008-06-12 | Electronics And Telecommunications Research Institute | Gesture/speech integrated recognition system and method |
US8326636B2 (en) * | 2008-01-16 | 2012-12-04 | Canyon Ip Holdings Llc | Using a physical phenomenon detector to control operation of a speech recognition engine |
US20080252595A1 (en) * | 2007-04-11 | 2008-10-16 | Marc Boillot | Method and Device for Virtual Navigation and Voice Processing |
JP2009042910A (ja) * | 2007-08-07 | 2009-02-26 | Sony Corp | Information processing apparatus, information processing method, and computer program |
US8321219B2 (en) | 2007-10-05 | 2012-11-27 | Sensory, Inc. | Systems and methods of performing speech recognition using gestures |
US20090262078A1 (en) * | 2008-04-21 | 2009-10-22 | David Pizzi | Cellular phone with special sensor functions |
US20100162181A1 (en) | 2008-12-22 | 2010-06-24 | Palm, Inc. | Interpreting Gesture Input Including Introduction Or Removal Of A Point Of Contact While A Gesture Is In Progress |
2010
- 2010-04-29 EP EP10769966.2A patent/EP2426598B1/en not_active Not-in-force
- 2010-04-29 JP JP2012508401A patent/JP5911796B2/ja not_active Expired - Fee Related
- 2010-04-29 WO PCT/KR2010/002723 patent/WO2010126321A2/ko active Application Filing
- 2010-04-29 US US12/770,168 patent/US8606735B2/en active Active
- 2010-04-29 CN CN201080017476.6A patent/CN102405463B/zh not_active Expired - Fee Related
Non-Patent Citations (2)
Title |
---|
None |
See also references of EP2426598A4 |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2016148398A1 (ko) * | 2015-03-16 | 2016-09-22 | Smart Always On Co., Ltd. | Set-top box and photographing device for self-learning and improving user interface and user experience through context awareness based on multimodal information |
Also Published As
Publication number | Publication date |
---|---|
EP2426598A2 (en) | 2012-03-07 |
US20100280983A1 (en) | 2010-11-04 |
JP2012525625A (ja) | 2012-10-22 |
JP5911796B2 (ja) | 2016-04-27 |
US8606735B2 (en) | 2013-12-10 |
CN102405463B (zh) | 2015-07-29 |
CN102405463A (zh) | 2012-04-04 |
WO2010126321A3 (ko) | 2011-03-24 |
EP2426598B1 (en) | 2017-06-21 |
EP2426598A4 (en) | 2012-11-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2010126321A2 (ko) | Apparatus and method for inferring user intention using multi-modal information | |
CN106575150B (zh) | Method for recognizing gestures using motion data, and wearable computing device | |
LaViola Jr | 3D gestural interaction: The state of the field | |
WO2010110573A2 (en) | Multi-telepointer, virtual object display device, and virtual object control method | |
WO2013055025A1 (ko) | Intelligent robot, system for interaction between an intelligent robot and a user, and method for interaction between an intelligent robot and a user | |
WO2017188801A1 (ko) | Optimal control method based on motion-voice multi-modal commands, and electronic device applying the same | |
KR20100119250A (ko) | Apparatus and method for detecting speech using motion information | |
WO2013009062A2 (ko) | Method, terminal device, and computer-readable recording medium for controlling content by sensing head gestures and hand gestures | |
CN112527113B (zh) | Gesture recognition and gesture recognition network training method and apparatus, medium, and device | |
CN104516499B (zh) | Apparatus and method for using events of a user interface | |
CN111833872B (zh) | Voice control method, apparatus, device, system, and medium for an elevator | |
LaViola Jr | An introduction to 3D gestural interfaces | |
CN109725727A (zh) | Gesture control method and apparatus for devices with screens | |
KR101652705B1 (ko) | Apparatus and method for inferring user intention using multi-modal information | |
WO2016036197A1 (ko) | Hand gesture recognition apparatus and method therefor | |
CN114167984A (zh) | Device control method and apparatus, storage medium, and electronic device | |
Wang et al. | A gesture-based method for natural interaction in smart spaces | |
Costagliola et al. | Gesture‐Based Computing | |
CN109725722A (zh) | Gesture control method and apparatus for devices with screens | |
Babu et al. | Controlling Computer Features Through Hand Gesture | |
US11464380B2 (en) | Artificial intelligence cleaner and operating method thereof | |
WO2014178491A1 (ko) | Speech recognition method and apparatus | |
Mali et al. | Hand gestures recognition using inertial sensors through deep learning | |
Chaudhry et al. | Music Recommendation System through Hand Gestures and Facial Emotions | |
Palivela et al. | Hand Gesture-Based AI System for Accessing Windows Applications |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
WWE | Wipo information: entry into national phase |
Ref document number: 201080017476.6 Country of ref document: CN |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 10769966 Country of ref document: EP Kind code of ref document: A2 |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2012508401 Country of ref document: JP |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
REEP | Request for entry into the european phase |
Ref document number: 2010769966 Country of ref document: EP |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2010769966 Country of ref document: EP |