KR101617665B1

KR101617665B1 - Automatically adapting user interfaces for hands-free interaction

Info

Publication number: KR101617665B1
Application number: KR1020147011766A
Authority: KR
Inventors: Thomas Robert Gruber; Harry J. Saddler
Original assignee: Apple Inc.
Priority date: 2011-09-30
Filing date: 2012-09-20
Publication date: 2016-05-03
Also published as: CN108337380A; KR20140082771A; HK1200621A1; EP2761860B1; CN103959751A; JP6353786B2; JP2017016683A; WO2013048880A1; AU2012316484A1; EP2761860A1; JP2015501022A; AU2016200568A1; CN108337380B

Abstract

User interfaces for systems such as virtual assistants are automatically adapted for hands-free use. A hands-free context is detected via automatic or manual means. The system then adjusts various stages of a complex interactive system to modify the user experience to reflect the particular limitations of such a context. The system of the present invention thus allows a single implementation of a complex system, such as a virtual assistant, to dynamically offer user interface elements. It also allows that implementation to alter user interface behavior so as to permit hands-free use without compromising the user experience of the same system for hands-on use.

Description

AUTOMATICALLY ADAPTING USER INTERFACES FOR HANDS-FREE INTERACTION

The present invention relates to multimodal user interfaces, and more particularly to user interfaces that include both voice-based and visual modalities.

Many existing operating systems and devices use voice input as a modality by which the user can control operation.

One example is voice command systems. These map specific verbal commands to operations, for example initiating dialing of a telephone number when the user speaks a person's name.

Another example is Interactive Voice Response (IVR) systems. These allow people to access static information over the telephone, as with automated telephone service desks.

Many voice command and IVR systems are relatively narrow in scope and can handle only a predefined set of voice commands. In addition, their output is often drawn from a fixed set of responses.
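To make this limitation concrete, here is a minimal Python sketch of a fixed-grammar voice command handler. Everything in it is hypothetical and not taken from any real system: the `CONTACTS` table, the `RESPONSES` table, `handle_voice_command`, and the single "call <name>" template. It only illustrates how such systems map recognized phrases onto a predefined set of actions and answer from a fixed set of responses.

```python
from typing import Dict

# Hypothetical address book; a real system would query the device's contacts.
CONTACTS: Dict[str, str] = {"alice": "555-0100", "bob": "555-0199"}

# Output is drawn from a fixed set of canned responses.
RESPONSES: Dict[str, str] = {
    "dialing": "Dialing {name}.",
    "unknown": "Sorry, I did not understand that command.",
}


def handle_voice_command(utterance: str) -> str:
    """Map a recognized utterance onto the single template "call <name>".

    Anything outside that template falls through to the canned "unknown"
    response. This includes paraphrases such as "phone Alice".
    """
    words = utterance.lower().split()
    if len(words) == 2 and words[0] == "call" and words[1] in CONTACTS:
        # A real system would start dialing CONTACTS[words[1]] here.
        return RESPONSES["dialing"].format(name=words[1].title())
    return RESPONSES["unknown"]
```

Supporting a new phrasing means adding another template by hand. That is the narrowness the natural language assistants described next are meant to overcome.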

An intelligent automated assistant, also referred to herein as a virtual assistant, can provide an improved interface between human and computer, including processing of natural language input.

Such an assistant may be implemented as described in related U.S. Patent Application No. 12/987,982, "Intelligent Automated Assistant," filed January 10, 2011, the entire disclosure of which is incorporated herein by reference. It allows users to interact with a device or system using natural language, in spoken and/or text form.

Such an assistant:
· interprets user inputs;
· operationalizes the user's intent into tasks and parameters for those tasks;
· executes services to support those tasks; and
· produces output that is intelligible to the user.

Virtual assistants can use general speech and natural language understanding technology to recognize a greater range of input, which makes it possible to carry on a dialog with the user. Some virtual assistants can generate output in a combination of modes, including verbal responses and written text. They can also provide a graphical user interface (GUI) that permits direct manipulation of on-screen elements.

However, the user may not always be in a situation where they can take advantage of such visual output or direct manipulation interfaces. For example, the user may be driving or operating machinery. The user may have a visual impairment, or may simply be uncomfortable with or unfamiliar with the visual interface.

Any situation in which a user has limited or no ability to read a screen, or to interact with a device via contact (keyboard, mouse, touchscreen, pointing device, and the like), is referred to herein as a "hands-free context." For example, as noted above, a user may try to operate a device while driving. In that situation the user can hear audible output and respond by voice, but for safety reasons should not read small print, tap on menus, or enter text.

Hands-free contexts present special challenges to the builders of complex systems such as virtual assistants. Users demand full access to the features of their devices whether or not they are in a hands-free context. However, failing to account for the particular limitations of hands-free operation can limit both the utility and the usability of a device or system. It can even compromise safety by distracting the user from a primary task such as operating a vehicle.

According to various embodiments of the present invention, a user interface for a system such as a virtual assistant is automatically adapted for hands-free use. A hands-free context is detected via automatic or manual means. The system then adjusts various stages of a complex interactive system to modify the user experience to reflect the particular limitations of such a context. The system of the present invention thus allows a single implementation of a virtual assistant or other complex system to dynamically offer user interface elements. It also allows that implementation to alter user interface behavior so as to permit hands-free use without compromising the user experience of the same system for hands-on use.

For example, in various embodiments, the system of the present invention provides mechanisms for adjusting the operation of a virtual assistant. The adjusted assistant provides output in a manner that lets users complete their tasks without having to read details on a screen. Furthermore, in various embodiments, the virtual assistant can provide mechanisms for receiving spoken input. Spoken input serves as an alternative to reading, tapping, clicking, typing, or performing other functions that are often achieved using a graphical user interface.
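Below is a minimal sketch of this output adjustment, assuming a text message is the content being presented. The types `TextMessage` and `Response` and the function `present_message` are hypothetical. The point is only that, in hands-free mode, everything the user needs is carried in the spoken output, so nothing has to be read from the screen.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TextMessage:
    sender: str
    body: str


@dataclass
class Response:
    speech: str             # what the assistant says aloud
    display: Optional[str]  # what the assistant shows on screen, if anything


def present_message(msg: TextMessage, hands_free: bool) -> Response:
    if hands_free:
        # Speak the sender and the full body; nothing is left for the screen.
        return Response(speech=f"Message from {msg.sender}: {msg.body}", display=None)
    # Hands-on: a short spoken cue, with the details shown visually.
    return Response(
        speech=f"You have a new message from {msg.sender}.",
        display=f"{msg.sender}: {msg.body}",
    )
```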

In various embodiments, the system of the present invention provides underlying functionality that is identical to (or approximates) that of a conventional graphical user interface. At the same time, it accommodates the particular requirements and limitations associated with a hands-free context. More generally, the system keeps core functionality substantially the same while facilitating operation in a hands-free context.

In some embodiments, systems built according to the techniques of the present invention let users choose freely between hands-free mode and conventional ("hands-on") mode, in some cases within a single session. For example, the same interface can be made adaptable to both an office environment and a moving vehicle. The system dynamically makes the necessary changes to user interface behavior as the environment changes.
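One way to picture per-turn switching is a small sketch that re-evaluates the context on every turn of a session. The following are assumptions chosen for illustration, not signals or values specified in this section:
· the signal names `user_setting`, `connected_to_car_audio`, and `speed_mps`;
· the 5 m/s driving threshold.

Only two points are taken from the text: detection may be manual or automatic, and the mode can change within a single session.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed threshold (about 18 km/h) above which the user is treated as driving.
DRIVING_SPEED_MPS = 5.0


@dataclass
class ContextSignals:
    """Hypothetical inputs to hands-free detection."""
    user_setting: Optional[bool] = None   # manual choice; None means "not set"
    connected_to_car_audio: bool = False
    speed_mps: float = 0.0


def hands_free_active(s: ContextSignals) -> bool:
    # A manual setting always overrides automatic inference.
    if s.user_setting is not None:
        return s.user_setting
    # Otherwise any single strong indicator is treated as sufficient.
    return s.connected_to_car_audio or s.speed_mps >= DRIVING_SPEED_MPS


class Session:
    """Re-checks the context on each turn, so the mode can change mid-session."""

    def __init__(self) -> None:
        self.modes: List[str] = []

    def turn(self, signals: ContextSignals) -> str:
        mode = "hands-free" if hands_free_active(signals) else "hands-on"
        self.modes.append(mode)
        return mode
```

Letting the manual setting win keeps the user in control when the automatic signals are misleading, for example for a passenger in a moving car.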

According to various embodiments of the present invention, any of a number of mechanisms can be implemented to adapt the operation of a virtual assistant to a hands-free context. In various embodiments, the virtual assistant is an intelligent automated assistant as described in U.S. Patent Application No. 12/987,982, "Intelligent Automated Assistant," filed January 10, 2011, the entire disclosure of which is incorporated herein by reference. Such an assistant engages with the user in an integrated, conversational manner using natural language dialog. It invokes external services when appropriate to obtain information or perform various actions.

According to various embodiments of the present invention, a virtual assistant may be configured, designed, and/or operable to do either or both of the following:
· detect a hands-free context and adjust its operation accordingly when performing various types of operations, functionalities, and/or features;
· combine a plurality of features, operations, and applications of the electronic device on which it is installed.

In some embodiments, the virtual assistant of the present invention can detect a hands-free context and adjust its operation accordingly when:
· receiving input;
· providing output;
· engaging in dialog with the user; and/or
· performing (or initiating) actions based on discerned intent.

Actions can be performed by activating and/or interfacing with applications or services. These may include any applications or services available on the electronic device, as well as services available over an electronic network such as the Internet. In various embodiments, such activation of external services can be performed via application programming interfaces (APIs) or by any other suitable mechanism(s).

In this manner, a virtual assistant implemented according to various embodiments of the present invention can provide a hands-free usage environment for many different applications and functions of an electronic device. It can also provide that environment for services that may be available over the Internet.

As described in the above-referenced related application, such a virtual assistant can relieve the user of the burden of learning:
· what functionality may be available on the device and on web-connected services;
· how to interface with such services to get what they want; and
· how to interpret the output received from such services.

Instead, the assistant of the present invention can act as an intermediary between the user and such diverse services.

In addition, in various embodiments, the virtual assistant of the present invention provides a conversational interface. The user may find this interface more intuitive and less burdensome than conventional graphical user interfaces.

The user can carry on a conversational dialog with the assistant using any of a number of available input and output mechanisms. Which mechanisms are used depends in part on whether a hands-free or hands-on context is active. Examples of such mechanisms include, without limitation, speech, graphical user interfaces (buttons and links), text entry, and the like. The system can be implemented on any of a number of different platforms, such as device APIs, the web, email, and the like, or any combination thereof.

Requests for additional input can be presented to the user within a conversation that is presented in an auditory and/or visual manner. Short- and long-term memory can be engaged so that user input is interpreted in the proper context. That context includes previous events and communications within a given session, as well as historical and profile information about the user.
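As a rough sketch of how short- and long-term memory can feed interpretation, the hypothetical `Memory` class below resolves a pronoun from people mentioned earlier in the session. It resolves a relationship word (such as "mom") from the user's profile. The class, its fields, and the resolution rules are illustrative assumptions, not the mechanism of the referenced application.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Memory:
    # Short-term: people mentioned in this session, newest last.
    short_term: List[str] = field(default_factory=list)
    # Long-term: profile relationships, e.g. "mom" -> contact name.
    long_term: Dict[str, str] = field(default_factory=dict)

    def mention(self, person: str) -> None:
        self.short_term.append(person)

    def resolve_recipient(self, word: str) -> Optional[str]:
        word = word.lower()
        if word in ("him", "her", "them"):
            # Pronouns refer to the most recently mentioned person, if any.
            return self.short_term[-1] if self.short_term else None
        # Other words are looked up in the user's profile.
        return self.long_term.get(word)
```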

In various embodiments, the virtual assistant of the present invention can control various features and operations of an electronic device. For example, the virtual assistant can call services that interface with functionality and applications on the device, via APIs or by other means. It does so to perform functions and operations that might otherwise be initiated using a conventional user interface on the device. Such functions and operations may include, for example:
· setting an alarm;
· making a telephone call;
· sending a text message or email message;
· adding a calendar event; and the like.

Such functions and operations may be performed as add-on functions within a conversational dialog between the user and the assistant. The user can specify them in the course of such a dialog, or they may be performed automatically based on the context of the dialog. One skilled in the art will recognize that the assistant can thereby serve as a mechanism for initiating and controlling various operations on the electronic device.

The system of the present invention collects contextual evidence that contributes to inferences about the user's current situation, and adjusts operation of the user interface accordingly. In this way it provides mechanisms that let a virtual assistant, operating hands-free, serve as such a mechanism for controlling the device.
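The idea of calling device services on the user's behalf can be sketched as a registry that maps task names to handlers. The decorator `service`, the task names, and the handlers below are hypothetical stand-ins for real device APIs such as alarm, messaging, and calendar services. An actual assistant would call platform services rather than return strings.

```python
from typing import Any, Callable, Dict

ServiceHandler = Callable[..., str]
_REGISTRY: Dict[str, ServiceHandler] = {}


def service(task_name: str) -> Callable[[ServiceHandler], ServiceHandler]:
    """Decorator that registers a handler under a task name."""
    def register(fn: ServiceHandler) -> ServiceHandler:
        _REGISTRY[task_name] = fn
        return fn
    return register


@service("set_alarm")
def set_alarm(time: str) -> str:
    return f"Alarm set for {time}."


@service("send_text")
def send_text(recipient: str, body: str) -> str:
    return f"Sent to {recipient}: {body}"


@service("add_calendar_event")
def add_calendar_event(title: str, when: str) -> str:
    return f"Added '{title}' on {when}."


def execute(task_name: str, **params: Any) -> str:
    """Look up the handler for a task and call it with the task's parameters."""
    handler = _REGISTRY.get(task_name)
    if handler is None:
        raise KeyError(f"no service registered for task {task_name!r}")
    return handler(**params)
```

A registry keeps the dialog layer independent of the individual services. Adding a capability means registering one more handler, and the conversation code does not change.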

According to some embodiments, a method includes the following:
· at a processor, detecting whether or not a hands-free context is active;
· at an output device, prompting a user for input;
· at an input device, receiving user input;
· at the processor, interpreting the received user input to derive a representation of user intent;
· at the processor, identifying at least one task and at least one parameter for the task, based at least in part on the derived representation of user intent;
· at the processor, executing the at least one task using the at least one parameter, to derive a result;
· at the processor, generating a dialog response based on the derived result; and
· at the output device, outputting the generated dialog response.

In response to detecting that the device is in a hands-free context, at least one of the following steps is performed in a manner consistent with limitations associated with the hands-free context: prompting the user for input, receiving user input, interpreting the received user input, identifying the at least one task and at least one parameter for the task, and generating the dialog response.
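The claimed sequence of steps can be sketched as one turn of a toy pipeline. The following are assumptions for illustration:
· the interpreter grammar "text <name> <message>";
· the function names;
· the specific adaptations, namely speaking the prompt instead of only displaying it, and repeating the message content in the confirmation.

The method itself only requires that at least one step be performed in a manner consistent with hands-free limitations.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Intent:
    task: str
    params: Dict[str, str]


def interpret(text: str) -> Intent:
    """Toy interpreter for the assumed grammar "text <name> <message...>"."""
    words = text.split()
    if len(words) >= 3 and words[0].lower() == "text":
        return Intent("send_text", {"recipient": words[1], "body": " ".join(words[2:])})
    return Intent("unknown", {})


def execute_task(intent: Intent) -> str:
    # Stand-in for a real messaging service call.
    return "sent" if intent.task == "send_text" else "failed"


def run_turn(hands_free: bool, listen: Callable[[], str],
             outputs: List[Tuple[str, str]]) -> None:
    # Detection is done by the caller; hands_free carries its result.
    channel = "speak" if hands_free else "display"
    # Prompt the user (spoken when hands-free).
    outputs.append((channel, "What can I help you with?"))
    # Receive the input and interpret it into a task and its parameters.
    intent = interpret(listen())
    # Execute the task to derive a result.
    result = execute_task(intent)
    # Generate the dialog response; hands-free replies restate the content
    # the user would otherwise have checked on screen.
    if result == "sent" and hands_free:
        p = intent.params
        reply = f"Your message to {p['recipient']} saying '{p['body']}' was sent."
    elif result == "sent":
        reply = "Your message was sent."
    else:
        reply = "Sorry, I couldn't do that."
    # Output the response on the chosen channel.
    outputs.append((channel, reply))
```

Passing `listen` in as a callable keeps the sketch testable without a microphone. In a real system it would wrap speech recognition or a text field.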

According to some embodiments, an electronic device includes one or more processors, memory, and one or more programs. The one or more programs are stored in the memory and configured to be executed by the one or more processors. They include instructions for performing the operations of any of the methods described above.

According to some embodiments, a computer-readable storage medium has instructions stored therein. When executed by an electronic device, the instructions cause the device to perform the operations of any of the methods described above.

According to some embodiments, an electronic device includes means for performing the operations of any of the methods described above. According to some embodiments, an information processing apparatus for use in an electronic device includes means for performing the operations of any of the methods described above.

According to some embodiments, an electronic device includes a processing unit configured to detect whether or not a hands-free context is active. The electronic device also includes an output unit, coupled to the processing unit, that is configured to prompt a user for input. It further includes an input unit, coupled to the processing unit, that is configured to receive user input.

The processing unit is further configured to:
· interpret the received user input to derive a representation of user intent;
· identify at least one task and at least one parameter for the task, based at least in part on the derived representation of user intent;
· execute the at least one task using the at least one parameter, to derive a result;
· generate a dialog response based on the derived result; and
· cause the output unit to output the generated dialog response.

In response to detecting that the device is in a hands-free context, at least one of the following steps is performed in a manner consistent with limitations associated with the hands-free context: prompting the user for input, receiving user input, interpreting the received user input, identifying the at least one task and at least one parameter for the task, and generating the dialog response.

The accompanying drawings illustrate several embodiments of the invention. Together with the description, they serve to explain the principles of the invention according to the embodiments. One skilled in the art will recognize that the particular embodiments illustrated in the drawings are merely exemplary and are not intended to limit the scope of the present invention.
<Fig. 1>
Fig. 1 is a screenshot illustrating an example of a hands-on interface for reading a text message, according to the prior art.
<Fig. 2>
Fig. 2 is a screenshot illustrating an example of an interface for replying to a text message.
<Figs. 3A and 3B>
Figs. 3A and 3B are a series of screenshots illustrating an example in which a voice dictation interface is used to reply to a text message.
<Fig. 4>
Fig. 4 is a screenshot illustrating an example of an interface for receiving a text message, according to one embodiment.
<Figs. 5A to 5D>
Figs. 5A to 5D are a series of screenshots illustrating an example of the operation of a multimodal virtual assistant, according to an embodiment of the present invention, in which the user receives and replies to a text message in a hands-free context.
<Figs. 6A to 6C>
Figs. 6A to 6C are a series of screenshots illustrating an example of the operation of a multimodal virtual assistant, according to an embodiment of the present invention, in which the user revises a text message in a hands-free context.
<Fig. 7>
Fig. 7 is a flow diagram depicting a method of operation of a virtual assistant that supports dynamic detection of, and adaptation to, a hands-free context, according to one embodiment.
<Fig. 8>
Fig. 8 is a block diagram depicting an example of a virtual assistant system, according to one embodiment.
<Fig. 9>
Fig. 9 is a block diagram depicting a computing device suitable for implementing at least a portion of a virtual assistant, according to at least one embodiment.
<Fig. 10>
Fig. 10 is a block diagram depicting an architecture for implementing at least a portion of a virtual assistant on a standalone computing system, according to at least one embodiment.
<Fig. 11>
Fig. 11 is a block diagram depicting an architecture for implementing at least a portion of a virtual assistant on a distributed computing network, according to at least one embodiment.
<Fig. 12>
Fig. 12 is a block diagram depicting a system architecture that illustrates several different types of clients and modes of operation.
<Fig. 13>
Fig. 13 is a block diagram depicting a client and a server that communicate with each other to implement the present invention, according to one embodiment.
<Fig. 14>
Fig. 14 is a functional block diagram of an electronic device, according to some embodiments.

According to various embodiments of the present invention, a hands-free context is detected in connection with operations of a virtual assistant, and the assistant's user interface is adjusted accordingly. This enables the user to interact meaningfully with the assistant in the hands-free context.

For purposes of this description, the term "virtual assistant" is equivalent to the term "intelligent automated assistant." Both refer to any information processing system that performs one or more of the following functions:

· interpreting human language input, in spoken and/or text form;

· operationalizing a representation of user intent into a form that can be executed, such as a representation of a task with steps and/or parameters;

· executing task representations by invoking programs, methods, services, APIs, or the like; and

· generating output responses to the user in language and/or graphical form.
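To illustrate the second function, operationalizing intent into an executable form, here is a hypothetical task representation with steps and parameters. The `Task` class and its `next_question` helper are assumptions made for this sketch. They show one way a missing parameter can become a follow-up request for additional input rather than a failed execution.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Task:
    """Executable form of user intent: a task name, its steps, and parameters."""
    name: str
    required: List[str]
    steps: List[str] = field(default_factory=list)
    params: Dict[str, str] = field(default_factory=dict)

    def missing(self) -> List[str]:
        """Required parameters that have not been filled in yet, in order."""
        return [p for p in self.required if p not in self.params]

    def next_question(self) -> Optional[str]:
        """Ask for the first missing parameter, or return None if the task can run."""
        gaps = self.missing()
        return f"What should the {gaps[0]} be?" if gaps else None
```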

An example of such a virtual assistant is described in related U.S. Patent Application No. 12/987,982, "Intelligent Automated Assistant," filed January 10, 2011, the entire disclosure of which is incorporated herein by reference.

Various techniques will now be described in detail with reference to example embodiments illustrated in the accompanying drawings. In the following description, numerous specific details are set forth to provide a thorough understanding of one or more aspects and/or features described or referenced herein. It will be apparent to one skilled in the art, however, that one or more of these aspects and/or features may be practiced without some or all of these specific details. In other instances, well-known process steps and/or structures have not been described in detail, so as not to obscure some of the aspects and/or features described or referenced herein.

One or more different inventions may be described in the present application. Further, for one or more of the invention(s) described herein, numerous embodiments may be described in this patent application; these are presented for illustrative purposes only. The described embodiments are not intended to be limiting in any sense. One or more of the invention(s) may be widely applicable to numerous embodiments, as is readily apparent from the disclosure.

These embodiments are described in sufficient detail to enable those skilled in the art to practice one or more of the invention(s). It is to be understood that other embodiments may be utilized, and that structural, logical, software, electrical, and other changes may be made without departing from the scope of one or more of the invention(s). Accordingly, those skilled in the art will recognize that one or more of the invention(s) may be practiced with various modifications and alterations.

Particular features of one or more of the invention(s) may be described with reference to one or more particular embodiments or figures. Such figures form a part of the present disclosure and show, by way of illustration, specific embodiments of one or more of the invention(s). It should be understood, however, that such features are not limited to use in the particular embodiments or figures with reference to which they are described.

The present disclosure is neither a literal description of all embodiments of one or more of the invention(s), nor a list of features of one or more of the invention(s) that must be present in all embodiments.

The section headings in this patent application and the title of this patent application are for convenience only, and are not to be taken as limiting the disclosure in any way.

Devices that are in communication with each other need not be in continuous communication with each other, unless expressly specified otherwise. In addition, such devices may communicate directly or indirectly through one or more intermediaries.

A description of an embodiment with several components in communication with each other does not imply that all such components are required. To the contrary, a variety of optional components are described to illustrate the wide variety of possible embodiments of one or more of the invention(s).

Further, although process steps, method steps, algorithms or the like may be described in a sequential order, such processes, methods and algorithms may be configured to work in any suitable order. In other words, any sequence or order of steps that may be described in this patent application does not, in and of itself, indicate a requirement that the steps be performed in that order. Further, some steps may be performed simultaneously despite being described or implied as occurring non-simultaneously (e.g., because one step is described after the other step). Moreover, the illustration of a process by its depiction in a drawing does not imply that the illustrated process is exclusive of other variations and modifications thereto, does not imply that the illustrated process or any of its steps is necessary to one or more of the invention(s), and does not imply that the illustrated process is preferred.

When a single device or article is described, it will be readily apparent that more than one device/article (whether or not they cooperate) may be used in place of a single device/article. Similarly, where more than one device or article is described (whether or not they cooperate), it will be readily apparent that a single device/article may be used in place of the more than one device or article.

The functionality and/or the features of a device may alternatively be embodied by one or more other devices that are not explicitly described as having such functionality/features. Thus, other embodiments of one or more of the invention(s) need not include the device itself.

Techniques and mechanisms described or referenced herein will sometimes be described in singular form for clarity. However, it should be noted that particular embodiments include multiple iterations of a technique or multiple instantiations of a mechanism unless noted otherwise.

Although described within the context of technology for implementing an intelligent automated assistant, also known as a virtual assistant, the various aspects and techniques described herein may also be deployed and/or applied in other fields of technology involving human and/or computerized interaction with software.

Additional aspects relating to virtual assistant technology (e.g., which may be utilized by, provided by, and/or implemented at one or more virtual assistant system embodiments described herein) are disclosed in one or more of the following, the entire disclosures of which are incorporated herein by reference:

· U.S. Patent Application Serial No. 12/987,982 for "Intelligent Automated Assistant", filed January 10, 2011;

· U.S. Provisional Patent Application Serial No. 61/295,774 for "Intelligent Automated Assistant", filed January 18, 2010;

· U.S. Patent Application Serial No. 13/250,854, entitled "Using Context Information to Facilitate Processing of Commands in a Virtual Assistant", filed September 30, 2011;

· U.S. Patent Application Serial No. 11/518,292 for "Method and Apparatus for Building an Intelligent Automated Assistant", filed September 8, 2006;

· U.S. Provisional Patent Application Serial No. 61/186,414 for "System and Method for Semantic Auto-Completion", filed June 12, 2009.

Hardware Architecture

Generally, the virtual assistant techniques disclosed herein may be implemented in hardware or in a combination of software and hardware. For example, they may be implemented in an operating system kernel, in a separate user process, in a library package bound into network applications, on a specially constructed machine, and/or on a network interface card. In a specific embodiment, the techniques disclosed herein may be implemented in software such as an operating system, or in an application running on an operating system.

Software/hardware hybrid implementation(s) of at least some of the virtual assistant embodiment(s) disclosed herein may be implemented on a programmable machine selectively activated or reconfigured by a computer program stored in memory. Such network devices may have multiple network interfaces, which may be configured or designed to use different types of network communication protocols. A general architecture for some of these machines may appear from the descriptions disclosed herein. According to specific embodiments, at least some of the features and/or functionalities of the various virtual assistant embodiments disclosed herein may be implemented on one or more general-purpose network host machines such as an end-user computer system, computer, network server or server system, mobile computing device (e.g., personal digital assistant, mobile phone, smartphone, laptop, tablet computer, or the like), consumer electronic device, music player, or any other suitable electronic device, router, switch, or the like, or any combination thereof. In at least some embodiments, at least some of the features and/or functionalities of the various virtual assistant embodiments disclosed herein may be implemented in one or more virtualized computing environments (e.g., network computing clouds, or the like).

Referring now to FIG. 9, there is shown a block diagram depicting a computing device 60 suitable for implementing at least a portion of the virtual assistant features and/or functionalities disclosed herein. Computing device 60 may be, for example, an end-user computer system, network server or server system, mobile computing device (e.g., personal digital assistant, mobile phone, smartphone, laptop, tablet computer, or the like), consumer electronic device, music player, or any other suitable electronic device, or any combination or portion thereof. Computing device 60 may be adapted to communicate with other computing devices, such as clients and/or servers, over a communications network such as the Internet, using known protocols for such communication, whether wireless or wired.

In one embodiment, computing device 60 includes a central processing unit (CPU) 62, interfaces 68, and a bus 67 (such as a peripheral component interconnect (PCI) bus). When acting under the control of appropriate software or firmware, CPU 62 may be responsible for implementing specific functions associated with the functions of a specifically configured computing device or machine. For example, in at least one embodiment, a user's personal digital assistant (PDA) or smartphone may be configured or designed to function as a virtual assistant system utilizing CPU 62, memories 61 and 65, and interface(s) 68. In at least one embodiment, CPU 62 may be caused to perform one or more of the different types of virtual assistant functions and/or operations under the control of software modules/components, which may include, for example, an operating system and any appropriate application software, drivers, and the like.

CPU 62 may include one or more processor(s) 63, such as, for example, a processor from the Motorola or Intel family of microprocessors or the MIPS family of microprocessors. In some embodiments, processor(s) 63 may include specially designed hardware (e.g., application-specific integrated circuits (ASICs), electrically erasable programmable read-only memories (EEPROMs), field-programmable gate arrays (FPGAs), and the like) for controlling the operations of computing device 60. In a specific embodiment, a memory 61 (such as non-volatile RAM and/or ROM) also forms part of CPU 62. However, there are many different ways in which memory may be coupled to the system. Memory block 61 may be used for a variety of purposes such as, for example, caching and/or storing data, programming instructions, and the like.

As used herein, the term "processor" is not limited merely to those integrated circuits referred to in the art as a processor, but broadly refers to a microcontroller, a microcomputer, a programmable logic controller, an application-specific integrated circuit, and any other programmable circuit.

In one embodiment, interfaces 68 are provided as interface cards (sometimes referred to as "line cards"). Generally, they control the sending and receiving of data packets over a computing network and sometimes support other peripherals used with computing device 60. Among the interfaces that may be provided are Ethernet interfaces, frame relay interfaces, cable interfaces, DSL interfaces, token ring interfaces, and the like. In addition, various types of interfaces may be provided such as, for example, universal serial bus (USB), serial, Ethernet, FireWire, PCI, parallel, radio frequency (RF), Bluetooth™, near-field communications (e.g., using near-field magnetics), 802.11 (WiFi), frame relay, TCP/IP, ISDN, fast Ethernet interfaces, Gigabit Ethernet interfaces, asynchronous transfer mode (ATM) interfaces, high-speed serial interface (HSSI) interfaces, Point of Sale (POS) interfaces, fiber data distributed interfaces (FDDIs), and the like. Generally, such interfaces 68 may include ports appropriate for communication with the appropriate media. In some cases, they may also include an independent processor and, in some instances, volatile and/or non-volatile memory (e.g., RAM).

Although the system shown in FIG. 9 illustrates one specific architecture for a computing device 60 for implementing the techniques of the invention described herein, it is by no means the only device architecture on which at least a portion of the features and techniques described herein may be implemented. For example, architectures having one or any number of processors 63 can be used, and such processors 63 can be present in a single device or distributed among any number of devices. In one embodiment, a single processor 63 handles communications as well as routing computations. In various embodiments, different types of virtual assistant features and/or functionalities may be implemented in a virtual assistant system that includes a client device (such as a personal digital assistant or smartphone running client software) and server system(s) (such as the server system described in more detail below).

Regardless of network device configuration, the system of the present invention may employ one or more memories or memory modules (such as, for example, memory block 65) configured to store data, program instructions for general-purpose network operations, and/or other information relating to the functionality of the virtual assistant techniques described herein. The program instructions may control the operation of an operating system and/or one or more applications, for example. The memory or memories may also be configured to store data structures, keyword taxonomy information, advertisement information, user click and impression information, and/or other specific non-program information described herein.

Because such information and program instructions may be employed to implement the systems/methods described herein, at least some network device embodiments may include non-transitory machine-readable storage media, which may be configured or designed to store, for example, program instructions, state information, and the like for performing the various operations described herein. Examples of such non-transitory machine-readable storage media include, but are not limited to, magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROM disks; magneto-optical media such as floptical disks; and hardware devices that are specially configured to store and perform program instructions, such as ROM, flash memory, memristor memory, RAM, and the like. Examples of program instructions include both machine code, such as produced by a compiler, and files containing higher-level code that may be executed by the computer using an interpreter.

In one embodiment, the system of the present invention is implemented on a standalone computing system. Referring now to FIG. 10, there is shown a block diagram depicting an architecture for implementing at least a portion of a virtual assistant on a standalone computing system, according to at least one embodiment. Computing device 60 includes processor(s) 63, which run software for implementing multimodal virtual assistant 1002. Input device 1206 can be of any type suitable for receiving user input, including, for example, a keyboard, touchscreen, mouse, touchpad, trackball, five-way switch, joystick, and/or any combination thereof. Device 60 can also include a speech input device 1211, such as, for example, a microphone. Output device 1207 can be a screen, speaker, printer, and/or any combination thereof. Memory 1210 can be random-access memory having a structure and architecture as are known in the art, for use by processor(s) 63 in the course of running software. Storage device 1208 can be any magnetic, optical, and/or electrical storage device for storage of data in digital form; examples include flash memory, a magnetic hard drive, CD-ROM, and/or the like.

In another embodiment, the system of the present invention is implemented on a distributed computing network, such as one having any number of clients and/or servers. Referring now to FIG. 11, there is shown a block diagram depicting an architecture for implementing at least a portion of a virtual assistant on a distributed computing network, according to at least one embodiment.

In the arrangement shown in FIG. 11, any number of clients 1304 are provided; each client 1304 may run software for implementing client-side portions of the present invention. In addition, any number of servers 1340 can be provided for handling requests received from clients 1304. Clients 1304 and servers 1340 can communicate with one another via an electronic network 1361, such as the Internet. Network 1361 may be implemented using any known network protocols, including, for example, wired and/or wireless protocols.

In addition, in one embodiment, servers 1340 can call external services 1360 when needed to obtain additional information or to refer to stored data concerning previous interactions with particular users. Communications with external services 1360 can take place, for example, via network 1361. In various embodiments, external services 1360 include web-enabled services and/or functionality related to or installed on the hardware device itself. For example, in an embodiment where assistant 1002 is implemented on a smartphone or other electronic device, assistant 1002 can obtain information stored in a calendar application ("app"), contacts, and/or other sources.

In various embodiments, assistant 1002 can control many features and operations of the electronic device on which it is installed. For example, assistant 1002 can call external services 1360 that interface with functionality and applications on the device via APIs or by other means, to perform functions and operations that might otherwise be initiated using a conventional user interface on the device. Such functions and operations may include, for example, setting an alarm, making a telephone call, sending a text message or email message, adding a calendar event, and the like. Such functions and operations may be performed as add-on functions in the context of a conversational dialog between a user and assistant 1002. Such functions and operations can be specified by the user in the context of such a dialog, or they may be performed automatically based on the context of the dialog. One skilled in the art will recognize that assistant 1002 can thereby be used as a control mechanism for initiating and controlling various operations on the electronic device, and as an alternative to conventional mechanisms such as buttons or graphical user interfaces.

For example, the user may provide input to assistant 1002 such as "I need to wake up tomorrow at 8am". Once assistant 1002 has determined the user's intent using the techniques described herein, assistant 1002 can call external services 1360 to interface with an alarm clock function or application on the device. Assistant 1002 then sets the alarm on behalf of the user. In this manner, the user can use assistant 1002 as a replacement for conventional mechanisms for setting the alarm or performing other functions on the device. If the user's requests are ambiguous or need further clarification, assistant 1002 can use the various techniques described herein, including active elicitation, paraphrasing, suggestions, and the like, which may be adapted to a hands-free context, so that the correct services 1360 are called and the intended action is taken. In one embodiment, assistant 1002 may prompt the user for confirmation and/or request additional context information from any suitable source before calling a service 1360 to perform a function. In one embodiment, a user can selectively disable the ability of assistant 1002 to call particular services 1360, or can disable all such service calling if desired.
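The dispatch step in the alarm example can be illustrated with a short Python sketch. It assumes that the user's intent has already been determined and reduced to a service name plus parameters; the names Intent, ServiceDispatcher, and set_alarm are hypothetical and do not correspond to any actual assistant API. The sketch covers three behaviors described above: optional confirmation before a service is called, disabling of particular services, and disabling of all service calls.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set


@dataclass
class Intent:
    """A parsed user intent (hypothetical structure, not from the source)."""
    service: str            # name of the external service to call, e.g. "alarm"
    params: Dict[str, str]  # parameter values extracted from the user's input


@dataclass
class ServiceDispatcher:
    """Hypothetical dispatcher that maps intents onto external services."""
    services: Dict[str, Callable[[Dict[str, str]], str]]
    disabled: Set[str] = field(default_factory=set)            # per-service opt-out
    all_disabled: bool = False                                 # disable all service calls
    needs_confirmation: Set[str] = field(default_factory=set)  # ask the user first

    def dispatch(self, intent: Intent, confirm: Callable[[Intent], bool]) -> str:
        # Respect the user's choice to disable some or all service calls.
        if self.all_disabled or intent.service in self.disabled:
            return f"Calling '{intent.service}' has been disabled by the user."
        handler = self.services.get(intent.service)
        if handler is None:
            return f"No service registered for '{intent.service}'."
        # Optionally prompt the user for confirmation before acting.
        if intent.service in self.needs_confirmation and not confirm(intent):
            return "Cancelled."
        return handler(intent.params)


def set_alarm(params: Dict[str, str]) -> str:
    # Stand-in for an API call into the device's alarm clock application.
    return f"Alarm set for {params['time']}."
```

For example, dispatching Intent(service="alarm", params={"time": "8:00 AM tomorrow"}) with a confirm callback that returns True yields "Alarm set for 8:00 AM tomorrow.". Placing the confirmation and disable checks in the dispatcher means that individual services need not implement them.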

The system of the present invention can be implemented with any of a number of different types of clients 1304 and modes of operation. Referring now to FIG. 12, there is shown a block diagram depicting a system architecture illustrating several different types of clients 1304 and modes of operation. One skilled in the art will recognize that the various types of clients 1304 and modes of operation shown in FIG. 12 are merely exemplary, and that the system of the present invention can be implemented using clients 1304 and/or modes of operation other than those depicted. Additionally, the system can include any or all of such clients 1304 and/or modes of operation, alone or in any combination. Depicted examples include:

· Computer devices with input/output devices and/or sensors 1402. A client component may be deployed on any such computer device 1402. At least one embodiment may be implemented using a web browser 1304A or other software application for enabling communication with servers 1340 via network 1361. Input and output channels may be of any type, including, for example, visual and/or auditory channels. For example, in one embodiment, the system of the invention can be implemented using voice-based communication methods, allowing for an embodiment of the assistant for the blind in which the equivalent of a web browser is driven by speech and uses speech for output.

· Mobile devices with I/O and sensors 1406, for which the client may be implemented as an application on mobile device 1304B. This includes, but is not limited to, mobile phones, smartphones, personal digital assistants, tablet devices, networked game consoles, and the like.

· Consumer appliances with I/O and sensors 1410, for which the client may be implemented as an embedded application on appliance 1304C.

· Automobiles and other vehicles with dashboard interfaces and sensors 1414, for which the client may be implemented as an embedded system application 1304D. This includes, but is not limited to, car navigation systems, voice control systems, in-car entertainment systems, and the like.

· Networked computing devices such as routers 1418 or any other device that resides on or interfaces with a network, for which the client may be implemented as a device-resident application 1304E.

· Email clients 1424, for which an embodiment of the assistant is connected via an email modality server 1426. Email modality server 1426 acts as a communication bridge, for example taking input from the user as email messages sent to the assistant and sending output from the assistant to the user as replies.

· Instant messaging clients 1428, for which an embodiment of the assistant is connected via a messaging modality server 1430. Messaging modality server 1430 acts as a communication bridge, taking input from the user as messages sent to the assistant and sending output from the assistant to the user as messages in reply.

· Voice telephones 1432, for which an embodiment of the assistant is connected via a Voice over Internet Protocol (VoIP) modality server 1434. VoIP modality server 1434 acts as a communication bridge, taking input from the user as voice spoken to the assistant and sending output from the assistant to the user, for example as synthesized speech, in reply.
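A modality server of this kind can be thought of as a thin adapter between a communication channel and the assistant. The following Python sketch is a minimal illustration, not the described implementation. It models an email modality bridge using the standard library's email.message.EmailMessage; the function name email_bridge and the plain-text assistant callable are assumptions. The body of the user's email becomes the assistant's input, and the assistant's output is returned as a threaded reply.

```python
from email.message import EmailMessage
from typing import Callable


def email_bridge(incoming: EmailMessage,
                 assistant: Callable[[str], str],
                 assistant_address: str) -> EmailMessage:
    """Hypothetical email modality bridge between a user and an assistant."""
    # Treat the plain-text body of the user's email as the assistant's input.
    user_text = incoming.get_content().strip()
    answer = assistant(user_text)

    # Package the assistant's output as a reply addressed back to the sender.
    reply = EmailMessage()
    reply["From"] = assistant_address
    reply["To"] = incoming["From"]
    subject = incoming["Subject"] or ""
    reply["Subject"] = subject if subject.lower().startswith("re:") else f"Re: {subject}"
    # Thread the reply under the original message when it has an ID.
    if incoming["Message-ID"]:
        reply["In-Reply-To"] = incoming["Message-ID"]
    reply.set_content(answer)
    return reply
```

A messaging or VoIP bridge would follow the same pattern: unwrap the channel's input into text (for VoIP, after speech recognition), pass it to the assistant, and wrap the output in the channel's reply format.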

For messaging platforms including, but not limited to, email, instant messaging, discussion forums, group chat sessions, live help or customer support sessions, and the like, assistant 1002 may act as a participant in the conversations. Assistant 1002 may monitor the conversation and reply to individuals or to the group using one or more of the techniques and methods described herein for one-to-one interactions.
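The following Python sketch shows one way a one-to-one assistant might take part in a group conversation. The source does not specify how the assistant decides when to reply. The rule used here is an assumption made for illustration: reply to an individual on a private message, or to the group when a group message begins with an @-mention of the assistant. The class names are also hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ChatMessage:
    sender: str
    text: str
    direct: bool = False  # True for a private message to the assistant


@dataclass
class Reply:
    to: str    # a user name, or "group" for the whole conversation
    text: str


class ConversationParticipant:
    """Hypothetical wrapper that lets a one-to-one assistant join a group chat."""

    def __init__(self, name: str, assistant: Callable[[str], str]):
        self.name = name
        self.assistant = assistant
        self.transcript: List[ChatMessage] = []  # everything the assistant monitored

    def observe(self, msg: ChatMessage) -> Optional[Reply]:
        self.transcript.append(msg)
        mention = f"@{self.name}"
        if msg.direct:
            # Private message: answer the individual one-to-one.
            return Reply(to=msg.sender, text=self.assistant(msg.text))
        if msg.text.startswith(mention):
            # Addressed in the group: answer to the whole group, naming the asker.
            query = msg.text[len(mention):].strip()
            return Reply(to="group", text=f"{msg.sender}: {self.assistant(query)}")
        return None  # Not addressed: keep monitoring silently.
```

Keeping the full transcript, even for messages the assistant does not answer, leaves room for later replies to draw on the conversation's context.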

In various embodiments, functionality for implementing the techniques of the present invention can be distributed among any number of client and/or server components. For example, various software modules can be implemented for performing various functions in connection with the present invention, and such modules can be variously implemented to run on server and/or client components. Further details for such an arrangement are provided in related U.S. Patent Application Serial No. 12/987,982, "Intelligent Automated Assistant," filed January 10, 2011, the entire disclosure of which is incorporated herein by reference.

In the example of FIG. 13, input elicitation functionality and output processing functionality are distributed between the client 1304 and the server 1340. The client part of input elicitation 2794a and the client part of output processing 2792a are located at the client 1304. The server part of input elicitation 2794b and the server part of output processing 2792b are located at the server 1340. The following components are located at the server 1340:

· complete vocabulary 2758b;

· complete library of language pattern recognizers 2760b;

· master version of short-term personal memory 2752b;

· master version of long-term personal memory 2754b.

In one embodiment, the client 1304 maintains subsets and/or portions of these components locally, to improve responsiveness and reduce dependence on network communications. Such subsets and/or portions can be maintained and updated according to well-known cache management techniques. They include, for example:

· subset of vocabulary 2758a;

· subset of library of language pattern recognizers 2760a;

· cache of short-term personal memory 2752a;

· cache of long-term personal memory 2754a.
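The cached subsets described above can be kept with any standard cache policy. The following sketch shows one possible client-side cache. It assumes least-recently-used (LRU) eviction. The `fetch_from_server` callback is hypothetical and stands in for a request to the server 1340.

```python
from collections import OrderedDict
from typing import Any, Callable, Hashable


class ClientCache:
    """LRU cache holding a local subset of a server-side component
    (for example, part of the vocabulary or of a personal memory)."""

    def __init__(self, fetch_from_server: Callable[[Hashable], Any], capacity: int = 128) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._fetch = fetch_from_server  # hypothetical network call to the server
        self._capacity = capacity
        self._entries: "OrderedDict[Hashable, Any]" = OrderedDict()
        self.misses = 0

    def get(self, key: Hashable) -> Any:
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        self.misses += 1
        value = self._fetch(key)  # network round trip only on a miss
        self._entries[key] = value
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return value
```

Repeated lookups are answered locally, and only misses reach the network. This matches the stated goal of improving responsiveness and reducing dependence on network communications.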

Additional components may be implemented as part of the server 1340, including, for example:

· language interpreter 2770;

· dialog flow processor 2780;

· output processor 2790;

· domain entity databases 2772;

· task flow models 2786;

· services orchestration 2782;

· service capability models 2788.

The server 1340 obtains additional information by interfacing with external services 1360 when needed.

Referring now to FIG. 14, there is shown a functional block diagram of an electronic device 2000 in accordance with some embodiments. The functional blocks of the device may be implemented by hardware, software, or a combination of hardware and software to carry out the principles of the invention. It is understood by persons of skill in the art that the functional blocks described in FIG. 14 may be combined or separated into sub-blocks to implement the principles of the invention as described above. Therefore, the description herein may support any possible combination or separation or further definition of the functional blocks described herein.

As shown in FIG. 14, the electronic device 2000 includes a processing unit 2006. In some embodiments, the processing unit 2006 includes a context detection unit 2008, a user input interpretation unit 2010, a task and parameter identification unit 2012, a task execution unit 2014, and a dialog response generation unit 2016. The electronic device 2000 also includes an output unit 2002 coupled to the processing unit and configured to prompt the user for input. The electronic device also includes an input unit 2003 coupled to the processing unit and configured to receive user input. In some embodiments, the electronic device 2000 also includes a communication unit 2004. The communication unit 2004 is configured to receive data from one or more sensors of the device 2000 and/or from sensors/devices external to the electronic device 2000. Examples include an environmental condition sensor, a peripheral device, an onboard system of a vehicle, a position sensor (e.g., a GPS sensor), and a speed sensor.

The processing unit is configured to:
· detect whether or not a hands-free context is active (e.g., with the context detection unit 2008);
· interpret received user input to derive a representation of user intent (e.g., with the user input interpretation unit 2010);
· identify at least one task and at least one parameter for the task, based at least in part on the derived representation of user intent (e.g., with the task and parameter identification unit 2012);
· execute the at least one task using the at least one parameter, to derive a result (e.g., with the task execution unit 2014);
· generate a dialog response based on the derived result (e.g., with the dialog response generation unit 2016); and
· cause the output device to output the generated dialog response (e.g., with the output unit 2002).

In response to detecting that the device is in a hands-free context (e.g., with the context detection unit 2008), at least one of the following steps is performed in a manner consistent with limitations associated with the hands-free context (e.g., by the input unit 2003, the output unit 2002, and/or the processing unit 2006): prompting the user for input, receiving user input, interpreting the received user input, identifying the at least one task and at least one parameter for the task, and generating the dialog response.
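The chain of units described above can be sketched as a single request handler. This is a simplified illustration, not the claimed implementation. The callbacks are hypothetical stand-ins for units 2010, 2012, and 2014. The hands-free adaptation is shown only at the response-generation step, although the text allows it at any of the listed steps.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple


@dataclass
class DialogResponse:
    text: str
    mode: str  # "spoken" in a hands-free context, "visual" otherwise


def handle_request(
    user_input: str,
    hands_free: bool,  # result of context detection (unit 2008)
    interpret: Callable[[str], Dict[str, Any]],
    identify_task: Callable[[Dict[str, Any]], Tuple[str, Dict[str, Any]]],
    execute: Callable[[str, Dict[str, Any]], str],
) -> DialogResponse:
    intent = interpret(user_input)        # user input interpretation (unit 2010)
    task, params = identify_task(intent)  # task and parameter identification (unit 2012)
    result = execute(task, params)        # task execution (unit 2014)
    # Dialog response generation (unit 2016), adapted to the detected context.
    mode = "spoken" if hands_free else "visual"
    return DialogResponse(text=result, mode=mode)
```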

In some embodiments, at least two interaction modes are available for user interaction with the computing device. In response to detecting that the device is in a hands-free context, at least one of the steps of prompting the user for input, receiving user input, interpreting the received user input, identifying the at least one task and at least one parameter for the task, and generating the dialog response is performed using a first interaction mode adapted to hands-free operation (e.g., using one or more units of the processing unit 2006). In response to detecting that the device is not in a hands-free context, at least one of the same steps is performed using a second interaction mode not adapted to hands-free operation (e.g., using one or more units of the processing unit 2006).

In some embodiments, the processing unit 2006 is further configured to detect whether or not the hands-free context is active (e.g., with the context detection unit 2008). It does so by detecting a condition indicating a limitation in at least one selected from the group consisting of:
· the ability of the user to view visual output presented by the computing device;
· the ability of the user to interact with a graphical user interface presented by the computing device;
· the ability of the user to use a physical component of the computing device;
· the ability of the user to perform touch input on the computing device;
· the ability of the user to activate a switch on the computing device; and
· the ability of the user to use a keyboard on the computing device.

In some embodiments, the output unit 2002 is configured to prompt the user for input in one of two ways. In response to detecting that the device is not in a hands-free context, it prompts the user via a first output mode not adapted to the hands-free context. In response to detecting that the device is in a hands-free context, it prompts the user via a second output mode adapted to the hands-free context. In some embodiments, the first output mode is a visual output mode. In some embodiments, the second output mode is an auditory output mode.

In some embodiments, prompting the user via the visual output mode (e.g., with the output unit 2002) includes displaying a prompt on a display screen. Prompting the user via the auditory output mode (e.g., with the output unit 2002) includes outputting a spoken prompt.
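Choosing between the visual and auditory output modes when prompting reduces to a small routing function. The `display` and `speak` callbacks are hypothetical placeholders for the output unit 2002 driving a display screen or a speech synthesizer.

```python
from typing import Callable


def prompt_user(
    prompt: str,
    hands_free: bool,
    display: Callable[[str], None],
    speak: Callable[[str], None],
) -> str:
    """Send a prompt through the output mode suited to the context.

    Returns the name of the output mode that was used."""
    if hands_free:
        speak(prompt)  # auditory output mode: spoken prompt
        return "auditory"
    display(prompt)  # visual output mode: prompt shown on the display screen
    return "visual"
```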

In some embodiments, in response to detecting that the device is in a hands-free context, the processing unit 2006 is configured to interpret the received user input using a vocabulary associated with hands-free operation (e.g., with the user input interpretation unit 2010).

In some embodiments, in response to detecting that the device is in a hands-free context, the processing unit 2006 is configured to perform at least one task flow identification step associated with hands-free operation (e.g., with the task execution unit 2014). In some embodiments, performing this step includes prompting the user, via a speech-based interface, to review and confirm entered content (e.g., with the output unit 2002 and/or the input unit 2003).

In some embodiments, the processing unit 2006 is further configured to perform at least one task flow step using auditory output (e.g., with the task execution unit 2014 and/or the output unit 2002). In some embodiments, the processing unit 2006 is further configured to perform the at least one task flow identification step by performing at least one task flow step selected from a limited group of available task flow steps suitable for the hands-free context (e.g., with the task execution unit 2014).
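The limited group of hands-free task flow steps, together with the spoken review-and-confirm step, can be pictured as a filter over a list of steps. The following is a minimal sketch. Several details are hypothetical: the `hands_free_ok` flag, the step names, and the assumption that the final step commits the task (for example, by sending a message).

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TaskFlowStep:
    name: str
    hands_free_ok: bool  # hypothetical flag: usable without seeing or touching the device


def plan_task_flow(steps: List[TaskFlowStep], hands_free: bool) -> List[str]:
    """Return the names of the task flow steps to run in the current context."""
    if not hands_free:
        return [s.name for s in steps]
    # Restrict to the limited group of steps suitable for hands-free use.
    names = [s.name for s in steps if s.hands_free_ok]
    if names:
        # Assumption: the last step commits the task, so the entered content is
        # read back and confirmed by voice just before it.
        names.insert(len(names) - 1, "spoken_review_and_confirm")
    return names
```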

In some embodiments, in response to detecting that the device is in a hands-free context, the processing unit 2006 is configured to generate the dialog response in a speech-based output mode (e.g., with the dialog response generation unit 2016).

In some embodiments, the processing unit 2006 is configured to generate the dialog response in the speech-based output mode by paraphrasing at least a portion of the user input in spoken form (e.g., with the dialog response generation unit 2016). In some embodiments, the processing unit 2006 is configured to generate the dialog response in the speech-based output mode by generating speech in a plurality of voices, so that paraphrased user input is distinguishable from other spoken output (e.g., with the dialog response generation unit 2016). In some embodiments, the processing unit 2006 is configured to generate the dialog response in the speech-based output mode by combining a dialog template with at least one item of personal data (e.g., with the dialog response generation unit 2016).
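The three response-generation techniques above can be combined in one sketch: paraphrasing user input, using distinct voices, and filling a dialog template with personal data. The sketch emits (voice, text) segments for a speech synthesizer. It makes two simplifying assumptions. The paraphrase is reduced to quoting the user's words back verbatim, and the voice identifiers are hypothetical.

```python
from typing import Dict, List, Tuple

Segment = Tuple[str, str]  # (voice identifier, text to synthesize)


def spoken_response(
    user_text: str,
    template: str,
    personal_data: Dict[str, str],
    assistant_voice: str = "assistant",
    echo_voice: str = "echo",
) -> List[Segment]:
    """Build a speech-based dialog response as a list of voiced segments."""
    return [
        (assistant_voice, "You said:"),
        # The user's own words are read back in a different voice so a listener
        # can tell them apart from the assistant's speech.
        (echo_voice, user_text),
        # A dialog template filled with an item of personal data, e.g. a contact name.
        (assistant_voice, template.format(**personal_data)),
    ]
```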

In some embodiments, the processing unit 2006 is configured to detect whether or not a hands-free context is active by performing at least one process selected from the group consisting of:
· receiving user input specifying a hands-free context (e.g., with the input unit 2003);
· receiving data from at least one sensor indicating an environmental condition associated with a hands-free context (e.g., with the communication unit 2004);
· detecting connection of a peripheral device associated with a hands-free context (e.g., with the communication unit 2004);
· detecting disconnection of a peripheral device not associated with a hands-free context (e.g., with the communication unit 2004);
· detecting communication with an onboard system of a vehicle (e.g., with the communication unit 2004);
· detecting a current location (e.g., with the communication unit 2004); and
· detecting a current speed (e.g., with the communication unit 2004).
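Several of the detection signals listed above can be combined into a single decision. The sketch below covers four of them: an explicit user setting, communication with a vehicle onboard system, a hands-free peripheral connection, and current speed. The field names and the 5 m/s speed threshold are assumptions for illustration. So is the rule that an explicit user setting overrides the other signals.

```python
from dataclasses import dataclass
from typing import Optional

DRIVING_SPEED_M_PER_S = 5.0  # hypothetical threshold, about 18 km/h


@dataclass
class ContextSignals:
    user_setting: Optional[bool] = None            # explicit user choice, if given
    vehicle_system_connected: bool = False         # communication with a vehicle onboard system
    hands_free_peripheral_connected: bool = False  # e.g. a headset was connected
    speed_m_per_s: float = 0.0                     # current speed from location data


def hands_free_active(s: ContextSignals) -> bool:
    """Decide whether a hands-free context is active from the available signals."""
    if s.user_setting is not None:
        # Assumption: an explicit user setting overrides the inferred signals.
        return s.user_setting
    return (
        s.vehicle_system_connected
        or s.hands_free_peripheral_connected
        or s.speed_m_per_s >= DRIVING_SPEED_M_PER_S
    )
```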

In some embodiments, the output unit 2002 is configured to prompt the user via a conversational interface, and the input unit 2003 is configured to receive user input via the conversational interface. In some embodiments, the input unit 2003 is configured to receive spoken input, and the processing unit 2006 is configured to convert the spoken input to a text representation (e.g., with the user input interpretation unit 2010).

Conceptual Architecture

Referring now to FIG. 8, there is shown a simplified block diagram of a specific example embodiment of a multimodal virtual assistant 1002. As described in greater detail in the related U.S. patent applications referenced above, different embodiments of the multimodal virtual assistant 1002 may be configured, designed, and/or operable to provide various different types of operations, functionalities, and/or features generally relating to virtual assistant technology. Further, as described in greater detail herein, many of the operations, functionalities, and/or features of the multimodal virtual assistant 1002 disclosed herein may enable or provide different types of advantages and/or benefits to different entities interacting with it. The embodiment shown in FIG. 8 may be implemented using any of the hardware architectures described above, or using a different type of hardware architecture.

For example, according to different embodiments, the multimodal virtual assistant 1002 may be configured, designed, and/or operable to provide various different types of operations, functionalities, and/or features, such as one or more of the following (or combinations thereof):

· Automate the application of data and services available over the Internet to discover, find, choose among, purchase, reserve, or order products and services. In addition to automating the process of using these data and services, the multimodal virtual assistant 1002 may also enable the combined use of several sources of data and services at once. For example, it may combine information about products from several review sites, check prices and availability from multiple distributors, check their locations and time constraints, and help a user find a personalized solution to their problem.

· Automate the use of data and services available over the Internet to discover, investigate, select among, reserve, and otherwise learn about:
  - things to do, including but not limited to movies, events, performances, exhibits, shows, and attractions;
  - places to go, including but not limited to travel destinations, hotels and other places to stay, landmarks, and other sites of interest;
  - places to eat or drink, such as restaurants and bars;
  - times and places to meet others; and
  - any other source of entertainment or social interaction that may be found on the Internet.

· Enable the operation, via natural language dialog, of applications and services that are otherwise provided by dedicated applications with graphical user interfaces. These include:
  - search (including location-based search);
  - navigation (maps and directions);
  - database lookup (such as finding businesses or people by name or other properties);
  - getting weather conditions and forecasts;
  - checking the price of market items or the status of financial transactions;
  - monitoring traffic or the status of flights;
  - accessing and updating calendars and schedules;
  - managing reminders, alerts, tasks, and projects;
  - communicating over email or other messaging platforms; and
  - operating devices locally or remotely (e.g., dialing telephones, controlling light and temperature, controlling home security devices, and playing music or video).

  In one embodiment, the multimodal virtual assistant 1002 can be used to initiate, operate, and control many functions and apps available on the device.

· Offer personal recommendations for activities, products, services, sources of entertainment, and time management, or any other kind of recommendation service that benefits from an interactive dialog in natural language and from automated access to data and services.

According to different embodiments, at least a portion of the various types of functions, operations, actions, and/or other features provided by the multimodal virtual assistant 1002 may be implemented at one or more client systems, at one or more server systems, and/or at combinations thereof.

According to different embodiments, at least a portion of the various types of functions, operations, actions, and/or other features provided by the multimodal virtual assistant 1002 may use contextual information in interpreting and operationalizing user input, as described in more detail herein.

For example, in at least one embodiment, the multimodal virtual assistant 1002 may be operable to utilize and/or generate various different types of data and/or other types of information when performing specific tasks and/or operations. This may include, for example, input data/information and/or output data/information. In at least one embodiment, the multimodal virtual assistant 1002 may be operable to access, process, and/or otherwise utilize information from one or more different types of sources, such as one or more local and/or remote memories, devices, and/or systems. Additionally, in at least one embodiment, the multimodal virtual assistant 1002 may be operable to generate one or more different types of output data/information, which may be stored, for example, in one or more local and/or remote devices and/or systems.

Examples of different types of input data/information that may be accessed and/or utilized by the multimodal virtual assistant 1002 may include, but are not limited to, one or more of the following (or combinations thereof):

· Voice input from:
  - mobile devices such as mobile telephones and tablets;
  - computers with microphones;
  - Bluetooth headsets;
  - automobile voice control systems;
  - the telephone system;
  - recordings on answering services;
  - audio voicemail on integrated messaging services;
  - consumer applications with voice input, such as clock radios;
  - telephone stations;
  - home entertainment control systems; and
  - game consoles.

· Text input from:
  - keyboards on computers or mobile devices;
  - keypads on remote controls or other consumer electronics devices;
  - email messages sent to the assistant;
  - instant messages or similar short messages sent to the assistant;
  - text received from players in multiuser game environments; and
  - text streamed in message feeds.

· Location information coming from sensors or location-based systems. Examples include the Global Positioning System (GPS) and Assisted GPS (A-GPS) on mobile phones. In one embodiment, location information is combined with explicit user input. In one embodiment, the system of the present invention can detect when a user is at home, based on known address information and current location determination. In this manner, certain inferences may be made about two things: the type of information the user might be interested in when at home as opposed to outside the home, and the type of services and actions that should be invoked on the user's behalf depending on whether or not the user is at home.

· Time information from clocks on client devices. This may include, for example, time from telephones or other client devices indicating the local time and time zone. In addition, time may be used in the context of user requests, for example to interpret phrases such as "in an hour" and "tonight".

· Compass, accelerometer, gyroscope, and/or travel velocity data, as well as other sensor data from mobile or handheld devices or from embedded systems such as automobile control systems. This may also include device positioning data from remote controls to appliances and game consoles.

· Clicking, menu selection, and other events from a graphical user interface (GUI) on any device having a GUI. Further examples include touches to a touch screen.

· Events from sensors and other data-driven triggers, such as alarm clocks, calendar alerts, price change triggers, location triggers, and push notifications onto a device from servers.
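The location bullet above mentions detecting when the user is at home from known address information and the current location. The following minimal sketch compares the great-circle (haversine) distance against a radius. It assumes the home address has already been converted to latitude/longitude, and it uses a hypothetical 100 m radius.

```python
import math
from typing import Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters

LatLon = Tuple[float, float]  # (latitude, longitude) in degrees


def haversine_m(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in meters between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    # Clamp to guard against floating-point values slightly above 1.
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, h)))


def is_at_home(current: LatLon, home: LatLon, radius_m: float = 100.0) -> bool:
    """True if the current position lies within radius_m of the home coordinates."""
    return haversine_m(current, home) <= radius_m
```

In practice, GPS error and the size of the property would both influence the choice of radius.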

The input to the embodiments described herein also includes the context of the user interaction history, including dialog and request history.

As described in the related U.S. patent applications referenced above, many different types of output data/information may be generated by the multimodal virtual assistant 1002. These may include, but are not limited to, one or more of the following (or combinations thereof):

· Text output sent directly to an output device and/or to the user interface of a device;

· Text and graphics sent to a user over email;

· Text and graphics sent to a user over a messaging service;

· Speech output, which may include one or more of the following (or combinations thereof):

o Synthesized speech;

o Sampled speech;

o Recorded messages;

· Graphical layout of information with photos, rich text, videos, sounds, and hyperlinks (for instance, content rendered in a web browser);

· Actuator output to control physical actions on a device, such as causing it to turn on or off, make a sound, change color, vibrate, or control a light;

· Invoking other applications on a device, such as calling a mapping application, voice dialing a telephone, sending an email or instant message, playing media, making entries in calendars, task managers, and note applications, and other applications;

· Actuator output to control physical actions on devices attached to or controlled by a device, such as operating a remote camera, controlling a wheelchair, playing music on remote speakers, and playing videos on remote displays.

The multimodal virtual assistant 1002 of FIG. 8 is only one example from a wide range of virtual assistant system embodiments that may be implemented. Other embodiments of the virtual assistant system (not shown) may include additional, fewer, and/or different components/features than those illustrated in the example virtual assistant system embodiment of FIG. 8.

Multimodal virtual assistant 1002 may include a plurality of different types of components, devices, modules, processes, systems, and the like, which may be implemented and/or instantiated, for example, through the use of hardware and/or combinations of hardware and software. For example, as illustrated in the example embodiment of FIG. 8, assistant 1002 may include one or more of the following types of systems, components, devices, processes, and the like (or combinations thereof):

· one or more active ontologies 1050;

· active input elicitation component(s) 2794 (may include a client part 2794a and a server part 2794b);

· short-term personal memory component(s) 2752 (may include a master version 2752b and a cache 2752a);

· long-term personal memory component(s) 2754 (may include a master version 2754b and a cache 2754a);

· domain models component(s) 2756;

· vocabulary component(s) 2758 (may include a complete vocabulary 2758b and a subset 2758a);

· language pattern recognizer component(s) 2760 (may include a full library 2760b and a subset 2760a);

· language interpreter component(s) 2770;

· domain entity database(s) 2772;

· dialog flow processor component(s) 2780;

· services orchestration component(s) 2782;

· services component(s) 2784;

· task flow models component(s) 2786;

· dialog flow models component(s) 2787;

· service models component(s) 2788;

· output processor component(s) 2790.

In certain client/server-based embodiments, some or all of these components may be distributed between client 1304 and server 1340. Such components are further described in the related U.S. patent applications referenced above.

In one embodiment, virtual assistant 1002 receives user input 2704 via any suitable input modality, including, for example, touchscreen input, keyboard input, spoken input, and/or any combination thereof. In one embodiment, assistant 1002 also receives context information 1000, which may include event context, application context, personal acoustic context, and/or other forms of context, as described in related U.S. Patent Application Serial No. 13/250,854, entitled "Using Context Information to Facilitate Processing of Commands in a Virtual Assistant," filed September 30, 2011, the entire disclosure of which is incorporated herein by reference. Context information 1000 also includes the hands-free context, if applicable, which can be used to adapt the user interface according to the techniques described herein.

Upon processing user input 2704 and context information 1000 according to the techniques described herein, virtual assistant 1002 generates output 2708 for presentation to the user. Output 2708 can be generated according to any suitable output modality, which may be informed by the hands-free context as well as by other factors, if appropriate. Examples of output modalities include visual output as presented on a screen, auditory output (which may include spoken output and/or beeps and other sounds), haptic output (such as vibration), and/or any combination thereof.
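As a rough illustration of how the hands-free context might inform the choice of output modality for output 2708, the following Python sketch maps a few context flags to a set of modalities. The `OutputContext` fields, the `Modality` names, and the selection rules are assumptions made for illustration; the description above does not prescribe any particular policy.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Modality(Enum):
    VISUAL = auto()   # text/graphics on a screen
    SPEECH = auto()   # synthesized spoken output
    TONE = auto()     # beeps and other sounds
    HAPTIC = auto()   # vibration


@dataclass
class OutputContext:
    # Hypothetical context flags; not a data model defined in this description.
    hands_free: bool
    screen_visible: bool = True
    audio_available: bool = True


def choose_output_modalities(ctx: OutputContext) -> set:
    """Return the set of modalities to use for one piece of output."""
    modalities = set()
    if ctx.screen_visible:
        # Keep visual output whenever a screen can be seen, e.g. a
        # dashboard display mirroring the device screen.
        modalities.add(Modality.VISUAL)
    if ctx.hands_free and ctx.audio_available:
        # In a hands-free context, speech carries the content and tones
        # signal events such as "ready to listen".
        modalities.update({Modality.SPEECH, Modality.TONE})
    if not modalities:
        # Last resort: vibrate so the user at least notices something.
        modalities.add(Modality.HAPTIC)
    return modalities
```

With a visible screen and a hands-free context, this sketch returns visual, speech, and tone together, i.e. the kind of redundant multimodal output discussed later in this section.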

Additional details about the operation of the various components depicted in FIG. 8 are provided in related U.S. Patent Application Serial No. 12/987,982 for "Intelligent Automated Assistant," filed January 10, 2011, the entire disclosure of which is incorporated herein by reference.

Adapting User Interfaces to a Hands-Free Context

For illustrative purposes, the invention is described herein by way of example. However, one skilled in the art will recognize that the particular input and output mechanisms depicted in the examples are merely intended to illustrate one possible interaction between the user and assistant 1002, and are not intended to limit the scope of the invention as claimed. Furthermore, in alternative embodiments, the invention can be implemented in a device without necessarily involving a multimodal virtual assistant 1002; rather, the functionality of the invention can be implemented directly in an operating system or application running on any suitable device, without departing from the essential characteristics of the invention as solely defined in the claims.

Referring now to FIG. 1, there is shown a screenshot illustrating an example of a conventional hands-on interface 169 for reading a text message, according to the prior art. A graphical user interface (GUI) such as that shown in FIG. 1 generally requires the user to be able to read fine details, such as the message text shown in bubble 171, and to respond by typing in text field 172 and tapping send button 173. On many devices, such actions require looking at and touching the screen, and are therefore impractical to perform in certain contexts, referred to herein as hands-free contexts.

Referring now to FIG. 2, there is shown a screenshot illustrating an example of an interface 170 for replying to text message 171. A virtual keyboard 270 is presented in response to the user tapping in text field 172, allowing text to be entered in text field 172 by tapping on areas of the screen corresponding to keys. The user taps send button 173 when the text message has been entered. If the user wishes to enter text by speaking, he or she taps speech button 271, which invokes a voice dictation interface for receiving spoken input and converting it into text. Button 271 thus provides a mechanism by which the user can indicate that he or she is in a hands-free context.

Referring now to FIGS. 3A and 3B, there is shown a series of screenshots illustrating an example of an interface 175 in which a voice dictation interface is used to reply to text message 171. Screen 370 is presented, for example, after the user taps speech button 271. Microphone icon 372 indicates that the device is ready to accept spoken input. The user inputs speech, which is received via speech input device 1211, which may be a microphone or similar device. The user taps Done button 371 to indicate that he or she has finished entering the spoken input.

The spoken input is converted to text using any well-known speech-to-text algorithm or system. Speech-to-text functionality can reside on device 60 or on a server. In one embodiment, speech-to-text functionality is implemented using, for example, Nuance Recognizer, available from Nuance Communications, Inc. of Burlington, Massachusetts.

As shown in FIG. 3B, the results of the conversion can be shown in field 172. Keyboard 270 can be presented to allow the user to edit the generated text in field 172. When the user is satisfied with the entered text, he or she taps send button 173 to cause the text message to be sent.

In the example described in connection with FIGS. 2, 3A, and 3B, several operations require the user to look at the display screen and/or provide touch input. Such operations include:

· reading text message 171 on the display screen;

· touching button 271 to enter speech input mode;

· touching Done button 371 to indicate that speech input is finished;

· viewing the converted text generated from the user's spoken input;

· touching send button 173 to send the message.

In one embodiment of the present invention, mechanisms for accepting and processing speech input are integrated into device 60 in a manner that reduces the need for the user to interact with the display screen and/or use a touch interface when in a hands-free context. Accordingly, the system of the present invention can provide an improved user interface for interaction in a hands-free context.

Referring now to FIGS. 4 and 5A through 5D, there is shown a series of screenshots illustrating an example of an interface for receiving and replying to a text message, according to an embodiment in which a hands-free context is recognized; in this example, the techniques of the present invention reduce the need for the user to interact with the screen.

In FIG. 4, screen 470 depicts a text message 471 that is received while device 60 is in a locked mode. The user can activate slider 472 to reply to or otherwise interact with message 471 according to known techniques. However, in this example, device 60 may be out of sight and/or out of reach, or the user may be unable to interact with device 60, for example, if he or she is driving or engaged in some other activity. As described herein, multimodal virtual assistant 1002 provides functionality for receiving and replying to text message 471 in such a hands-free context.

In one embodiment, virtual assistant 1002 installed on device 60 automatically detects the hands-free context. Such detection can take place by any means of determining a scenario or situation in which it may be difficult or impossible for the user to interact with the screen of device 60 or to operate the GUI properly.

For example, and without limitation, the hands-free context can be determined based on any of the following, singly or in any combination:

· data from sensors (including, for example, a compass, accelerometer, gyroscope, speedometer, ambient light sensor, Bluetooth connection detector, clock, WiFi signal detector, microphone, and the like);

· determining that device 60 is in a certain geographic location, for example via GPS;

· data from a clock (for example, the hands-free context can be specified as being active at certain times of day and/or on certain days of the week);

· predefined parameters (for example, the user or an administrator can specify that the hands-free context is active when any condition or combination of conditions is detected);

· connection of Bluetooth or other wireless I/O devices (for example, if a connection with a Bluetooth-enabled interface of a moving vehicle is detected);

· any other information that may indicate that the user is in a moving vehicle or driving a car;

· presence or absence of attached peripherals, including headphones, headsets, devices connected by adapter cables, and the like;

· determining that the user is not in contact with or in close proximity to device 60;

· a particular signal used to trigger interaction with assistant 1002 (for example, a motion gesture in which the user holds the device to his or her ear, the pressing of a button on a Bluetooth device, or the pressing of a button on an attached audio device);

· detection of specific words in a continuous stream of words (for example, assistant 1002 can be configured to listen for commands and to be invoked when the user calls its name or says some command such as "Computer!"; the particular command can indicate whether or not the hands-free context is active).
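The signals listed above can be combined in many ways. The Python sketch below shows one hypothetical rule set: an explicit manual setting wins, and otherwise a car Bluetooth connection, a driving-speed reading, an attached headset, or the user being away from the device marks the context as hands-free. The `Signals` fields, the 5 m/s threshold, and the rules themselves are illustrative assumptions, not the detection method of this description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Signals:
    # Hypothetical snapshot of a few of the signals listed above.
    speed_mps: float = 0.0                  # from GPS or a speedometer
    car_bluetooth_connected: bool = False   # vehicle hands-free unit seen
    headset_attached: bool = False          # wired or wireless headset
    user_near_device: bool = True           # e.g. from proximity sensing
    manual_override: Optional[bool] = None  # explicit user/admin setting


DRIVING_SPEED_MPS = 5.0  # assumed threshold, roughly 18 km/h


def is_hands_free(s: Signals) -> bool:
    """Combine signals into a single hands-free decision (illustrative rules)."""
    if s.manual_override is not None:
        # An explicit setting takes precedence over any inference.
        return s.manual_override
    driving = s.car_bluetooth_connected or s.speed_mps >= DRIVING_SPEED_MPS
    return driving or s.headset_attached or not s.user_near_device
```

Keeping the manual override first mirrors the option, described next, of letting the user declare the context directly when automatic detection gets it wrong.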

In other embodiments, the user can manually indicate whether the hands-free context is active or inactive, and/or can schedule the hands-free context to activate and/or deactivate at certain times of day and/or on certain days of the week.
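A schedule of the kind just mentioned could be checked as follows. This is a minimal sketch assuming the schedule is stored as weekday-indexed time windows; the `Schedule` layout and the `scheduled_hands_free` name are hypothetical.

```python
from datetime import datetime, time
from typing import Dict, List, Tuple

# Hypothetical schedule layout: weekday (0 = Monday) -> list of (start, end).
Schedule = Dict[int, List[Tuple[time, time]]]


def scheduled_hands_free(now: datetime, schedule: Schedule) -> bool:
    """Return True if `now` falls inside a scheduled hands-free window."""
    t = now.time()
    for start, end in schedule.get(now.weekday(), []):
        if start <= end:
            if start <= t < end:
                return True
        elif t >= start or t < end:
            # Window wraps past midnight (e.g. 22:00-02:00); for simplicity
            # it is checked against the current weekday's entry only.
            return True
    return False
```

For example, a weekday morning commute could be written as `{d: [(time(7, 30), time(8, 30))] for d in range(5)}`; a manual on/off setting would typically be consulted before this schedule.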

In one embodiment, upon receiving text message 471 while in a hands-free context, multimodal virtual assistant 1002 causes device 60 to output an audio indication, such as a beep or tone, announcing receipt of the text message. As described above, the user can activate slider 472 to reply to or otherwise interact with message 471 according to known techniques (for example, if hands-free mode was detected incorrectly, or if the user elects to stop driving or otherwise makes himself or herself available for hands-on interaction with device 60). Alternatively, the user can engage in a spoken dialog with assistant 1002 to interact with it in a hands-free manner.

In one embodiment, the user initiates the spoken dialog by any suitable mechanism appropriate to a hands-free context. For example, in an environment in which the user is driving a Bluetooth-equipped vehicle and device 60 is in communication with the vehicle, an easily accessed button (for example, one mounted on the steering wheel of the car) may be available. Pressing the button initiates a spoken dialog with assistant 1002 and allows the user to communicate with assistant 1002 via the Bluetooth connection and through a microphone and/or speaker installed in the vehicle. Alternatively, the user can initiate the spoken dialog by pressing a button on device 60 itself, on a headset, or on any other peripheral device, or by performing some other distinctive action that signals to assistant 1002 that the user wishes to initiate a spoken dialog. As another example, the user can speak a command that assistant 1002 understands and that initiates the spoken dialog. One skilled in the art will recognize that many other techniques can be provided for allowing the user to easily initiate a spoken dialog with assistant 1002. Preferably, the mechanism used to initiate the spoken dialog does not require hand-eye coordination on the part of the user, thus allowing the user to focus on a primary task such as driving, and/or can be used by an individual with a disability that prevents, hinders, restricts, or limits his or her ability to interact with a GUI such as that depicted in FIGS. 2, 3A, and 3B.

Once a spoken dialog has been initiated, assistant 1002 listens for spoken input. In one embodiment, assistant 1002 acknowledges the spoken input by some output mechanism that the user can easily detect while in the hands-free context. Examples include an audio beep or tone, visual output on a vehicle dashboard that the user can easily see even while driving, and/or some other mechanism. Spoken input is processed using known speech recognition techniques. Assistant 1002 then performs the action(s) indicated by the spoken input. In one embodiment, assistant 1002 provides spoken output, which can be played through speakers (in device 60 or installed in the vehicle), headphones, or the like, so as to continue the audio dialog with the user. For example, assistant 1002 can read the content of text messages, email messages, and the like, and can present options to the user in spoken form.

For example, if the user says "Read my new message," assistant 1002 may cause device 60 to emit an acknowledgement tone. Assistant 1002 may then emit spoken output such as "You have a new message from Tom Devon," and then read the message: "Hey, are you going to the game?" Spoken output can be generated by assistant 1002 using any known technique for converting text to speech. In one embodiment, text-to-speech functionality is implemented using, for example, Nuance Vocalizer, available from Nuance Communications, Inc. of Burlington, Massachusetts.

Referring now to FIG. 5A, there is shown an example of a screenshot 570 of output that may be presented on the screen of device 60 while the verbal interchange between the user and assistant 1002 is taking place. In some hands-free situations, the user can see the screen but cannot easily touch it, for example if the output on the screen of device 60 is replicated on the display screen of a vehicle's navigation system. Visual echoing of the spoken conversation, as depicted in FIGS. 5A through 5D, can help the user verify that assistant 1002 has properly and accurately understood his or her spoken input, and can further help the user understand assistant 1002's spoken replies. However, such visual echoing is optional, and the present invention can be implemented without any visual display on the screen of device 60 or elsewhere. Thus, the user can interact with assistant 1002 purely by spoken input and output, or by a combination of visual and spoken inputs and/or outputs.

In this example, assistant 1002 displays and speaks a prompt 571. In response to user input, assistant 1002 repeats the user input 572 on the display and/or in spoken form. The assistant then introduces the incoming text message 573 and reads it. In one embodiment, the text message may also be displayed on the screen.

As shown in FIG. 5B, after reading the incoming message to the user, assistant 1002 tells the user that he or she can "reply or read it again" 574. Again, in one embodiment, such output is provided in spoken form (i.e., verbally). In this manner, the system of the present invention informs the user of the available actions in a way that is well suited to the hands-free context: it does not require the user to look at text fields, buttons, and/or links, and does not require direct manipulation by touch or interaction with on-screen objects. As depicted in FIG. 5B, in one embodiment, the spoken output is echoed 574 on screen; however, such display of the spoken output is not required. In one embodiment, echo messages displayed on the screen scroll upward automatically according to well-known mechanisms.

In this example, the user says "Reply yes I'll be there at six." As depicted in FIG. 5B, in one embodiment, the user's spoken input is echoed 575 so that the user can check that it has been properly understood. In addition, in one embodiment, assistant 1002 repeats the user's spoken input in auditory form, so that the user can verify that his or her command was understood even if he or she cannot see the screen. Thus, the system of the present invention provides a mechanism by which the user can initiate a reply command, compose a response, and verify that the command and the composed response were properly understood, all in a hands-free context and without requiring the user to view a screen or interact with device 60 in a manner that is not feasible or not well suited to the current operating environment.

In one embodiment, assistant 1002 provides further verification of the user's composed text message by reading the message back. In this example, assistant 1002 says, verbally, "Here's your reply to Tom Devon: 'Yes I'll be there at six.'" In one embodiment, the meaning of the quotation marks is conveyed with changes in voice and/or prosody. For example, the string "Here's your reply to Tom Devon" can be spoken in one voice, such as a male voice, while the string "Yes I'll be there at six" can be spoken in another voice, such as a female voice. Alternatively, the same voice can be used, but with different prosody to convey the quotation marks.
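One way to prepare such a read-back for a text-to-speech engine is to split the string at quotation marks and tag each segment with a voice. The sketch below assumes that straight double quotes delimit the quoted part and uses placeholder voice names (`narrator`, `quote`); it does not call any real TTS API, and a real system might instead emit prosody markup.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SpeechSegment:
    text: str
    voice: str  # hypothetical voice identifier handed to a TTS engine


QUOTE_RE = re.compile(r'"([^"]*)"')


def segment_for_speech(text: str, narrator: str = "narrator",
                       quoted: str = "quote") -> List[SpeechSegment]:
    """Split text into narrator and quoted segments, each with its own voice."""
    segments: List[SpeechSegment] = []
    pos = 0
    for m in QUOTE_RE.finditer(text):
        before = text[pos:m.start()].strip()
        if before:
            segments.append(SpeechSegment(before, narrator))
        inner = m.group(1).strip()
        if inner:
            # The quoted message body gets the contrasting voice.
            segments.append(SpeechSegment(inner, quoted))
        pos = m.end()
    rest = text[pos:].strip()
    if rest:
        segments.append(SpeechSegment(rest, narrator))
    return segments
```

Using the same voice name for both arguments and varying only pitch or rate downstream would correspond to the prosody-only alternative mentioned above.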

In one embodiment, assistant 1002 provides visual echoing of the spoken interchange, as depicted in FIGS. 5B and 5C. FIGS. 5B and 5C show message 576, which echoes assistant 1002's spoken output "Here's your reply to Tom Devon." FIG. 5C shows a summary 577 of the text message being composed, including the recipient and the content of the message. In FIG. 5C, previous messages have scrolled upward off the screen, but can be viewed by scrolling down according to known mechanisms. Send button 578 sends the message; cancel button 579 cancels it. In one embodiment, the user can also send or cancel the message by speaking a keyword, such as "send" or "cancel." Alternatively, assistant 1002 can generate a spoken prompt, such as "Ready to send it?"; again, display 570 with buttons 578 and 579 can be shown while the spoken prompt is output. The user can then indicate what he or she wishes to do by touching button 578 or 579 or by answering the spoken prompt. The prompt can be phrased so that it permits a "yes" or "no" response, so that the user does not need any special vocabulary to make his or her intention known.

In one embodiment, assistant 1002 can confirm the user's spoken command to send the message, for example by generating spoken output such as "OK, I'll send your message." As shown in FIG. 5D, this spoken output can be echoed 580 on screen 570, along with a summary 581 of the text message being sent.

The spoken exchange described above, combined with optional visual echoing, illustrates one way in which assistant 1002 provides redundant outputs in a multimodal interface. In this manner, assistant 1002 can support a range of contexts, including eyes-free, hands-free, and fully hands-on.

The example also illustrates mechanisms by which the displayed and spoken output can differ from one another to reflect their different contexts. It further illustrates how alternative mechanisms for responding are made available. For example, after the assistant says "Ready to send it?" and displays screen 570 shown in FIG. 5C, the user can say the word "send" or "yes," or tap send button 578 on the screen. Assistant 1002 interprets any of these actions in the same way, and each causes the text message to be sent. Thus, the system of the present invention provides a high degree of flexibility in how the user interacts with assistant 1002.
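The equivalence of spoken and touch responses can be sketched as a small normalization step that maps every input event onto one action before anything is executed. In this Python sketch, the event kinds (`speech`, `tap`), the button identifiers, and the keyword table are hypothetical; only the idea that "send," "yes," and a tap on send button 578 lead to the same result comes from the description above.

```python
import re
from enum import Enum
from typing import Optional


class Action(Enum):
    SEND = "send"
    CANCEL = "cancel"


# Hypothetical keyword table: "yes"/"no" answer a prompt such as
# "Ready to send it?", while "send"/"cancel" are explicit commands.
SPOKEN_KEYWORDS = {
    "send": Action.SEND,
    "yes": Action.SEND,
    "cancel": Action.CANCEL,
    "no": Action.CANCEL,
}

# Hypothetical identifiers standing in for send button 578 and cancel button 579.
BUTTONS = {"send_578": Action.SEND, "cancel_579": Action.CANCEL}


def interpret(event_kind: str, payload: str) -> Optional[Action]:
    """Map a spoken reply or a button tap onto the same Action."""
    if event_kind == "tap":
        return BUTTONS.get(payload)
    if event_kind == "speech":
        # Use the first recognized keyword in the transcribed utterance.
        for word in re.findall(r"[a-z]+", payload.lower()):
            if word in SPOKEN_KEYWORDS:
                return SPOKEN_KEYWORDS[word]
    return None
```

Because downstream code only ever sees an `Action`, the send logic does not need to know which modality the user chose.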

Referring now to FIGS. 6A through 6C, there is shown a series of screenshots illustrating an example of the operation of multimodal virtual assistant 1002 according to an embodiment of the present invention, in which the user revises a text message 577 in a hands-free context, for example to correct mistakes or add more content. In a visual interface involving direct manipulation, such as that described above in connection with FIGS. 3A and 3B, the user could type on virtual keyboard 270 to edit the contents of text field 172 and thereby revise text message 577. Because such operations may not be feasible in a hands-free context, multimodal virtual assistant 1002 provides a mechanism by which such editing of text message 577 can take place through speech input and output in a conversational interface.

In one embodiment, once text message 577 has been composed (for example, based on the user's speech input), multimodal virtual assistant 1002 generates verbal output informing the user that the message is ready to be sent and asking whether it should be sent. If the user indicates, through verbal or direct-manipulation input, that they are not ready to send the message, multimodal virtual assistant 1002 generates speech output informing the user of the available options, such as sending, canceling, reviewing, or changing the message. For example, assistant 1002 may say "OK, I haven't sent the message yet. To continue, you can send, cancel, review, or change the message."

As shown in FIG. 6A, in one embodiment, multimodal virtual assistant 1002 echoes the speech output by displaying message 770, visually informing the user of the options available for text message 577. In one embodiment, text message 577 is displayed in editable field 773, indicating that the user can edit message 577 by tapping within field 773, along with buttons 578 and 579 for sending or canceling text message 577, respectively. In one embodiment, tapping within editable field 773 invokes a virtual keyboard (similar to that depicted in FIG. 3B) to allow editing by direct manipulation.

The user can also interact with assistant 1002 by providing speech input. Thus, in response to the spoken message from assistant 1002 offering options for interacting with text message 577, the user may say "Change the message." Assistant 1002 recognizes the spoken text and responds with a verbal message prompting the user to speak the revised message. For example, assistant 1002 may say "OK... What would you like the message to say?" and then begin listening for the user's response. FIG. 6B depicts an example of screen 570 that might be shown in connection with such a voice prompt. Again, the user's spoken text is visually echoed (771), along with prompt 772 from assistant 1002.

In one embodiment, once the user has been prompted in this manner, the exact contents of the user's subsequent speech input are interpreted as the content of the text message, bypassing the normal natural language interpretation of user commands. The user's speech input is deemed complete when a pause of sufficient length in the input is detected, when a specific word indicating that the input is complete is detected, or when it is detected that the user has pressed a button or activated some other command to indicate that they have finished speaking the text message. In one embodiment, assistant 1002 then repeats the entered text message back in spoken form and may optionally echo it on screen, as shown in FIG. 6C. Assistant 1002 offers a spoken prompt such as "Ready to send the message?", which can also be echoed (770) on the screen as shown in FIG. 6C. The user can then respond with "cancel", "send", "yes", or "no", any of which is correctly interpreted by assistant 1002. Alternatively, the user can press button 578 or 579 on the screen to trigger the desired action.
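The dictation behavior described above (capturing the user's words verbatim and ending on a long pause, an end word, or a button press) can be sketched as follows. This is a minimal sketch under stated assumptions: the `SpeechEvent` type, the `capture_dictation` function, the 2-second threshold, and the end word "done" are all hypothetical, since the text specifies only "a pause of sufficient length" and "a specific word". For simplicity, a pause is detected when the next word arrives late rather than by a live timer.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Assumed values; the source only says "a pause of sufficient length"
# and "a specific word indicating that the input is complete".
PAUSE_THRESHOLD_S = 2.0
END_WORDS = {"done"}


@dataclass
class SpeechEvent:
    kind: str      # "word" or "button"
    text: str      # recognized word ("" for button events)
    time_s: float  # timestamp in seconds


def capture_dictation(events: Iterable[SpeechEvent]) -> str:
    """Collect message text verbatim, bypassing command interpretation.

    Stops at the first of: a button press, a long gap between words,
    or an end word (which is not included in the message).
    """
    words: List[str] = []
    last_time = None
    for event in events:
        if event.kind == "button":
            break  # user signalled completion explicitly
        if last_time is not None and event.time_s - last_time >= PAUSE_THRESHOLD_S:
            break  # gap long enough: treat input as complete
        if event.text.lower() in END_WORDS:
            break  # end word terminates dictation
        # Words such as "send" are kept as message text, not commands.
        words.append(event.text)
        last_time = event.time_s
    return " ".join(words)
```

Because command words are never matched inside this loop, a message such as "send it" is captured as text rather than triggering a send.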

By providing a mechanism for modifying text message 577 in this manner, the system of the present invention, in one embodiment, provides a flow path suited to the hands-free context that is integrated with the hands-on approach, so that the user can freely choose the mode of interaction at each step. Moreover, in one embodiment, assistant 1002 adapts its natural language processing mechanism to particular steps in the overall flow. For example, as described above, in some situations assistant 1002 can enter a mode that bypasses the normal natural language interpretation of user commands when the user has been prompted to speak a text message.

Method

In one embodiment, multimodal virtual assistant 1002 detects a hands-free context and adapts one or more stages of its operation to modify the user experience for hands-free operation. As described above, detection of the hands-free context can be applied in a variety of ways to affect the operation of multimodal virtual assistant 1002. Referring now to FIG. 7, there is shown a flow diagram depicting a method 10 of operation of virtual assistant 1002 that supports dynamic detection of, and adaptation to, a hands-free context, according to one embodiment. Method 10 may be implemented in connection with one or more embodiments of multimodal virtual assistant 1002. As depicted in FIG. 7, the hands-free context can be used at various processing stages of multimodal virtual assistant 1002, according to one embodiment.

In at least one embodiment, method 10 may be operable to perform and/or implement various types of functions, operations, actions, and/or other features, such as, for example, one or more of the following (or combinations thereof):

· Execute an interface control flow loop of a conversational interface between the user and multimodal virtual assistant 1002. At least one iteration of method 10 may serve as one turn in the conversation. A conversational interface is an interface in which the user and assistant 1002 communicate by exchanging utterances back and forth in a conversational manner.

· Provide the executive control flow for multimodal virtual assistant 1002. That is, the procedure controls the gathering of input, the processing of input, the generation of output, and the presentation of output to the user.

· Coordinate communications among the components of multimodal virtual assistant 1002. That is, it may direct where the output of one component feeds into another, and where overall input from the environment and action on the environment may occur.

In at least some embodiments, portions of method 10 may also be implemented at other devices and/or systems of a computer network.

According to specific embodiments, multiple instances or threads of method 10 may be concurrently implemented and/or initiated via the use of one or more processors 63 and/or other combinations of hardware and/or hardware and software. In at least one embodiment, one or more portions, or selected portions, of method 10 may be implemented at one or more client(s) 1304, at one or more server(s) 1340, and/or combinations thereof.

For example, in at least some embodiments, various aspects, features, and/or functions of method 10 may be performed, implemented, and/or initiated by software components, network services, databases, and/or the like, or any combination thereof.

According to different embodiments, one or more different threads or instances of method 10 may be initiated in response to detection of one or more conditions or events satisfying one or more different types of criteria (such as, for example, minimum threshold criteria) for triggering initiation of at least one instance of method 10. Examples of various types of conditions or events that may trigger initiation and/or implementation of one or more different threads or instances of the method may include, but are not limited to, one or more of the following (or combinations thereof):

· A user session with an instance of multimodal virtual assistant 1002, such as, for example, but not limited to, one or more of:

o A mobile device application starting up, for example a mobile device application that is implementing an embodiment of multimodal virtual assistant 1002;

o A computer application starting up, for example an application that is implementing an embodiment of multimodal virtual assistant 1002;

o A dedicated button on a mobile device being pressed, such as a "speech input button";

o A button on a peripheral device attached to a computer or mobile device, such as a headset, telephone handset or base station, GPS navigation system, consumer appliance, remote control, or any other device with a button that might be associated with invoking the assistant;

o A web session started from a web browser to a website implementing multimodal virtual assistant 1002;

o An interaction started from within an existing web browser session to a website implementing multimodal virtual assistant 1002, in which, for example, multimodal virtual assistant 1002 service is requested;

o An email message sent to a modality server 1426 that is mediating communication with an embodiment of multimodal virtual assistant 1002;

o A text message sent to a modality server 1426 that is mediating communication with an embodiment of multimodal virtual assistant 1002;

o A phone call made to a modality server 1434 that is mediating communication with an embodiment of multimodal virtual assistant 1002;

o An event, such as an alert or notification, sent to an application that is providing an embodiment of multimodal virtual assistant 1002.

· A device that provides multimodal virtual assistant 1002 being turned on and/or started.

According to different embodiments, one or more different threads or instances of method 10 may be initiated and/or implemented manually, automatically, statically, dynamically, concurrently, and/or in combinations thereof. Additionally, different instances and/or embodiments of method 10 may be initiated at one or more different time intervals (e.g., during a specific time interval, at regular periodic intervals, at irregular periodic intervals, on demand, and the like).

In at least one embodiment, a given instance of method 10 may use and/or generate various different types of data and/or other types of information when performing specific tasks and/or operations, including detection of a hands-free context as described herein. Data may also include any other type of input data/information and/or output data/information. For example, in at least one embodiment, at least one instance of method 10 may access, process, and/or otherwise use information from one or more different types of sources, such as, for example, one or more databases. In at least one embodiment, at least a portion of the database information may be accessed via communication with one or more local and/or remote memory devices. Additionally, at least one instance of method 10 may generate one or more different types of output data/information, which may be stored, for example, in local memory and/or remote memory devices.

In at least one embodiment, initial configuration of a given instance of method 10 may be performed using one or more different types of initialization parameters. In at least one embodiment, at least a portion of the initialization parameters may be accessed via communication with one or more local and/or remote memory devices. In at least one embodiment, at least a portion of the initialization parameters provided to an instance of method 10 may correspond to and/or may be derived from the input data/information.

In the particular example of FIG. 7, it is assumed that a single user is accessing an instance of multimodal virtual assistant 1002 over a network from a client application with speech input capabilities. In one embodiment, assistant 1002 is installed on device 60, such as a mobile computing device, personal digital assistant, mobile phone, smartphone, laptop, tablet computer, consumer electronic device, music player, or the like. Assistant 1002 operates in connection with a user interface that allows users to interact with assistant 1002 via direct manipulation and/or display of a graphical user interface (for example, via a touchscreen), as well as via speech input and output.

Device 60 has a current state 11 that can be analyzed to detect (20) whether the device is in a hands-free context. The hands-free context can be detected (20) based on state 11, using any applicable detection mechanism or combination of mechanisms, whether automatic or manual. Examples are set forth above.
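Detection step 20 can be pictured as a function of device state 11 that lets a manual setting override automatic signals. This is a minimal sketch under loud assumptions: the fields of `DeviceState` (a connected headset, a vehicle connection, invocation from a peripheral) are hypothetical example signals chosen for illustration, and `detect_hands_free` is not an API from the source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DeviceState:
    """Hypothetical snapshot of current state 11 of device 60."""
    user_override: Optional[bool] = None  # manual hands-free setting, if any
    headset_connected: bool = False       # assumed automatic signal
    connected_to_vehicle: bool = False    # assumed automatic signal
    invoked_via_peripheral: bool = False  # assumed automatic signal


def detect_hands_free(state: DeviceState) -> bool:
    """Step 20 (sketch): combine a manual setting with automatic signals.

    A manual setting, when present, takes precedence; otherwise any one
    automatic signal is treated as sufficient evidence.
    """
    if state.user_override is not None:
        return state.user_override
    return (
        state.headset_connected
        or state.connected_to_vehicle
        or state.invoked_via_peripheral
    )
```

Letting the manual setting win reflects the text's allowance for either automatic or manual detection; a real system might instead weigh signals or require several to agree.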

When a hands-free context is detected (20), that information is added to other contextual information 1000 that may be used to inform various processes of the assistant, as described in related U.S. Patent Application Serial No. 13/250,854, entitled "Using Context Information to Facilitate Processing of Commands in a Virtual Assistant", filed September 30, 2011, the entire disclosure of which is incorporated herein by reference.

Speech input is elicited and interpreted (100). Elicitation may include presenting prompts in any suitable mode. Thus, depending on whether a hands-free context is detected, in various embodiments assistant 1002 may offer one or more of several input modes. These may include, for example:

· An interface for typed input, which may invoke an active typed-input elicitation procedure;

· An interface for speech input, which may invoke an active speech input elicitation procedure;

· An interface for selecting inputs from a menu, which may invoke active GUI-based input elicitation.

For example, if a hands-free context is detected, speech input may be elicited by a tone or other audible prompt, and the user's speech may be interpreted as text. One skilled in the art will recognize, however, that other input modes may be provided.
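The choice of input modes in step 100 can be sketched as a function of the hands-free flag. This is a minimal sketch assuming, for simplicity, that only speech is offered in a hands-free context; as the text notes, other input modes may still be provided, and the `plan_elicitation` name and mode labels are hypothetical.

```python
from typing import List, Tuple


def plan_elicitation(hands_free: bool) -> Tuple[List[str], str]:
    """Step 100 (sketch): return (input modes to offer, how to prompt)."""
    if hands_free:
        # Typing and menu selection need eyes and hands, so this sketch
        # offers only speech, elicited with an audible tone.
        return ["speech"], "tone"
    # Hands-on: all three interfaces are available, prompted visually.
    return ["typing", "speech", "menu"], "visual"
```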

The output of step 100 may be a set of candidate interpretations of the text of the input speech. This set of candidate interpretations is processed (200) by language interpreter 2770 (also referred to as a natural language processor, or NLP), which parses the text input and generates a set of possible semantic interpretations of the user's intent.

In step 300, these representation(s) of the user's intent are passed to dialog flow processor 2780, which implements an embodiment of a dialog and flow analysis procedure to operationalize the user's intent as task steps. Dialog flow processor 2780 determines which interpretation of intent is most likely, maps this interpretation to instances of domain models and parameters of a task model, and determines the next flow step in a task flow. If appropriate, one or more task flow step(s) adapted for hands-free operation are selected (310). For example, as described above, the task flow step(s) for modifying a text message may be different when a hands-free context is detected.
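Steps 300 and 310 can be sketched as selecting the highest-scoring interpretation and then looking up its task flow, preferring a hands-free variant when one exists. This is a minimal sketch under stated assumptions: the `Interpretation` type, the numeric scores, the flow tables, and the step names are hypothetical; the source does not specify how dialog flow processor 2780 ranks interpretations.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass
class Interpretation:
    """One semantic interpretation of the user's intent (output of step 200)."""
    task: str
    params: Dict[str, str]
    score: float  # assumed likelihood score from language interpreter 2770


# Hypothetical task-flow tables; the hands-free variant swaps visual
# steps for verbal read-back and confirmation.
TASK_FLOWS: Dict[str, List[str]] = {
    "send_text": ["compose", "show_draft", "await_send_tap"],
}
HANDS_FREE_FLOWS: Dict[str, List[str]] = {
    "send_text": ["compose", "read_back_draft", "ask_ready_to_send"],
}


def plan_task_flow(
    candidates: Sequence[Interpretation], hands_free: bool
) -> Tuple[Interpretation, List[str]]:
    """Steps 300/310 (sketch): pick the likeliest intent and its flow steps."""
    best = max(candidates, key=lambda c: c.score)
    flows = HANDS_FREE_FLOWS if hands_free else TASK_FLOWS
    # Fall back to the standard flow if no hands-free variant exists.
    steps = flows.get(best.task, TASK_FLOWS.get(best.task, []))
    return best, steps
```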

In step 400, the identified flow step(s) are executed. In one embodiment, the flow step(s) are invoked by services orchestration component 2782, which invokes a set of services on behalf of the user's request. In one embodiment, these services contribute some data to a common result.

In step 500, a dialog response is generated. In one embodiment, dialog response generation (500) is influenced by the state of the hands-free context. Thus, when a hands-free context is detected, different and/or additional dialog units may be selected (510) for presentation using the audio channel. For example, additional prompts such as "Ready to send the message?" may be spoken aloud and need not be displayed on the screen. In one embodiment, detection of a hands-free context can influence prompting for additional input (520), for example to confirm input.
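Steps 500 through 520 can be sketched as building a response with separate spoken and displayed parts, where the hands-free flag adds audio-only dialog units and a confirmation prompt. This is a minimal sketch; the `DialogResponse` type, the `generate_response` function, and the read-back wording are hypothetical, and only the "Ready to send the message?" prompt comes from the text.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DialogResponse:
    spoken: List[str] = field(default_factory=list)     # audio channel
    displayed: List[str] = field(default_factory=list)  # screen


def generate_response(summary: str, hands_free: bool) -> DialogResponse:
    """Steps 500-520 (sketch): build the response for a drafted text message."""
    response = DialogResponse(displayed=[summary])
    if hands_free:
        # 510: additional dialog units presented over the audio channel.
        response.spoken.append(f"Here's your message: {summary}")
        # 520: spoken prompt asking the user to confirm the input.
        response.spoken.append("Ready to send the message?")
    return response
```

Keeping the spoken and displayed parts as separate fields is what allows the two outputs to differ by context, as described for FIGS. 5C and 5D.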

In step 700, multimodal output (which, in one embodiment, includes both verbal and visual content) is presented to the user, who can then optionally respond again using speech input.

If, after viewing and/or hearing the response, the user is done (790), the method ends. If the user is not done, another iteration of the loop is initiated by returning to step 100.
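The overall loop of method 10 can be sketched with stub stages so that the control flow of FIG. 7 is visible end to end. This is a minimal sketch under stated assumptions: each stage function is a hypothetical stand-in for the corresponding component (language interpreter 2770, dialog flow processor 2780, and so on), and treating "bye" or "done" as the user being finished (790) is an assumption made for illustration.

```python
from typing import Dict, Iterable, List


def detect_context(state: Dict[str, bool]) -> Dict[str, bool]:
    """Step 20 (stub): derive context information 1000 from device state 11."""
    return {"hands_free": state.get("headset_connected", False)}


def interpret(text: str, context: Dict[str, bool]) -> str:
    """Steps 100-200 (stub): normalize the transcribed text as the intent."""
    return text.strip().lower()


def respond(intent: str, context: Dict[str, bool]) -> str:
    """Steps 300-500 (stub): handle the intent and pick an output channel."""
    channel = "speak" if context["hands_free"] else "show"
    return f"{channel}: handled '{intent}'"


def run_assistant(state: Dict[str, bool], utterances: Iterable[str]) -> List[str]:
    """Loop of method 10: one iteration per conversational turn."""
    context = detect_context(state)  # step 20 runs before the loop, as in FIG. 7
    outputs: List[str] = []
    for utterance in utterances:  # each pass returns to step 100
        intent = interpret(utterance, context)
        if intent in {"bye", "done"}:  # assumed "user is done" check (790)
            break
        outputs.append(respond(intent, context))  # step 700: present output
    return outputs
```

Passing `context` into every stage mirrors how context information 1000 can influence several steps of the method rather than only the detection step.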

As described herein, context information 1000, including a detected hands-free context, can be used by various components of the system to influence various steps of method 10. For example, as depicted in FIG. 7, context 1000, including the hands-free context, can be used at steps 100, 200, 300, 310, 500, 510, and/or 520. However, one skilled in the art will recognize that the use of context information 1000, including the hands-free context, is not limited to these specific steps, and that the system can use context information at other points as well without departing from the essential characteristics of the present invention. Further description of the use of context 1000 in the various steps of operation of assistant 1002 is provided in related U.S. Patent Application Serial No. 13/250,854, entitled "Using Context Information to Facilitate Processing of Commands in a Virtual Assistant", filed September 30, 2011, and in related U.S. Patent Application Serial No. 12/479,477, entitled "Contextual Voice Commands", filed June 5, 2009, the entire disclosures of which are incorporated herein by reference.

In addition, one skilled in the art will recognize that different embodiments of method 10 may include additional features and/or operations beyond those illustrated in the specific embodiment depicted in FIG. 7, and/or may omit at least a portion of the features and/or operations of method 10 as illustrated in the specific embodiment of FIG. 7.

Adaptation of steps 100, 200, 300, 310, 500, 510, and/or 520 to a hands-free context is described in more detail below.

Adapting Input Elicitation and Interpretation (100) to a Hands-Free Context

Elicitation and interpretation of speech input (100) can be adapted to a hands-free context in any of several ways, either singly or in any combination. As described above, in one embodiment, if a hands-free context is detected, speech input may be elicited by a tone and/or other audible prompt, and the user's speech is interpreted as text. In general, multimodal virtual assistant 1002 may provide multiple possible mechanisms for audio input (such as, for example, Bluetooth-connected microphones or other attached peripherals), and multiple possible mechanisms for invoking assistant 1002 (such as, for example, pressing a button on a peripheral or using a motion gesture in proximity to device 60). Information about how assistant 1002 was invoked and/or which mechanism is being used for audio input can be used to indicate whether the hands-free context is active, and can be used to alter the hands-free experience. More specifically, such information can be used to direct step 100 to use a particular audio path for input and output.

In addition, when a hands-free context is detected, the manner in which audio input devices are used can be changed. For example, in hands-on mode, the interface can require the user to press a button or make a physical gesture to cause assistant 1002 to start listening for speech input. In contrast, in hands-free mode, the interface can continuously prompt for input after every instance of output by assistant 1002, or can allow continuous speech in both directions (allowing the user to interrupt assistant 1002 while assistant 1002 is still speaking).
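The two adaptations described above, routing audio through the mechanism used to invoke the assistant and changing when the assistant listens, can be sketched as a small policy object. This is a minimal sketch under stated assumptions: the `AudioPolicy` fields, the invocation labels such as `"bluetooth_headset"`, and the route names are hypothetical. The text presents continuous prompting and two-way continuous speech as alternatives; this sketch enables both in hands-free mode for brevity.

```python
from dataclasses import dataclass


@dataclass
class AudioPolicy:
    input_route: str
    output_route: str
    listen_after_each_output: bool  # re-open the mic after every reply
    allow_barge_in: bool            # user may interrupt while assistant speaks


def choose_audio_policy(invoked_via: str, hands_free: bool) -> AudioPolicy:
    """Step 100 (sketch): pick the audio path and listening behavior.

    `invoked_via` names the invocation mechanism, for example
    "bluetooth_headset" or "device_button" (hypothetical labels).
    """
    # Keep input and output on the path the user invoked the assistant from.
    route = "bluetooth" if invoked_via == "bluetooth_headset" else "built_in"
    if hands_free:
        return AudioPolicy(route, route, listen_after_each_output=True, allow_barge_in=True)
    # Hands-on: listening starts only after a button press or gesture.
    return AudioPolicy(route, route, listen_after_each_output=False, allow_barge_in=False)
```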

Adapting Natural Language Processing (200) to a Hands-Free Context

Natural language processing (NLP) (200) can be adapted to a hands-free context, for example, by adding support for certain spoken responses that are particularly well suited to hands-free operation. Such responses can include, for example, "yes", "read the message", and "change it". In one embodiment, support for such responses can be provided in addition to support for the spoken commands that are usable in a hands-on situation. Thus, for example, in one embodiment, a user may be able to operate a graphical user interface by speaking a command that appears on the screen (for example, when a button labeled "Send" appears on the screen, support may be provided for understanding the spoken word "send" and its semantic equivalents). In a hands-free context, additional commands can be recognized to account for the fact that the user may not be able to view the screen.

Detection of a hands-free context can also alter how assistant 1002 interprets words. For example, in a hands-free context, assistant 1002 can be tuned to recognize the command "Be quiet!" and its semantic variants, and to turn off all audio output in response to such a command. In a non-hands-free context, such a command might be ignored as not relevant.
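The two NLP adaptations above, accepting visible button labels as spoken commands and enabling extra commands such as "Be quiet!" only in a hands-free context, can be sketched as a context-dependent lookup. This is a minimal sketch; the `interpret_command` function, the command names, and the vocabulary table are hypothetical and far simpler than the semantic matching the text describes.

```python
from typing import Iterable, Optional

# Hypothetical commands recognized only in a hands-free context.
HANDS_FREE_ONLY = {
    "yes": "confirm",
    "read the message": "read_message",
    "change it": "change_message",
    "be quiet": "mute_audio",
    "quiet": "mute_audio",
}


def interpret_command(
    utterance: str, hands_free: bool, on_screen_labels: Iterable[str] = ()
) -> Optional[str]:
    """Step 200 (sketch): map an utterance to a command, or None."""
    text = utterance.strip().lower().rstrip("!.?")
    # Any visible button label can be spoken in either context.
    for label in on_screen_labels:
        if text == label.lower():
            return f"press:{label}"
    if hands_free:
        return HANDS_FREE_ONLY.get(text)
    return None  # e.g. "Be quiet!" is ignored when not hands-free
```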

Adapting task flow (300) to the hands-free context

Step 300 identifies the task(s) associated with the user's intent, the parameter(s) for the task(s), and/or the task flow steps 300 to execute. This step can be adapted to the hands-free context in any of several ways, individually or in combination.

In one embodiment, one or more additional task flow steps adapted to hands-free operation are selected for operation (310). Examples include steps for reviewing and confirming content verbally. In addition, in the hands-free context, assistant 1002 can read aloud lists of results that would otherwise be presented on a display screen. Verbal commands can be provided for interacting with individual items in the list. For example, if several incoming text messages are to be presented to the user and a hands-free context is detected, the identified task flow steps can include reading each text message aloud individually and pausing after each message so that the user can give a spoken command.
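The following minimal sketch shows how the list-presentation steps described above might be planned. It assumes a task flow represented as a list of `(action, payload)` pairs. The action names and the function are hypothetical.

```python
from typing import List, Tuple

# (action, payload) pairs; the action names are hypothetical.
Step = Tuple[str, str]


def message_presentation_steps(messages: List[str], hands_free: bool) -> List[Step]:
    """Plan how a list of incoming text messages is presented."""
    if not hands_free:
        # Hands-on: show the whole list on the display at once.
        return [("display_list", str(len(messages)))]
    steps: List[Step] = []
    for i, text in enumerate(messages, start=1):
        steps.append(("speak", f"{i}. {text}"))      # read one item aloud
        steps.append(("pause_for_command", str(i)))  # let the user act on item i
    return steps
```

A hands-free plan for two messages interleaves two "speak" steps with two pauses. In a hands-on context, the same input yields a single display step.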

In one embodiment, task flows can be modified for the hands-free context. For example, the task flow for taking a memo in a notes application might normally involve prompting for the content and immediately adding it to a note. Such an operation is appropriate in a hands-on environment, where the content is immediately shown in the visual interface and can be corrected right away by direct manipulation. When a hands-free context is detected, however, the task flow can be modified, for example, to review the content verbally and allow it to be modified before it is added to the note. This lets the user catch speech dictation errors before they are stored in a permanent document.
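One way to picture the modified note-taking flow is the minimal sketch below. It assumes `confirm` and `revise` callbacks that stand in for spoken prompts and user replies. These callbacks, the `max_rounds` limit, and the `take_note` function are hypothetical.

```python
from typing import Callable, List


def take_note(
    dictated: str,
    hands_free: bool,
    confirm: Callable[[str], bool],  # e.g. speaks "Your note reads ... Save it?"
    revise: Callable[[str], str],    # e.g. asks the user to dictate a correction
    notes: List[str],
    max_rounds: int = 3,
) -> bool:
    """Add dictated text to notes; in hands-free mode, review it verbally first."""
    if not hands_free:
        # Hands-on: content is visible and editable on screen, so store it at once.
        notes.append(dictated)
        return True
    content = dictated
    for _ in range(max_rounds):
        if confirm(content):
            notes.append(content)
            return True
        content = revise(content)
    return False  # never store text the user has not confirmed
```

The design choice illustrated here is that the hands-free branch only writes to the permanent note after an explicit confirmation.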

In one embodiment, the hands-free context can also be used to limit the tasks that are allowed at a given time. For example, a policy can be implemented that disallows playing videos when the user's device is in a hands-free context, or only when it is in a specific hands-free context such as driving a vehicle.

In one embodiment, assistant 1002 can make available entire domains of discourse and/or tasks that are applicable only in a hands-free context. Examples include accessibility modes, such as those designed for people with limited eyesight or limited use of their hands. These accessibility modes include commands implemented as hands-free alternatives for operating any GUI on a given application platform, for example, to recognize commands such as "press the button" or "scroll up". Other tasks that may be applicable only in hands-free modes include tasks related to the hands-free experience itself, such as "use my car's Bluetooth kit" or "slow down [the text-to-speech output]".
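The two preceding paragraphs describe both restricting tasks and exposing hands-free-only domains. The minimal sketch below combines them in one availability filter. The task names and the `block_video_only_when_driving` switch are hypothetical. The switch selects between the two video policies mentioned above.

```python
from typing import AbstractSet, Set

# Hypothetical task names for domains that make sense only without a screen.
HANDS_FREE_ONLY_TASKS = {"press_button", "scroll_up", "use_car_bluetooth", "slow_down_tts"}


def allowed_tasks(
    all_tasks: AbstractSet[str],
    hands_free: bool,
    driving: bool = False,
    block_video_only_when_driving: bool = False,
) -> Set[str]:
    """Return the subset of tasks the assistant should offer right now."""
    allowed = set(all_tasks)
    if not hands_free:
        # Hands-free-only domains are not offered in a hands-on context.
        return allowed - HANDS_FREE_ONLY_TASKS
    # Policy: block video in any hands-free context, or only while driving.
    if driving or not block_video_only_when_driving:
        allowed.discard("play_video")
    return allowed
```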

Adapting dialog generation (500) to the hands-free context

In various embodiments, any of a number of techniques can be used to modify dialog generation 500 to adapt it to the hands-free context.

In a hands-on interface, assistant 1002's interpretation of the user's input can be echoed back in writing. Such feedback, however, may not be visible to the user in the hands-free context. Thus, in one embodiment, when the hands-free context is detected, assistant 1002 uses text-to-speech (TTS) technology to paraphrase the user's input. Such paraphrasing can be selective. For example, before sending a text message, assistant 1002 can speak the message so that the user can verify its contents even though the user cannot see the display screen.

The decisions about when to paraphrase the user's speech, and which parts of it to paraphrase, can be driven by task- and/or flow-specific dialogs. For example, in response to a spoken command such as "read my new message", in one embodiment assistant 1002 does not paraphrase the command, because assistant 1002's response (reading the message) makes it evident that the command was understood. In other situations, however, such as when the user's input is not recognized in step 100 or not understood in step 200, assistant 1002 can paraphrase the user's spoken input to tell the user why it was not understood. For example, assistant 1002 might say "I didn't understand 'reel my newt massage'. Please try again."
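The paraphrasing decision can be sketched as a small function. This is a minimal sketch under stated assumptions. `task` is `None` when recognition (step 100) or understanding (step 200) failed. The `SELF_EVIDENT_TASKS` set and the generic "I heard:" wording are hypothetical.

```python
from typing import Optional

# Hypothetical: tasks whose response already shows the command was understood.
SELF_EVIDENT_TASKS = {"read_messages"}


def spoken_paraphrase(transcript: str, task: Optional[str], hands_free: bool) -> Optional[str]:
    """Return text to speak back to the user, or None to skip paraphrasing."""
    if not hands_free:
        return None  # the interpretation is echoed on screen instead
    if task is None:
        # Recognition or understanding failed: tell the user what was heard.
        return f"I didn't understand '{transcript}'. Please try again."
    if task in SELF_EVIDENT_TASKS:
        return None  # e.g. reading the message shows the command was understood
    return f"I heard: {transcript}"
```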

In one embodiment, the verbal paraphrase of information can combine dialog templates with personal data on the device. For example, when reading a text message, in one embodiment assistant 1002 uses a spoken output template with variables of the form "You have a new message from $person. It says $message." The variables in the template can be replaced with user data, and the result is then turned into speech by a process running on device 60. In one embodiment in which the invention is implemented in a client/server environment, this technique can help protect users' privacy while still allowing the output to be personalized. The personal data can remain on device 60 and be filled in when the output template is received from the server.
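Python's standard `string.Template` uses the same `$name` placeholder syntax, so the on-device substitution can be sketched directly with it. This is an illustration only, not the patent's mechanism. The template text is adapted slightly from the one quoted above (a colon added, the trailing period dropped), and `fill_on_device` is a hypothetical name.

```python
from string import Template
from typing import Dict

# Template as it might arrive from a server: placeholders only, no personal data.
MESSAGE_TEMPLATE = "You have a new message from $person. It says: $message"


def fill_on_device(template_text: str, local_data: Dict[str, str]) -> str:
    """Substitute personal data locally, just before the text is handed to TTS."""
    return Template(template_text).substitute(local_data)
```

For example, filling the template with `{"person": "Mary Richards", "message": "Are you free for dinner tonight?"}` produces "You have a new message from Mary Richards. It says: Are you free for dinner tonight?". The server never sees either value.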

In one embodiment, when a hands-free context is detected, different and/or additional dialog units specifically tailored to hands-free contexts can be selected (510) for presentation over the audio channel. The code or rules that determine which dialog units to select can be sensitive to the particulars of the hands-free context. In this way, a general dialog generation component can be adapted and extended to support various hands-free variations without necessarily building a separate user experience for each hands-free situation.

In one embodiment, the same mechanism that generates text and GUI output units can be annotated with texts tailored to the audio (spoken word) output modality. For example:

· In one embodiment, the dialog generation component can be adapted to the hands-free context by reading all of its written dialog responses aloud using TTS.

· In one embodiment, the dialog generation component can be adapted to the hands-free context by reading some of its written dialog responses verbatim via TTS and using TTS variants for the other dialog responses.

· In one embodiment, such annotations support a variable substitution template mechanism that separates user data from dialog generation.

· In one embodiment, graphical user interface elements can be annotated with text indicating how they should be verbally paraphrased via TTS.

· In one embodiment, TTS texts can be tuned so that voice, speaking rate, pitch, pauses, and/or other parameters convey verbally what would otherwise be conveyed by punctuation or visual rendering. For example, the voice used when repeating back the user's words can be a different voice from the one used for other dialog units, or it can use different prosody. As another example, the voice and/or prosody can differ depending on whether content or instructions are being spoken. As a further example, pauses can be inserted between sections of text with different meanings to aid understanding. For instance, when paraphrasing a message and asking for confirmation, a pause can be inserted between the paraphrase of the content ("Your message reads ...") and the confirmation prompt ("Are you ready to send it?").
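The annotations listed above can be illustrated by rendering annotated dialog units into SSML (the W3C Speech Synthesis Markup Language). SSML is chosen here only as an example output format; the source does not name one. `DialogUnit`, `to_ssml`, the prosody values, and the pause length are hypothetical. For brevity, the `<speak>` root omits the version and namespace attributes a full SSML document would carry.

```python
from dataclasses import dataclass
from typing import List
from xml.sax.saxutils import escape


@dataclass
class DialogUnit:
    display_text: str          # text shown in the GUI
    spoken_text: str = ""      # optional TTS annotation; empty means "read display_text"
    users_words: bool = False  # True when repeating the user's own words back


def to_ssml(units: List[DialogUnit], pause_ms: int) -> str:
    """Render dialog units as one SSML string, pausing between units."""
    parts: List[str] = []
    for i, unit in enumerate(units):
        if i > 0:
            # A pause separates sections of text with different meanings.
            parts.append(f'<break time="{pause_ms}ms"/>')
        text = escape(unit.spoken_text or unit.display_text)
        if unit.users_words:
            # Different prosody marks the user's own words.
            text = f'<prosody pitch="low" rate="slow">{text}</prosody>'
        parts.append(text)
    return "<speak>" + "".join(parts) + "</speak>"
```

A caller could pass a larger `pause_ms` in a hands-free context, in line with the longer pauses mentioned for that case.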

In one embodiment, non-hands-free contexts can be enhanced with TTS-based mechanisms similar to those described above for hands-free contexts. For example, a dialog can generate verbal-only prompts in addition to written text and GUI elements. In some situations, for instance, assistant 1002 can say "Shall I send it?" aloud to augment the on-screen display of a Send button. In one embodiment, the TTS output used for hands-free and non-hands-free contexts can be tailored to each case. For example, assistant 1002 can use longer pauses when in a hands-free context.

In one embodiment, detection of the hands-free context can also be used to determine whether and when to automatically prompt the user for a response. For example, when the interaction between assistant 1002 and the user is synchronous in nature, so that one party speaks while the other listens, a design choice can be made about whether and when assistant 1002 should automatically start listening for the user's speech input after it has finished speaking. The particulars of the hands-free context can be used to implement various policies for this auto-start-listening property of a dialog. Examples include, without limitation:

· Always auto-start-listening;

· Auto-start-listening only when in a hands-free context;

· Auto-start-listening only for certain task flow steps and dialog states;

· Auto-start-listening only for certain task flow steps and dialog states in a hands-free context.
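The four auto-start-listening policies listed above can be encoded as an enumeration with a single decision function. This is a minimal sketch. The enum, the function, and the idea of naming dialog states as strings are hypothetical.

```python
from enum import Enum, auto
from typing import AbstractSet


class AutoListenPolicy(Enum):
    ALWAYS = auto()
    HANDS_FREE_ONLY = auto()
    CERTAIN_STEPS = auto()
    CERTAIN_STEPS_HANDS_FREE = auto()


def should_auto_listen(
    policy: AutoListenPolicy,
    hands_free: bool,
    dialog_state: str,
    listening_states: AbstractSet[str],
) -> bool:
    """Decide whether to open the microphone right after the assistant speaks."""
    in_listening_state = dialog_state in listening_states
    if policy is AutoListenPolicy.ALWAYS:
        return True
    if policy is AutoListenPolicy.HANDS_FREE_ONLY:
        return hands_free
    if policy is AutoListenPolicy.CERTAIN_STEPS:
        return in_listening_state
    # CERTAIN_STEPS_HANDS_FREE: both conditions must hold.
    return hands_free and in_listening_state
```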

In other embodiments, detection of a hands-free context can also affect choices about other parameters of a dialog, such as:

· the length of lists of options offered to the user;

· whether to read lists aloud;

· whether to ask questions with single-valued or multiple-valued answers;

· whether to prompt for data that can only be given through a direct manipulation interface.
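The dialog parameters listed above can be grouped into one configuration object derived from the detected context. This is a minimal sketch. The field names and the concrete values (for example, at most 5 options when hands-free) are hypothetical and are not taken from the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DialogParams:
    max_list_options: int                      # length of option lists offered
    read_lists_aloud: bool                     # whether lists are read aloud
    allow_multi_valued_questions: bool         # single- vs multiple-valued answers
    prompt_for_direct_manipulation_data: bool  # ask for data needing touch input?


def dialog_params(hands_free: bool) -> DialogParams:
    """Choose dialog parameters for the current context (values are illustrative)."""
    if hands_free:
        return DialogParams(
            max_list_options=5,
            read_lists_aloud=True,
            allow_multi_valued_questions=False,
            prompt_for_direct_manipulation_data=False,
        )
    return DialogParams(
        max_list_options=10,
        read_lists_aloud=False,
        allow_multi_valued_questions=True,
        prompt_for_direct_manipulation_data=True,
    )
```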

Thus, in various embodiments, the hands-free context, once detected, is a system-wide parameter that can be used to adapt the various processing steps of a complex system such as multimodal virtual assistant 1002. The methods described herein provide ways to adapt the general procedures of assistant 1002 to hands-free contexts, so that the same underlying system can support a range of user experiences.

Various mechanisms for gathering, communicating, representing, and accessing context are described in related U.S. Patent Application Serial No. 13/250,854, entitled "Using Context Information to Facilitate Processing of Commands in a Virtual Assistant," filed September 30, 2011, the entire disclosure of which is incorporated herein by reference. One skilled in the art will recognize that such techniques are also applicable to the hands-free context.

Use cases

The following use cases are presented as examples of the operation of assistant 1002 in a hands-free context. One skilled in the art will recognize that these use cases are exemplary and are presented for illustrative purposes only.

Phone use cases

In one embodiment, when in a hands-free context, assistant 1002 lets the user call someone, provided the user can specify the person to be called without tapping or otherwise touching the device. Examples include calling by contact name and calling by phone number (digits recited by the user). Ambiguity can be resolved by additional spoken prompts. Examples are shown below.

Example 1: Call a contact, unambiguous

· User's spoken input: "Call Adam Smith"

· Assistant 1002's spoken output: "Calling Adam Smith, mobile."

· Call is placed

A similar interaction would take place for any of the following use cases:

· Call a contact by name ("Call Adam Smith")

· Call a contact by name, non-default phone number ("Call Adam Smith mobile")

· Call by number ("Call 800 555 1212")

· Call a contact by relationship alias ("Call my mom")

· Call a contact by location alias ("Call home")

· Call via FaceTime ("FaceTime Adam Smith")

· Call back from context ("Call him back")

Example 2: Cancel a phone call

· User realizes the wrong person is being called; the user presses the Home button once, or says "Stop", to cancel the call

· Call is canceled

Example 3: Call by name, ambiguous

· User's spoken input: "Call Adam"

· Assistant 1002's spoken output: "I found multiple matches for <voice change> Adam"

· Assistant 1002's visual output: "I found multiple matches for <voice change> Adam"

· Assistant 1002's spoken output: reads the names

· Assistant 1002's visual output:

Disambiguation Menu

Adam Cheyer home

Adam Sandler home

Adam Smith mobile

· User's spoken input: "Adam Cheyer"

· Assistant 1002's spoken output: "Calling Adam Cheyer"

· Call is placed

Example 4: Call by name, doubly ambiguous

· User's spoken input: "Call Paul"

· Assistant 1002's spoken output: "I found multiple matches for <voice change> Paul"

· Assistant 1002's visual output: "I found multiple matches for <voice change> Paul"

Disambiguation Menu

Paul Holden home

Paul Marcos iPhone

Paul Marcos mobile

Paul Marcos home

Pablo Picasso home

· User's spoken input: "Paul Marcos"

· Assistant 1002's spoken output: "Which number for Paul Marcos: iPhone, mobile, or home?"

· Assistant 1002's visual output: "Which number for Paul Marcos?"

Disambiguation Menu

Paul Marcos iPhone

Paul Marcos mobile

Paul Marcos home

· User's spoken input: "Mobile"

· Assistant 1002's spoken output: "Calling Paul Marcos <pause> mobile"

· Call is placed
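The two-stage narrowing in Example 4 (first by name, then by number type) can be reproduced with a minimal sketch. It assumes contacts stored as `(name, phone label)` pairs. `narrow`, `prompt`, and `EXAMPLE_4_CONTACTS` are hypothetical. A plain substring match stands in for the fuzzier matching the example implies, so "Pablo Picasso" is not matched by "Paul" here.

```python
from typing import List, Tuple

Entry = Tuple[str, str]  # (contact name, phone label)

EXAMPLE_4_CONTACTS: List[Entry] = [
    ("Paul Holden", "home"), ("Paul Marcos", "iPhone"), ("Paul Marcos", "mobile"),
    ("Paul Marcos", "home"), ("Pablo Picasso", "home"),
]


def narrow(candidates: List[Entry], reply: str) -> List[Entry]:
    """Keep entries whose name contains the reply; otherwise match on the phone label."""
    r = reply.lower()
    by_name = [e for e in candidates if r in e[0].lower()]
    return by_name or [e for e in candidates if e[1].lower() == r]


def prompt(candidates: List[Entry]) -> str:
    """Choose the next spoken prompt for the remaining candidates."""
    if not candidates:
        return "I couldn't find a match"
    if len(candidates) == 1:
        name, label = candidates[0]
        return f"Calling {name} <pause> {label}"
    names = sorted({name for name, _ in candidates})
    if len(names) == 1:
        # One person, several numbers: ask which number.
        labels = [label for _, label in candidates]
        if len(labels) == 2:
            options = " or ".join(labels)
        else:
            options = ", ".join(labels[:-1]) + ", or " + labels[-1]
        return f"Which number for {names[0]}: {options}?"
    return "I found multiple matches: " + ", ".join(names)
```

Applying `narrow` with the replies "Paul", "Paul Marcos", and "mobile" in turn produces the same second and third prompts as Example 4.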

Example 5: Call a business by name, no ambiguity

· User's spoken input: "Call the Sony Metreon in San Francisco"

· Assistant 1002's spoken output: "I found AMC Loews Metreon 16 on Fourth Street"

· Assistant 1002's visual output: "I found AMC Loews Metreon 16 on Fourth Street"

Local Search Snippet

AMC Loews Metreon 16

101 Fourth St, San Francisco

(888) 262-4386

<Cancel button> <Call button>

· Assistant 1002's spoken output: "Shall I place the call?"

· User's spoken input: "Call it"

· Assistant 1002's spoken output: "Calling AMC Loews Metreon 16"

· Call is placed

Example 6: Call a business by name, multiple matches

· User's spoken input: "Call Walgreens"

· Assistant 1002's spoken output: "I found several matches for <voice change> Walgreens"

· Assistant 1002's visual output: "I found several matches for <voice change> Walgreens"

List of Local Search Snippets

Walgreens Store Los Gatos

423 N Santa Cruz Ave, Los Gatos

Walgreens

1570 Campbell Ave, Campbell, CA

Walgreens

20011 Bollinger Rd, Cupertino, CA

Walgreens

1334 Saratoga Ave, San Jose, CA

Walgreens

2680 Union Ave, San Jose, CA

· Assistant 1002's spoken output: "Shall I read them to you?"

· User's spoken input: "Yes"

· Assistant 1002's spoken output: "1. Walgreens Store Los Gatos on 423 North Santa Cruz Avenue in Los Gatos <pause> 2. Walgreens on 1570 Campbell Avenue in Campbell <pause> 3. ..."

· User's spoken input: "The one in Cupertino"

· Assistant 1002's spoken output: "Calling Walgreens in Cupertino"

· Call is placed

Example 6: Make a scheduled call in the car

· User starts driving home

· Assistant 1002 issues an In Car Reminder tone

Reminder Alert

Call Adam Smith

<Close button> <Call button>

· User's spoken input: "Read it"

· Assistant 1002's spoken output: "You have a reminder to call Adam Smith"

· User's spoken input: "Call him now"

· Assistant 1002's spoken output: "Calling Adam Smith, mobile"

· Call is placed

Example 7: Play new voicemails

· User's spoken input: "Play voicemail"

· Assistant 1002's spoken output: "You have three new voicemail messages"

· Assistant 1002 plays the voicemails one at a time, with prompts

SMS text messaging use cases

Example 8: Read a single text message alert

· Assistant 1002's audio output: reads the alert or alerts, or generates a sound indicating an incoming text message

Text Message

Mary Richards

Are you free for dinner tonight?

<Close button> <Reply button>

· User's spoken input: "Read it"

· Assistant 1002's spoken output: "Message from Mary Richards <voice change> Are you free for dinner tonight?"

Example 9: Reply to a text message (continued from Example 8)

· User's spoken input: "Reply <pause> Sure, how about 8?"

· Assistant 1002's spoken output: "Ready to reply to Mary Richards with the message <voice change> Sure, how about 8?"

To: Mary Richards

Sure, how about 8?

<Cancel button> <Send button>

· User's spoken input: "Send it"

· Assistant 1002's audio output: "Sending"; or generates an outgoing text message sound

Example 10: Send a text message to one recipient

· User's spoken input: "Tell Lisa I'll be 15 minutes late"

· Assistant 1002's spoken output: "Ready to send a text to Lisa Gonzalez with the message <voice change> I'll be 15 minutes late"

Text Message

To: Lisa Gonzalez

I'll be 15 minutes late

<Cancel button> <Send button>

· User's spoken input: "Send it"

· Assistant 1002's audio output: "Sending"; or generates an outgoing text message sound

Example 11: Send a text message to one recipient, ambiguous

· User's spoken input: "Text Adam with the message Where is the meeting?"

· Assistant 1002's spoken output: reads the names

Disambiguation Menu

Adam Cheyer home

Adam Sandler home

Adam Smith mobile

· User's spoken input: "Adam Cheyer"

· Assistant 1002's spoken output: "Adam Cheyer"

Text Message

To: Adam Cheyer

Where is the meeting?

<Cancel button> <Send button>

· Assistant 1002's spoken output: "Ready to send a text to Adam Cheyer with the message <voice change> Where is the meeting?"

· User's spoken input: "Send it"

· Assistant 1002's audio output: "Sending"; or generates an outgoing text message sound

Example 12: Read an SMS message from the Messages app

Text Message

Betty Joe Bialkowski

You can call me Nancy

<Close button> <Reply button>

· User's spoken input: "Read it"

· Assistant 1002's spoken output: "Message from Betty Joe Bialkowski <voice change> You can call me Nancy"

Example 13: Reply in the context of the Messages app (continued from Example 12)

· User's spoken input: "Reply <pause> Hi Nancy"

· Assistant 1002's spoken output: "Ready to reply to Betty Joe Bialkowski with the message <voice change> Hi Nancy"

To: Betty Joe Bialkowski

Hi Nancy

<Cancel button> <Send button>

· User's spoken input: "Send it"

· Assistant 1002's audio output: "Sending"; or generates an outgoing text message sound

Example 14: Read all unread SMS messages

Melanie Haber (1)

Betty Joe Bialkowski (3)

Audrey Farber (1)

· User's spoken input: "Read my new messages"

· Assistant 1002's spoken output: "You have 4 new messages. From Melanie Haber <voice change> How about dinner? <pause> From Betty Joe Bialkowski, you have 3 messages. <pause> 1. <voice change> My name is McGill. <pause> 2. <voice change> I call myself Lill. <pause> 3. <voice change> You can call me Nancy"

Example 15: Reply by name in the context of the Messages app

· User's spoken input: "Tell Audrey I'll be 15 minutes late"

· Assistant 1002's spoken output: "Ready to send a text to Audrey Farber with the message <voice change> I'll be 15 minutes late"

Text Message

To: Audrey Farber

I'll be 15 minutes late

<Cancel button> <Send button>

· Assistant 1002's spoken output: "Shall I send it?"

· User's spoken input: "Yes"

· Assistant 1002's audio output: "Sending"; or generates an outgoing text message sound

Example 16: Send a text message to multiple recipients

In one embodiment, this task is determined to be outside the scope of the hands-free context. Accordingly, assistant 1002 reverts to tapping for disambiguation.

· User's spoken input: "Tell Lisa, Paul, and Nancy I'll be 15 minutes late to our important meeting"

· Assistant 1002's spoken output: "Please select among these people"

· Assistant 1002's visual output: GUI for selecting multiple recipients

· User selects the recipients via the GUI

· Assistant 1002's spoken output: "Ready to send a text to Lisa Gonzalez, Paul Marcos, and Audrey Farber with the message <voice change> I'll be 15 minutes late to our important meeting"

Text Message

To: Lisa Gonzalez, Paul Marcos, Audrey Farber

I'll be 15 minutes late to our important meeting

<Cancel button> <Send button>

· User's spoken input: "Send it"

Alerts and reminders use cases

Example 17: Read a single reminder alert

· Assistant 1002 issues a reminder tone

Reminders

Call Mary Richards

<Close button> <Call button>

· User's spoken input: "Read it"

· Assistant 1002's spoken output: "Reminder to call Mary Richards"

Example 18: Execute a reminder alert (continued from Example 17)

· User's spoken input: "Call her"

· Assistant 1002's spoken output: "Calling Mary Richards, mobile"

· Call is placed

Example 19: Create a new reminder with an alert

· User's spoken input: "Remind me to call Bob Jones at 5 pm"

· Assistant 1002's spoken output: "Please confirm: reminder for 5 pm today to <voice change> call Bob Jones"

Please confirm reminder

Call Bob Jones

Reminder time: 5:00 pm today

<Cancel> <Save>

· Pause

· Assistant 1002's spoken output: "Is that correct?"

· User's spoken input: "Yes, save it"

· Assistant 1002's spoken output: "Your reminder has been saved"

예 20: 어떤 경보 PI도 없는 새로운 리마인더를 생성하기Example 20: Create a new reminder without any alarm PIs

· 사용자의 음성 입력: "내게 빵 사는 것을 리마인드해" 또는 "빵 사는 것을 기억해"· Enter your voice: "Remind me to buy bread" or "Remember to buy bread"

· 어시스턴트(1002)의 음성 출력: "확인해주세요: 빵 사는 것을 기억하기"· Voice output of assistant (1002): "Please confirm: Remember to buy bread"

리마인더를 확인해주세요Please check your reminder.

빵 살 것Buy bread

<취소> <저장><Cancel> <Save>

· 일시 정지· Pause

· 사용자의 음성 입력: "응" · User's voice input: "Yes"

· 어시스턴트(1002)의 음성 출력: "당신의 해야 할 일 리스트에 추가하였습니다"Voice output of Assistant (1002): "Added to your to-do list"

Calendar use cases

Example 21: Creating a simple appointment (description and date/time given)

· User's voice input: "Schedule a meeting at 2 tomorrow to plan the company picnic"

· Voice output of assistant (1002): "I'll put it on your calendar for 2 PM tomorrow"

Description: Plan the company picnic

When: tomorrow, 2:00 PM

Example 22: Creating a simple appointment (no description given)

· User's voice input: "I'll meet with Carol at 2 tomorrow"

Description: Meet

When: tomorrow, 2:00 PM

Example 23: Creating a simple appointment (no time given)

· User's voice input: "Dentist appointment next Thursday"

· Voice output of assistant (1002): "What time next Thursday?"

· User's voice input: "3 o'clock"

· Voice output of assistant (1002): "I'll put it on your calendar for 3 PM next Thursday"

Description: Dentist appointment

When: next Thursday, 3:00 PM

Example 24: Creating a simple appointment (no date or time given)

· User's voice input: "Schedule a meeting to plan the company picnic"

· Voice output of assistant (1002): "When do you want to meet?"

· Visual output of assistant (1002): editable appointment form

· User's voice input: "Tuesday at 3 o'clock"

· Voice output of assistant (1002): "I'll put it on your calendar for 3 PM Tuesday"

Description: Meeting to plan the company picnic

When: Tuesday, 3:00 PM
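Examples 21 through 24 behave like slot filling. A default fills in a missing description (Example 22 uses "Meet"), and the assistant asks only for whichever of day and time is still missing. The Python sketch below assumes that day and time arrive as already-parsed strings. The "What day?" question covers a case the examples do not show and is hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from typing import Optional

DEFAULT_DESCRIPTION = "Meet"  # Example 22: used when no description is given


@dataclass
class AppointmentDraft:
    description: Optional[str] = None
    day: Optional[str] = None   # e.g. "tomorrow", "next Thursday"
    time: Optional[str] = None  # e.g. "2 PM"


def fill_defaults(draft: AppointmentDraft) -> AppointmentDraft:
    return replace(draft, description=draft.description or DEFAULT_DESCRIPTION)


def next_prompt(draft: AppointmentDraft) -> Optional[str]:
    """Question for the first missing slot, or None if ready to schedule."""
    if draft.day is None and draft.time is None:
        return "When do you want to meet?"      # Example 24
    if draft.time is None:
        return f"What time {draft.day}?"        # Example 23
    if draft.day is None:
        return "What day?"                      # hypothetical: not in the examples
    return None


def confirmation(draft: AppointmentDraft) -> str:
    return f"I'll put it on your calendar for {draft.time} {draft.day}"
```

A dialog loop would call `next_prompt` after each user turn and speak `confirmation` once it returns `None`.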

Those skilled in the art will recognize that the above examples merely illustrate the use of a hands-free context in particular situations. Additional uses include, for example, maps and playback of media such as music.

The present invention has been described in particular detail with respect to possible embodiments. Those skilled in the art will appreciate that the invention may be practiced in other embodiments. First, the particular naming of the components, capitalization of terms, the attributes, data structures, or any other programming or structural aspect is not mandatory or significant, and the mechanisms that implement the invention or its features may have different names, formats, or protocols. Further, the system may be implemented via a combination of hardware and software, as described, or entirely in hardware elements, or entirely in software elements. Also, the particular division of functionality between the various system components described herein is merely exemplary and not mandatory. Functions performed by a single system component may instead be performed by multiple components, and functions performed by multiple components may instead be performed by a single component.

In various embodiments, the present invention can be implemented as a system or a method for performing the above-described techniques, either singly or in any combination. In another embodiment, the present invention can be implemented as a computer program product comprising a non-transitory computer-readable storage medium and computer program code, encoded on the medium, for causing a processor in a computing device or other electronic device to perform the above-described techniques.

Reference in the specification to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiments is included in at least one embodiment of the invention. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment.

Some portions of the above are presented in terms of algorithms and symbolic representations of operations on data bits within a memory of a computing device. These algorithmic descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. An algorithm is here, and generally, conceived to be a self-consistent sequence of steps (instructions) leading to a desired result. The steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical, magnetic, or optical signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It is convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like. Furthermore, it is also convenient at times to refer to certain arrangements of steps requiring physical manipulations of physical quantities as modules or code devices, without loss of generality.

It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise, as is apparent from the following discussion, it is appreciated that throughout the description, discussions utilizing terms such as "processing," "computing," "calculating," "displaying," or "determining" refer to the actions and processes of a computer system, or a similar electronic computing module and/or device. Such a system or device manipulates and transforms data represented as physical (electronic) quantities within the computer system memories or registers or other such information storage, transmission, or display devices.

Certain aspects of the present invention include process steps and instructions described herein in the form of an algorithm. It should be noted that the process steps and instructions of the present invention can be embodied in software, firmware, and/or hardware. When embodied in software, they can be downloaded to reside on, and be operated from, different platforms used by a variety of operating systems.

The present invention also relates to an apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, or it may comprise a general-purpose computing device selectively activated or reconfigured by a computer program stored in the computing device. Such a computer program may be stored in a computer-readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, and magneto-optical disks; read-only memories (ROMs); random access memories (RAMs); EPROMs; EEPROMs; magnetic or optical cards; application-specific integrated circuits (ASICs); or any type of media suitable for storing electronic instructions, each coupled to a computer system bus. Further, the computing devices referred to herein may include a single processor or may be architectures employing multiple-processor designs for increased computing capability.

The algorithms and displays presented herein are not inherently related to any particular computing device, virtualized system, or other apparatus. Various general-purpose systems may also be used with programs in accordance with the teachings herein, or it may prove convenient to construct more specialized apparatus to perform the required method steps. The required structure for a variety of these systems will be apparent from the description provided herein. In addition, the present invention is not described with reference to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein. Any references above to specific languages are provided for disclosure of enablement and best mode of the present invention.

Accordingly, in various embodiments, the present invention can be implemented as software, hardware, and/or other elements for controlling a computer system, computing device, or other electronic device, or any combination or plurality thereof. Such an electronic device can include, for example, a processor, an input device (such as a keyboard, mouse, touchpad, trackpad, joystick, trackball, microphone, and/or any combination thereof), an output device (such as a screen, speaker, and/or the like), memory, long-term storage (such as magnetic storage, optical storage, and/or the like), and/or network connectivity, according to techniques that are well known in the art. Such an electronic device may be portable or non-portable. Examples of electronic devices that may be used for implementing the invention include a mobile phone, personal digital assistant, smartphone, kiosk, desktop computer, laptop computer, tablet computer, consumer electronic device, consumer entertainment device, music player, camera, television, set-top box, electronic gaming unit, or the like. An electronic device for implementing the present invention may use any operating system, such as iOS or MacOS, available from Apple Inc. of Cupertino, California, or any other operating system adapted for use on the device.

While the invention has been described with respect to a limited number of embodiments, those skilled in the art, having the benefit of the above description, will appreciate that other embodiments may be devised that do not depart from the scope of the present invention as described herein. In addition, it should be noted that the language used in the specification has been principally selected for readability and instructional purposes, and may not have been selected to delineate or circumscribe the inventive subject matter. Accordingly, the disclosure of the present invention is intended to be illustrative, but not limiting, of the scope of the invention, which is set forth in the claims.

Claims

A computer-implemented method for interpreting user input to perform a task on a computing device having at least one processor, comprising:
Detecting, at the processor, whether a hands-free context is active;
At an output device, prompting the user for input;
At an input device, receiving user input;
At the processor, interpreting the received user input to derive a representation of user intent;
Identifying at least one task and at least one parameter for the task, based at least in part on the derived representation of user intent;
At the processor, executing the at least one task using the at least one parameter to derive a result;
At the processor, generating a dialog response based on the derived result; and
At the output device, outputting the generated dialog response;
Wherein, in response to detecting that the device is in a hands-free context, at least one of the steps of prompting the user for input, receiving user input, interpreting the received user input, identifying at least one task and at least one parameter for the task, and generating the dialog response is performed in a manner consistent with limitations associated with the hands-free context.
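The steps of claim 1 form a single request-response turn in which the hands-free flag changes how one or more steps behave. The following is a minimal Python sketch. It assumes that interpretation and execution are supplied as plain functions. It also assumes that "consistent with limitations of the hands-free context" means routing every prompt and response to speech, which is only one of the adaptations the claim covers. `Intent`, `Turn`, and `run_turn` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Intent:
    task: str                                   # e.g. "call"
    params: dict[str, str] = field(default_factory=dict)


@dataclass
class Turn:
    outputs: list[tuple[str, str]] = field(default_factory=list)  # (mode, text)

    def emit(self, hands_free: bool, text: str) -> None:
        # Assumed adaptation: speak everything when hands-free, else display it.
        self.outputs.append(("speech" if hands_free else "display", text))


def run_turn(
    hands_free: bool,
    prompt: str,
    read_input: Callable[[], str],
    interpret: Callable[[str], Intent],
    execute: Callable[[Intent], str],
) -> Turn:
    turn = Turn()
    turn.emit(hands_free, prompt)   # prompt the user for input
    raw = read_input()              # receive user input
    intent = interpret(raw)         # derive a representation of user intent
    result = execute(intent)        # execute the task with its parameters
    turn.emit(hands_free, result)   # generate and output the dialog response
    return turn
```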

2. The method of claim 1, wherein at least two interaction modes are available for user interaction with the computing device, and wherein:
In response to detecting that the device is in a hands-free context, at least one of the steps of prompting the user for input, receiving user input, interpreting the received user input, identifying at least one task and at least one parameter for the task, and generating the dialog response is performed using a first interaction mode adapted to hands-free operation; and
In response to detecting that the device is not in a hands-free context, at least one of the steps of prompting the user for input, receiving user input, interpreting the received user input, identifying at least one task and at least one parameter for the task, and generating the dialog response is performed using a second interaction mode not adapted to hands-free operation.

3. The method of claim 1 or 2, wherein detecting whether a hands-free context is active comprises detecting a condition selected from the group consisting of:
A condition indicating whether the computing device is located in a particular geographic location;
A condition indicating whether the current time is within a particular time of day;
A condition indicating whether the computing device is connected to another device;
A condition indicating whether the computing device is moving faster than a threshold speed;
A condition indicating whether the computing device is near a user; and
A condition indicating the presence of a specific word in the received user input.
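Because the conditions in claim 3 form a "selected from the group" list, any one of them can indicate a hands-free context. The Python sketch below checks them all and returns true if any holds. Every threshold, time window, device identifier, and trigger word is a hypothetical placeholder. The polarity of the proximity condition is also an assumption: here, a device that is not near the user counts toward hands-free use.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import time

SPEED_THRESHOLD_MPS = 8.0                              # hypothetical threshold
HANDS_FREE_WINDOW = (time(7, 0), time(9, 0))           # hypothetical commute hours
HANDS_FREE_DEVICES = {"car_kit", "bluetooth_headset"}  # hypothetical identifiers
TRIGGER_WORDS = {"driving", "hands-free"}              # hypothetical trigger words


@dataclass
class DeviceState:
    at_hands_free_location: bool   # e.g. inside a geofenced area
    now: time
    connected_devices: set[str]
    speed_mps: float
    near_user: bool                # assumption: False means out of the user's reach
    last_utterance: str


def hands_free_active(s: DeviceState) -> bool:
    start, end = HANDS_FREE_WINDOW
    words = set(s.last_utterance.lower().split())
    return any([
        s.at_hands_free_location,
        start <= s.now <= end,
        bool(s.connected_devices & HANDS_FREE_DEVICES),
        s.speed_mps > SPEED_THRESHOLD_MPS,
        not s.near_user,
        bool(words & TRIGGER_WORDS),
    ])
```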

3. The method of claim 1 or claim 2, wherein prompting the user for input comprises:
In response to detecting that the device is not in a hands-free context, prompting the user via a first output mode not adapted to the hands-free context; and
In response to detecting that the device is in a hands-free context, prompting the user via a second output mode adapted to the hands-free context.

5. The method of claim 4,
Wherein prompting the user through the first output mode comprises prompting the user through a visual output mode,
Wherein prompting the user through the second output mode comprises prompting the user through an audible output mode.

6. The method of claim 5,
Wherein prompting the user through the visual output mode comprises displaying a prompt on a display screen,
Wherein prompting the user through the audible output mode comprises outputting a spoken prompt.
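Claims 4 to 6 choose between a visual output mode, where a prompt is displayed on the screen, and an audible output mode, where a spoken prompt is produced, depending on the detected context. The Python sketch below models the two modes behind a common interface. The returned strings stand in for real display and text-to-speech calls, and all class names are hypothetical.

```python
from __future__ import annotations

from typing import Protocol


class OutputMode(Protocol):
    def prompt(self, text: str) -> str: ...


class VisualOutput:
    """First output mode (not adapted to hands-free): show the prompt on screen."""

    def prompt(self, text: str) -> str:
        return f"[display] {text}"  # stand-in for drawing on the display screen


class AudibleOutput:
    """Second output mode (adapted to hands-free): speak the prompt."""

    def prompt(self, text: str) -> str:
        return f"[speaker] {text}"  # stand-in for sending text to a TTS engine


def select_output_mode(hands_free: bool) -> OutputMode:
    return AudibleOutput() if hands_free else VisualOutput()
```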

3. The method of claim 1 or 2, wherein interpreting the received user input comprises, in response to detecting that the device is in a hands-free context, interpreting the received user input using a vocabulary associated with hands-free operation.
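In the use cases above, short replies such as "Read", "Send", and "Yes, save it" stand in for looking at or tapping on-screen controls. The Python sketch below adds a hands-free vocabulary on top of a base vocabulary using a keyword table. The table contents and action names are hypothetical. A real interpreter would rely on much richer language processing than word lookup.

```python
from __future__ import annotations

from typing import Optional

BASE_VOCABULARY = {"call": "call", "remind": "create_reminder"}

# Hypothetical hands-free additions: spoken words that replace screen actions.
HANDS_FREE_VOCABULARY = {
    "read": "read_aloud",     # instead of reading the alert on screen
    "send": "confirm_send",   # instead of tapping <Send>
    "save": "confirm_save",   # instead of tapping <Save>
    "cancel": "cancel",
}


def interpret(utterance: str, hands_free: bool) -> Optional[str]:
    vocab = dict(BASE_VOCABULARY)
    if hands_free:
        vocab.update(HANDS_FREE_VOCABULARY)
    for word in utterance.lower().replace(",", " ").split():
        if word in vocab:
            return vocab[word]
    return None
```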

The method of claim 1 or 2, wherein identifying at least one task and at least one parameter for the task comprises, in response to detecting that the device is in a hands-free context, performing at least one task flow identification step associated with hands-free operation.
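Example 19 suggests what a hands-free-specific task flow step can look like: the assistant reads the reminder back and asks "Is that correct?" instead of waiting for a tap on <Save>. The Python sketch below builds the step list for a reminder task both ways. Every step name is hypothetical.

```python
from __future__ import annotations


def reminder_task_flow(hands_free: bool) -> list[str]:
    steps = ["parse_reminder", "show_confirmation_form"]
    if hands_free:
        # Extra spoken steps so the user never has to read the form or tap <Save>.
        steps += ["read_back_reminder", "ask_is_that_correct", "await_spoken_yes"]
    else:
        steps += ["await_save_tap"]
    steps.append("save_reminder")
    return steps
```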

delete

3. The method of claim 1 or claim 2, wherein generating the dialog response comprises, in response to detecting that the device is in a hands-free context, generating the dialog response in a speech-based output mode.

delete

2. The method of claim 1, wherein detecting whether a hands-free context is active comprises at least one step selected from the group consisting of:
Receiving user input specifying a hands-free context;
Receiving, from at least one sensor, data indicating an environmental condition associated with the hands-free context;
Detecting connection of a peripheral device associated with the hands-free context;
Detecting disconnection of a peripheral device not associated with the hands-free context;
Detecting communication with an onboard system of a vehicle;
Detecting a current location; and
Detecting a current speed.
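The signals in this claim mix an explicit user choice with automatic cues. One plausible policy, which the claim does not mandate and is an assumption here, is to let an explicit user setting override every automatic cue and otherwise treat any single cue as sufficient. The Python sketch below implements that policy. The field names and the speed threshold are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

SPEED_THRESHOLD_MPS = 8.0  # hypothetical threshold


@dataclass
class Signals:
    user_setting: Optional[bool] = None          # explicit hands-free on/off, if set
    sensor_indicates_hands_free: bool = False    # environmental data from a sensor
    hands_free_peripheral_connected: bool = False
    other_peripheral_disconnected: bool = False  # peripheral not tied to hands-free use
    vehicle_onboard_link: bool = False
    at_hands_free_location: bool = False
    speed_mps: float = 0.0


def detect_hands_free(s: Signals) -> bool:
    # Assumption: an explicit user setting overrides every automatic cue.
    if s.user_setting is not None:
        return s.user_setting
    return any([
        s.sensor_indicates_hands_free,
        s.hands_free_peripheral_connected,
        s.other_peripheral_disconnected,
        s.vehicle_onboard_link,
        s.at_hands_free_location,
        s.speed_mps > SPEED_THRESHOLD_MPS,
    ])
```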

delete

A computer program product storing a computer program for interpreting user input to perform a task on a computing device having at least one processor, the computer program causing at least one processor to perform the steps of:
Detecting whether a hands-free context is active;
Causing an output device to prompt the user for input;
Receiving user input via an input device;
Interpreting the received user input to derive a representation of user intent;
Identifying at least one task and at least one parameter for the task, based at least in part on the derived representation of user intent;
Executing the at least one task using the at least one parameter to derive a result;
Generating a dialog response based on the derived result; and
Causing the output device to output the generated dialog response;
Wherein, in response to detecting that the device is in a hands-free context, the computer program causes the at least one processor to perform at least one of prompting the user for input, receiving user input, interpreting the received user input, identifying at least one task and at least one parameter for the task, and generating the dialog response in a manner consistent with limitations associated with the hands-free context.

21. The computer program product of claim 20, wherein at least two interaction modes are available for user interaction with the computing device, and wherein:
In response to detecting that the device is in a hands-free context, the computer program causes the at least one processor to perform at least one of prompting the user for input, receiving user input, interpreting the received user input, identifying at least one task and at least one parameter for the task, and generating the dialog response using a first interaction mode adapted to hands-free operation; and
In response to detecting that the device is not in a hands-free context, the computer program causes the at least one processor to perform at least one of prompting the user for input, receiving user input, interpreting the received user input, identifying at least one task and at least one parameter for the task, and generating the dialog response using a second interaction mode not adapted to hands-free operation.

22. The computer program product of claim 20 or 21, wherein the computer program causes the at least one processor to detect whether a hands-free context is active by detecting a condition selected from the group consisting of:
A condition indicating whether the computing device is located in a particular geographic location;
A condition indicating whether the current time is within a particular time of day;
A condition indicating whether the computing device is connected to another device;
A condition indicating whether the computing device is moving faster than a threshold speed;
A condition indicating whether the computing device is near a user; and
A condition indicating the presence of a specific word in the received user input.

22. The computer program product of claim 20 or 21, wherein the computer program causes the output device to prompt the user for input by:
Prompting the user via a first output mode not adapted to the hands-free context, in response to detecting that the device is not in a hands-free context; and
Prompting the user via a second output mode adapted to the hands-free context, in response to detecting that the device is in a hands-free context.

22. The computer program product of claim 20 or 21, wherein, in response to detecting that the device is in a hands-free context, the computer program causes the at least one processor to interpret the received user input using a vocabulary associated with hands-free operation.

22. The computer program product of claim 20 or 21, wherein, in response to detecting that the device is in a hands-free context, the computer program causes the at least one processor to identify at least one task and at least one parameter for the task by performing at least one task flow identification step associated with hands-free operation.

22. The computer program product of claim 20 or claim 21, wherein, in response to detecting that the device is in a hands-free context, the computer program causes the at least one processor to generate the dialog response in a speech-based output mode.

21. The computer program product of claim 20, wherein the computer program causes the at least one processor to detect whether a hands-free context is active by performing at least one step selected from the group consisting of:
Receiving user input specifying a hands-free context;
Receiving, from at least one sensor, data indicating an environmental condition associated with the hands-free context;
Detecting connection of a peripheral device associated with the hands-free context;
Detecting disconnection of a peripheral device not associated with the hands-free context;
Detecting communication with an onboard system of a vehicle;
Detecting a current location; and
Detecting a current speed.

28. The computer program product of any one of claims 20, 21, and 27, wherein:
The computer program causes the output device to prompt the user via a conversational interface; and
The computer program causes the at least one processor to receive user input via the conversational interface.

delete

A system for interpreting user input to perform a task on a computing device, comprising:
An output device configured to prompt the user for input;
An input device configured to receive user input; and
At least one processor, communicatively coupled to the output device and to the input device, configured to perform the steps of: detecting whether a hands-free context is active; interpreting the received user input to derive a representation of user intent; identifying at least one task and at least one parameter for the task, based at least in part on the derived representation of user intent; executing the at least one task using the at least one parameter to derive a result; and generating a dialog response based on the derived result;
Wherein the output device is further configured to output the generated dialog response; and
Wherein, in response to detecting that the device is in a hands-free context, at least one of prompting the user for input, receiving user input, interpreting the received user input, identifying at least one task and at least one parameter for the task, and generating the dialog response is performed in a manner consistent with limitations associated with the hands-free context.

32. The system of claim 31, wherein at least two interaction modes are available for user interaction, and wherein:
In response to detecting that the device is in a hands-free context, at least one of prompting the user for input, receiving user input, interpreting the received user input, identifying at least one task and at least one parameter for the task, and generating the dialog response is performed using a first interaction mode adapted to hands-free operation; and
In response to detecting that the device is not in a hands-free context, at least one of prompting the user for input, receiving user input, interpreting the received user input, identifying at least one task and at least one parameter for the task, and generating the dialog response is performed using a second interaction mode not adapted to hands-free operation.

33. The system of claim 31 or 32, wherein the at least one processor is configured to determine whether a hands-free context is active by detecting a condition selected from the group consisting of:
A condition indicating whether the computing device is located in a particular geographic location;
A condition indicating whether the current time is within a particular time of day;
A condition indicating whether the computing device is connected to another device;
A condition indicating whether the computing device is moving faster than a threshold speed;
A condition indicating whether the computing device is near a user; and
A condition indicating the presence of a specific word in the received user input.

33. The system of claim 31 or 32, wherein the output device is configured to prompt the user for input:
By prompting the user via a first output mode not adapted to the hands-free context, in response to detecting that the device is not in a hands-free context; and
By prompting the user via a second output mode adapted to the hands-free context, in response to detecting that the device is in a hands-free context.

35. The system of claim 34, wherein:
The first output mode comprises a visual output mode;
The second output mode comprises an audible output mode; and
The output device comprises:
A display screen configured to prompt the user via the visual output mode; and
A speaker configured to prompt the user via the audible output mode.

36. The system of claim 35,
Wherein the display screen is configured to display a visual prompt,
Wherein the speaker is configured to output a voice prompt.

33. The system of claim 31 or 32, wherein, in response to detecting that the device is in a hands-free context, the at least one processor is configured to interpret the received user input using a vocabulary associated with hands-free operation.

33. The system of claim 31 or 32, wherein, in response to detecting that the device is in a hands-free context, the at least one processor is configured to identify at least one task and at least one parameter for the task by performing at least one task flow identification step associated with hands-free operation.

33. The system of claim 31 or 32, wherein, in response to detecting that the device is in a hands-free context, the at least one processor is configured to generate the dialog response in a speech-based output mode.

33. The system of claim 31 or 32, wherein the at least one processor is configured to determine whether a hands-free context is active based on at least one selected from the group consisting of:
User input specifying a hands-free context;
Data from at least one sensor indicating an environmental condition associated with the hands-free context;
Connection of a peripheral device associated with the hands-free context;
Disconnection of a peripheral device not associated with the hands-free context;
Communication with an onboard system of a vehicle;
A current location; and
A current speed.

delete