KR20190074344A - Dialogue processing apparatus and dialogue processing method

Info

Publication number: KR20190074344A
Application number: KR1020170175608A
Authority: KR
Inventors: 이정엄; 신동수; 김선아
Original assignee: 현대자동차주식회사; 기아자동차주식회사
Priority date: 2017-12-20
Filing date: 2017-12-20
Publication date: 2019-06-28
Also published as: KR102371513B1

Abstract

One aspect of the present invention provides a dialogue system and a dialogue processing method that provide an efficient service by determining, when multiple users are present in a vehicle, at what point the dialogue system intervenes in the conversation and which user's speech it responds to. According to one embodiment, the dialogue system comprises: a storage unit that stores user information of at least one user; a first sensor unit that obtains utterance information of the at least one user; a second sensor unit that obtains image information of the at least one user; and a dialogue manager that determines, based on at least one of the user information, the utterance information, and the image information, a reference user who utters an input start signal from among the at least one user, and determines an action to be performed by the vehicle based on at least one command included in the reference user's utterance.

Description

DIALOGUE PROCESSING APPARATUS AND DIALOGUE PROCESSING METHOD

The disclosed invention relates to a dialogue system that identifies a user's intention through dialogue with the user and provides the information or services the user needs, to a vehicle including the dialogue system, and to a dialogue processing method.

In-vehicle AVN (Audio Video Navigation) units and most mobile devices have small screens and small buttons, which can make it inconvenient to present visual information to the user or to receive user input.

In particular, when a driver looks away from the road or takes a hand off the steering wheel to check visual information or operate a device while driving, safe driving is threatened.

Meanwhile, when several occupants are in a vehicle and issue commands to a single dialogue system, that is, when multiple users are present, the lead in the conversation can change over time. In such cases, the dialogue system needs a function for determining at what point to intervene and whose speech to respond to.

If a dialogue system that builds on this function to identify the user's intention through dialogue and provide the services the user needs, even when several occupants are present, is applied to a vehicle, it is expected to provide services more safely and conveniently.

One aspect of the present invention provides a dialogue system and a dialogue processing method that provide an efficient service by determining, when multiple users are present in a vehicle, at what point the dialogue system intervenes and which user's speech it responds to.

A dialogue system according to one embodiment includes: a storage unit that stores user information of at least one user; a first sensor unit that obtains utterance information of the at least one user; a second sensor unit that obtains image information of the at least one user; and a dialogue manager that determines, based on at least one of the user information, the utterance information, and the image information, a reference user who utters an input start signal from among the at least one user, and determines an action to be performed by the vehicle based on at least one command included in the reference user's utterance.

The storage unit may store reference position information of the at least one user, and the dialogue manager may derive current position information of the at least one user based on the utterance information and determine the reference user by matching the current position information against the reference position information.

The storage unit may store reference voice information of the at least one user, and the dialogue manager may determine the reference user by matching the utterance information against the reference voice information.

The first sensor unit may include at least one microphone that is located inside the vehicle and receives a beamforming signal, and the dialogue manager may derive the current position information of the at least one user based on the beamforming signal output by the at least one microphone.

The first sensor unit may obtain past utterance information of the at least one user from before the input start signal was uttered, and the dialogue manager may generate and store the user information based on the past utterance information.

The dialogue manager may derive position information of the at least one user based on the image information.

The second sensor unit may obtain facial image information that includes the area around the lips of the at least one user, and the dialogue manager may determine the reference user by determining, based on the lip movement of the at least one user, whether the input start signal was uttered.

A dialogue processing method according to one embodiment includes: storing user information of at least one user; obtaining utterance information of the at least one user; obtaining image information of the at least one user; determining, based on at least one of the user information, the utterance information, and the image information, a reference user who utters an input start signal from among the at least one user; and determining an action to be performed by the vehicle based on at least one command included in the reference user's utterance.

Storing the user information may include storing reference position information of the at least one user, and determining the reference user may include deriving current position information of the at least one user based on the utterance information and determining the reference user by matching the current position information against the reference position information.

Storing the user information may include storing reference voice information of the at least one user, and determining the reference user may include determining the reference user by matching the utterance information against the reference voice information.

Obtaining the utterance information of the at least one user may include receiving a beamforming signal inside the vehicle, and determining the reference user may include deriving the current position information of the at least one user based on the beamforming signal.

Obtaining the utterance information of the at least one user may include obtaining past utterance information of the at least one user from before the input start signal was uttered, and storing the user information may include generating and storing the user information based on the past utterance information.

The dialogue processing method according to one embodiment may further include deriving position information of the at least one user based on the image information.

Obtaining the image information of the at least one user may further include obtaining facial image information that includes the area around the lips of the at least one user, and determining the reference user may include determining the reference user by determining, based on the lip movement of the at least one user, whether the input start signal was uttered.

The dialogue system and dialogue processing method according to one embodiment can provide an efficient service by determining, when multiple users are present in a vehicle, at what point the dialogue system intervenes and which user's speech it responds to.

FIG. 1 is a control block diagram of a dialogue system according to one embodiment.
FIG. 2 is a view showing the interior configuration of a vehicle.
FIG. 3 is a diagram illustrating an operation of obtaining utterance information according to one embodiment.
FIG. 4 is a diagram illustrating an operation of outputting a beamforming signal according to one embodiment.
FIG. 5 is a diagram illustrating an operation of obtaining image information according to one embodiment.
FIG. 6 is a diagram for explaining an operation of deriving an input start signal based on lip movement according to one embodiment.
FIG. 7 is a flowchart according to one embodiment.

Like reference numerals refer to like elements throughout the specification. This specification does not describe every element of the embodiments, and it omits matter that is common knowledge in the technical field of the invention or that is duplicated between embodiments. The terms 'unit, module, member, block' used in the specification may be implemented in software or hardware, and, depending on the embodiment, a plurality of 'units, modules, members, blocks' may be implemented as a single component, or a single 'unit, module, member, block' may include a plurality of components.

Throughout the specification, when a part is said to be "connected" to another part, this includes not only a direct connection but also an indirect connection, and an indirect connection includes a connection through a wireless communication network.

Also, when a part is said to "include" a component, this means that, unless specifically stated otherwise, it may further include other components rather than excluding them.

Singular expressions include plural expressions unless the context clearly indicates otherwise.

The reference signs attached to the steps are used to identify the steps and do not indicate an order among them; unless the context clearly specifies a particular order, the steps may be performed in an order different from the one stated.

Hereinafter, embodiments of the dialogue system 100, a vehicle including it, and the dialogue processing method are described in detail with reference to the accompanying drawings.

The dialogue system 100 according to one embodiment is a device that identifies the user's intention using the user's voice and non-voice input and provides a service suited to that intention or a service the user needs. It can carry on a dialogue with the user by outputting system utterances, either as a means of providing a service or as a means of clarifying the user's intention.

In this embodiment, the services provided to the user may include any operation performed to meet the user's needs or intention, such as providing information, controlling the vehicle, executing audio/video/navigation functions, and providing content retrieved from an external server.

The dialogue system 100 according to one embodiment also provides dialogue processing technology specialized for the vehicle environment, so it can accurately identify the user's intention in the particular environment of a vehicle.

The gateway connecting the dialogue system 100 and the user may be the vehicle or a mobile device connected to the vehicle. As described below, the dialogue system 100 may be provided in the vehicle, or it may be provided in a remote server outside the vehicle and exchange data through communication with the vehicle or a mobile device connected to the vehicle.

Some components of the dialogue system 100 may also be provided in the vehicle and others in a remote server, so that the operations of the dialogue system 100 are split between the vehicle and the remote server.

FIG. 1 is a control block diagram of the dialogue system 100 according to one embodiment.

Referring to FIG. 1, the dialogue system 100 according to one embodiment includes: an input processor 110 that processes user input, including the user's voice and non-voice input, as well as input including information related to the vehicle or the user; a dialogue manager 120 that uses the processing result of the input processor 110 to identify the user's intention and determine an action corresponding to the user's intention or the state of the vehicle; a result processor 130 that, according to the output of the dialogue manager 120, provides a specific service or outputs a system utterance for continuing the dialogue; and a storage unit 140 that stores the various information the dialogue system 100 needs to perform the operations described below.

The input processor 110 may include a first sensor unit 110-1 and a second sensor unit 110-2.

The first sensor unit 110-1 may be provided as a component, such as a microphone, that receives a voice signal including an utterance.

The first sensor unit 110-1 can obtain utterance information of at least one user.

The first sensor unit 110-1 may also be located inside the vehicle and output a beamforming signal. In addition, the first sensor unit 110-1 can obtain past utterance information of the at least one user from before the input start signal was uttered.

The second sensor unit 110-2 may be provided as a camera that obtains an image signal.

The second sensor unit 110-2 can obtain image information of at least one user.

The second sensor unit 110-2 can obtain facial image information that includes the area around the lips of at least one user.

As described above, the input processor 110 can receive two kinds of input: user voice and non-voice input. Non-voice input may include recognition of the user's gestures, non-voice user input entered by operating an input device, vehicle state information indicating the state of the vehicle, driving environment information related to the vehicle's driving environment, and user information indicating the state of the user. Beyond these, any information related to the vehicle or the user that can be used to identify the user's intention or to provide a service to the user may be an input to the input processor 110. The user may include both the driver and the passengers.

The input processor 110 recognizes the input user speech, converts it into a text utterance, and applies natural language understanding (NLU) to the utterance to identify the user's intention.

The input processor 110 also collects information about the state of the vehicle and the driving environment in addition to the user's voice, and uses the collected information to understand the situation.

The input processor 110 passes the user's intention identified through natural language understanding, together with information about the situation, to the dialogue manager 120. The input processor 110 can receive the user's utterance, which, as described below, may be input through the voice input device 210. The input processor 110 can also receive the user's feedback response when the user speaks. The feedback response may be a voice response, such as an utterance, or a non-voice response. A voice feedback response can be received through the voice input device 210 in the same way as an utterance, and a non-voice feedback response can be received through the non-voice input device 220. The input processor 110 can pass the feedback response to the dialogue manager 120.

The dialogue manager 120 can determine, based on at least one of the user information, the utterance information, and the image information, a reference user who utters an input start signal from among the at least one user, and can determine an action to be performed by the vehicle based on at least one command included in the reference user's utterance.

The dialogue manager 120 can derive current position information of the at least one user based on the utterance information and determine the reference user by matching the current position information against the reference position information. The dialogue manager 120 can also determine the reference user by matching the utterance information against the reference voice information.
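The position-matching step can be illustrated with a minimal Python sketch. The patent does not specify a coordinate system, a distance measure, or a threshold, so the seat coordinates, the Euclidean nearest-neighbour rule, and the `max_distance` value below are all assumptions made for illustration.

```python
import math

# Hypothetical reference positions (metres, cabin coordinates) as they might
# be held by the storage unit; these values are illustrative only.
REFERENCE_POSITIONS = {
    "driver": (0.4, 0.0),
    "front_passenger": (-0.4, 0.0),
    "rear_left": (0.4, -0.9),
    "rear_right": (-0.4, -0.9),
}


def match_reference_user(current_position, reference_positions, max_distance=0.35):
    """Return the stored user whose reference position is nearest to the
    position derived from the utterance, or None if nobody is close enough."""
    best_user, best_dist = None, float("inf")
    for user, ref in reference_positions.items():
        dist = math.dist(current_position, ref)  # Euclidean distance
        if dist < best_dist:
            best_user, best_dist = user, dist
    return best_user if best_dist <= max_distance else None
```

For example, `match_reference_user((0.35, 0.05), REFERENCE_POSITIONS)` selects the driver, while a position far from every stored seat yields `None`, in which case a real system would presumably fall back on voice or image information.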

The dialogue manager 120 can derive the current position information of the at least one user based on the beamforming signal output by the at least one microphone.
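One common way to turn microphone signals into a position estimate is delay-and-sum beamforming: the signals are aligned for each candidate seat and the seat whose alignment yields the most output power is chosen. The sketch below assumes two microphones and a hypothetical table of per-seat sample lags; the patent does not state which beamforming method is used, so this is only one possible reading.

```python
def steered_power(mic_a, mic_b, lag):
    """Power of the delay-and-sum output when mic_b is assumed to hear the
    source `lag` samples later than mic_a (lag may be negative)."""
    total = 0.0
    for n, a in enumerate(mic_a):
        m = n + lag
        if 0 <= m < len(mic_b):
            s = a + mic_b[m]  # align the two channels, then sum
            total += s * s
    return total


def locate_speaker(mic_a, mic_b, seat_lags):
    """Pick the seat whose assumed inter-microphone lag gives the highest
    steered power. `seat_lags` maps seat name -> lag in samples (assumed to
    be calibrated in advance for the cabin geometry)."""
    return max(seat_lags, key=lambda seat: steered_power(mic_a, mic_b, seat_lags[seat]))
```

With a pulse that reaches `mic_b` three samples after `mic_a`, the seat calibrated with a lag of 3 wins because the two channels add coherently. A production system would use more microphones, fractional delays, and noise handling.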

The dialogue manager 120 can generate and store the user information based on the past utterance information.
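As a rough illustration of building user information from past utterances, the sketch below averages per-utterance voice feature vectors into a reference voice profile and later matches a new utterance against the stored profiles by cosine similarity. The feature vectors, the `UserStore` class, and the 0.8 threshold are hypothetical; the patent does not describe the features or the matching rule.

```python
import math


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


class UserStore:
    """Hypothetical stand-in for the storage unit's reference voice information."""

    def __init__(self):
        self._sums = {}
        self._counts = {}

    def add_past_utterance(self, user_id, features):
        """Fold one past utterance's feature vector into the user's profile."""
        if user_id not in self._sums:
            self._sums[user_id] = [0.0] * len(features)
            self._counts[user_id] = 0
        self._sums[user_id] = [s + f for s, f in zip(self._sums[user_id], features)]
        self._counts[user_id] += 1

    def reference_voice(self, user_id):
        """Mean feature vector over all past utterances of this user."""
        n = self._counts[user_id]
        return [s / n for s in self._sums[user_id]]

    def identify(self, features, threshold=0.8):
        """Return the best-matching user, or None if no profile is similar enough."""
        best_user, best_score = None, threshold
        for user_id in self._sums:
            score = cosine_similarity(features, self.reference_voice(user_id))
            if score >= best_score:
                best_user, best_score = user_id, score
        return best_user
```

Profiles built this way before the input start signal is uttered let the system recognise the speaker of the input start signal without an explicit enrolment step.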

The dialogue manager 120 can derive position information of the at least one user based on the image information.

The dialogue manager 120 can determine the reference user by determining, based on the lip movement of at least one user, whether the input start signal was uttered.
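A simple way to picture the lip-movement check is to track a per-frame mouth-opening measurement for each occupant during the window in which the input start signal was heard, and to pick the occupant whose lips moved the most. The measurement itself, the activity score, and the `min_activity` threshold are assumptions for this sketch; the patent does not specify how lip movement is quantified.

```python
def lip_activity(openings):
    """Mean absolute frame-to-frame change in mouth opening."""
    if len(openings) < 2:
        return 0.0
    diffs = [abs(b - a) for a, b in zip(openings, openings[1:])]
    return sum(diffs) / len(diffs)


def speaker_from_lips(openings_by_user, min_activity=0.05):
    """Return the user with the highest lip activity above `min_activity`,
    or None if nobody's lips moved enough to count as speaking."""
    best_user, best = None, min_activity
    for user, openings in openings_by_user.items():
        activity = lip_activity(openings)
        if activity > best:
            best_user, best = user, activity
    return best_user
```

Here `openings_by_user` maps each occupant to a list of normalised mouth-opening values (hypothetical output of a face-landmark detector) sampled over the same frames.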

The dialogue manager 120 determines an action corresponding to the user's intention and the current situation based on the user's intention and the situation-related information passed from the input processor 110, and manages the parameters needed to perform that action.

In this embodiment, an action may mean any operation performed to provide a specific service, and the types of action may be defined in advance. In some cases, providing a service and performing an action may mean the same thing. In particular, the dialogue manager 120 can learn from the feedback response received from the input processor 110 and identify the user's intention.

For example, actions such as route guidance, vehicle state check, and gas station recommendation may be defined in advance in a domain/action inference rule DB, and the action corresponding to the user's utterance, that is, the action the user intends, can be extracted from the predefined actions according to the stored inference rules.
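The idea of extracting an action from predefined rules can be sketched as a small keyword table. The rule format, the keywords, and the action names below are invented for illustration; the patent does not describe how the domain/action inference rule DB is structured.

```python
# Hypothetical rule table: each predefined action is paired with keywords
# that suggest it. Rules are checked in order; the first match wins.
ACTION_RULES = [
    ("route_guidance", ("navigate", "route", "directions")),
    ("vehicle_state_check", ("tire pressure", "engine", "check the car")),
    ("gas_station_recommendation", ("gas station", "fuel", "refuel")),
]


def infer_action(utterance, rules=ACTION_RULES):
    """Return the first predefined action whose keywords appear in the
    utterance text, or None when no rule applies."""
    text = utterance.lower()
    for action, keywords in rules:
        if any(keyword in text for keyword in keywords):
            return action
    return None
```

For instance, `infer_action("Find a gas station nearby")` returns `"gas_station_recommendation"`. A real rule DB would work on the output of natural language understanding rather than on raw keywords.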

There is no restriction on the type of action. Anything the dialogue system 100 can perform through the vehicle 200 or the mobile device 400 can be an action, provided it is defined in advance and its inference rules and its relationships with other actions or events are stored.

The dialogue manager 120 passes information about the determined action to the result processor 130.

The result processor 130 generates and outputs the dialogue response and the commands needed to perform the received action. The dialogue response may be output as text, image, or audio, and when a command is output, the corresponding service, such as vehicle control or provision of external content, can be performed.
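The result processor's role of turning an action into a dialogue response plus a command can be sketched with templates. The template strings, the command syntax, and the `process_result` helper are hypothetical and only illustrate the pairing of a response with a command.

```python
from dataclasses import dataclass


@dataclass
class Result:
    response: str  # system utterance shown or spoken to the user
    command: str   # instruction forwarded to the vehicle or a content provider


# Hypothetical (response template, command template) pairs per action.
TEMPLATES = {
    "route_guidance": (
        "Starting route guidance to {destination}.",
        "NAV_START {destination}",
    ),
    "gas_station_recommendation": (
        "The nearest gas station is {name}.",
        "NAV_SUGGEST {name}",
    ),
}


def process_result(action, **params):
    """Build the dialogue response and command for a determined action."""
    if action not in TEMPLATES:
        return Result("Sorry, I cannot help with that yet.", "")
    response_tpl, command_tpl = TEMPLATES[action]
    return Result(response_tpl.format(**params), command_tpl.format(**params))
```

Calling `process_result("route_guidance", destination="Seoul Station")` produces a spoken confirmation and a matching navigation command; unknown actions fall back to an apology with an empty command.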

The storage unit 140 can store user information of at least one user. The storage unit 140 can store reference position information of the at least one user. The storage unit 140 can store reference voice information of the at least one user.

The storage unit 140 stores the various information needed for dialogue processing and service provision. For example, it may store in advance information related to the domains, actions, speech acts, and named entities used for natural language understanding; it may store a context understanding table used to understand the situation from input information; and it may store in advance data detected by sensors in the vehicle, information related to the user, and information needed to perform actions. In particular, the storage unit 140 can store the user's feedback information corresponding to the user's utterance. The information stored in the storage unit 140 is described in more detail below.

The storage unit 140 may be implemented as at least one of a non-volatile memory element such as a cache, ROM (Read Only Memory), PROM (Programmable ROM), EPROM (Erasable Programmable ROM), EEPROM (Electrically Erasable Programmable ROM), or flash memory; a volatile memory element such as RAM (Random Access Memory); or a storage medium such as a hard disk drive (HDD) or CD-ROM, but it is not limited to these. The storage unit 140 may be a memory implemented as a chip separate from the processor described above in connection with the control unit, or it may be implemented on a single chip with the processor.

As described above, the dialogue system 100 provides dialogue processing technology specialized for the vehicle environment. All of the components of the dialogue system 100 may be included in the vehicle, or only some of them. The dialogue system 100 may also be provided in a remote server, with the vehicle serving only as a gateway between the dialogue system 100 and the user. In any case, the dialogue system 100 can be connected to the user through the vehicle or a mobile device connected to the vehicle.

At least one component may be added or removed depending on the performance of the components of the dialogue system 100 shown in FIG. 1. Those of ordinary skill in the art will readily understand that the relative positions of the components may be changed according to the performance or structure of the system.

Each component shown in FIG. 1 refers to a software component and/or a hardware component such as a Field Programmable Gate Array (FPGA) or an Application Specific Integrated Circuit (ASIC).

FIG. 2 is a view showing the interior configuration of the vehicle.

Referring to FIG. 2, the center fascia 203, which is the central area of the dashboard 201 inside the vehicle 200, may be provided with a display 231 that shows the screens needed to control the vehicle, including audio, video, navigation, and phone-call functions, and with an input button 221 for receiving the user's control commands.

For the driver's convenience, an input button 223 may also be provided on the steering wheel 207, and a jog shuttle 225 that serves as an input button may be provided in the center console area 202 between the driver's seat 254a and the front passenger seat 254b.

A module that includes the display 231, the input button 221, and a processor that controls the various functions overall may be called an AVN (Audio Video Navigation) terminal or a head unit.

The display 231 may be implemented as one of various display devices, such as an LCD (Liquid Crystal Display), LED (Light Emitting Diode), PDP (Plasma Display Panel), OLED (Organic Light Emitting Diode), or CRT (Cathode Ray Tube).

As shown in FIG. 2, the input button 221 may be provided as a hard key in an area adjacent to the display 231; when the display 231 is implemented as a touch screen, the display 231 may also perform the function of the input button 221.

The vehicle 200 can receive the user's commands by voice through the voice input device 210. The voice input device 210 may include a microphone that receives sound, converts it into an electrical signal, and outputs the signal.

User input other than speech may be entered through the non-voice input device 220. The non-voice input device 220 may include the input buttons 221 and 223 and the jog shuttle 225, which receive commands through the user's manipulation.

For effective speech input, the voice input device 210 may be provided in the headlining 205, as shown in FIG. 2. However, embodiments of the vehicle 200 are not limited to this arrangement; the voice input device 210 may also be provided on the dashboard 201 or on the steering wheel 207. Beyond these, it may be placed at any position suitable for receiving the user's speech.

Inside the vehicle 200, a speaker 232 may be provided to output the sound needed to converse with the user or to provide a service the user wants. For example, the speaker 232 may be provided on the inner side of the driver's door 253a and the passenger door 253b.

The speaker 232 may output speech for navigation route guidance, sound or speech included in audio/video content, speech for providing information or services the user wants, and system utterances generated in response to the user's speech.

The dialogue system 100 according to an embodiment may use dialogue processing technology specialized for the vehicle environment to provide services optimized for the user's lifestyle, and may build new services using technologies such as the connected car, the Internet of Things (IoT), and artificial intelligence (AI). The dialogue system 100 may also learn from feedback information based on the user's feedback responses and identify the user's intention based on what it has learned.

When dialogue processing technology specialized for the vehicle environment is applied, as in the dialogue system 100 according to an embodiment, it is easy to recognize and respond to key contexts while the driver is driving. Services can be provided with weight given to factors that affect driving, such as low fuel and drowsy driving. In addition, because the vehicle is in most cases on its way to a destination, the information needed to provide services, such as travel time and destination information, can be obtained easily.

In addition, an intelligent service that identifies the driver's intention and suggests functions can be implemented easily, because real-time information and actions are given priority while the driver is driving. For example, if the driver searches for a gas station while driving, this can be interpreted as the driver's intention to go to a gas station now. If a gas station is searched for outside the vehicle environment, however, the search may also be interpreted as various other intentions besides going to a gas station now, such as looking up its location, its telephone number, or its prices.

Furthermore, although a vehicle is a confined space, a variety of situations can arise within it. For example, the dialogue system 100 can be used when driving a vehicle with an unfamiliar interface such as a rental car, when a substitute driver has been hired, when managing the vehicle (for example, at a car wash), when carrying an infant, or when traveling to a specific destination.

Various service opportunities and dialogue situations also arise in each of the stages that make up driving and the periods before and after it, such as the vehicle inspection stage, the departure preparation stage, the driving stage, and the parking stage. In particular, the dialogue system 100 can be used when the driver does not know how to handle a vehicle problem, when linking the vehicle with external devices, when checking driving habits such as fuel efficiency, when using safety support functions such as Smart Cruise Control, when operating the navigation system, when driving while drowsy, when driving the same route repeatedly every day, or when the driver needs to check whether stopping or parking is permitted at a location.

The vehicle 200 may also include an air conditioner 222 to keep the interior temperature at an appropriate level.

A camera may be provided inside the vehicle and may acquire image information of at least one user. The camera may include a CCD (Charge-Coupled Device) camera or a CMOS color image sensor. Both CCD and CMOS here refer to sensors that convert the light entering through the camera lens into an electrical signal and store it. Specifically, a CCD camera is a device that converts an image into an electrical signal using a charge-coupled device. A CIS (CMOS Image Sensor) is a low-power image pickup device with a CMOS structure, and it serves as the electronic film of a digital device. In general, a CCD has better sensitivity than a CIS and is therefore widely used in the vehicle 1, but the camera is not limited to a CCD. That is, in the disclosed invention the camera is not restricted in position or device type; it is sufficient that it acquires image information of the users riding in the vehicle 1.

FIG. 3 is a diagram illustrating an operation of acquiring speech information according to an embodiment.

Referring to FIG. 3, the dialogue system 100 provided in the vehicle may include a first sensor unit 110-1. As described above, the first sensor unit 110-1 may be implemented as a microphone and may acquire speech information of at least one user P1 to P4. The first sensor unit 110-1 provided in the vehicle may also acquire an input start signal uttered by at least one user.

The users may converse with one another and, during that conversation, one of them may utter an input start signal in order to provide input to the dialogue system 100. For example, a user P1 may utter an input start signal with a phrase such as "Listen up!" or "Hey, OOO!". If the user P1 utters such an input start signal, the dialogue manager may determine that user to be the reference user. The reference user is the user, among the at least one user, who gives commands directly to the dialogue manager. Once the dialogue manager has determined the reference user, it determines the action to be performed by the vehicle based on the speech of the reference user.

The first sensor unit 110-1 may also acquire speech information of the users P1 to P4 before the input start signal is uttered, and the dialogue manager may generate user information based on that speech information.

The input start signal described with reference to FIG. 3 may be predetermined for each user, and it is not restricted to any particular language or form.

FIG. 4 is a diagram illustrating an operation of outputting a beamforming signal according to an embodiment.

FIG. 4 shows the first sensor unit 110-1, implemented as a microphone, receiving a beamforming signal.

Beamforming is a signal processing technique that gives directionality to the transmission or reception of wireless signals using transmitters or receivers arranged in a fixed array. On reception, it exploits the fact that, because of the phase differences among the signals arriving at each receiver, signals from certain directions cancel one another while signals from other directions reinforce one another. Rather than simply summing the signals arriving at the receivers as they are, the weights and delays are adjusted before summing, so that only the signal from the desired direction is received relatively strongly. Here this principle is applied to microphones. That is, the first sensor unit 110-1, which includes a microphone, can give directionality to the speech signals it receives from at least one user P1 to P4; in other words, it can receive a speech signal while recognizing the direction from which the signal arrives. When the first sensor unit 110-1 receives speech signals with directionality, the dialogue system 100 can detect each user's position based on where each speech signal originated.
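The delay-and-sum idea behind receive beamforming can be illustrated with a minimal Python sketch. It is not taken from the disclosed embodiment: the two-microphone array, the integer sample delays, the test tone, and the names `simulate_array`, `delay_and_sum`, `power`, and `strongest_direction` are all assumptions chosen for illustration.

```python
import math


def simulate_array(source, arrival_delays):
    """Return one channel per microphone, each a delayed copy of the source."""
    n = len(source)
    return [
        [source[i - d] if 0 <= i - d < n else 0.0 for i in range(n)]
        for d in arrival_delays
    ]


def delay_and_sum(channels, steering_delays):
    """Advance each channel by its steering delay, then average the channels."""
    n = len(channels[0])
    out = [0.0] * n
    for samples, d in zip(channels, steering_delays):
        for i in range(n):
            if 0 <= i + d < n:
                out[i] += samples[i + d]
    return [v / len(channels) for v in out]


def power(signal):
    """Mean squared amplitude of a signal."""
    return sum(v * v for v in signal) / len(signal)


def strongest_direction(channels, candidates):
    """Pick the candidate direction whose steering delays give the most power."""
    return max(
        candidates,
        key=lambda name: power(delay_and_sum(channels, candidates[name])),
    )


# Hypothetical setup: a tone reaches microphone 0 first and microphone 1
# three samples later, as if the talker sat on the driver's side.
source = [math.sin(2 * math.pi * i / 12) for i in range(120)]
channels = simulate_array(source, [0, 3])

# Hypothetical steering delays for two candidate directions.
candidates = {"driver_side": [0, 3], "passenger_side": [3, 0]}
direction = strongest_direction(channels, candidates)
```

With these assumed values, steering toward `driver_side` aligns the two channels so they reinforce each other, while steering toward `passenger_side` leaves them half a period apart so they largely cancel. A real in-vehicle array would need fractional delays, more microphones, and noise handling, none of which this sketch covers.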

For example, when directionality is formed based on the speech signal S3 of the user P3, the first sensor unit 110-1 receives the speech signal of the user P3, and the dialogue manager derives the associated directionality to obtain position information for the user P3. The dialogue system 100 may match the position information derived in this way against the user information stored in the storage unit 140 and determine the user corresponding to the position information to be the reference user. The dialogue system 100 may then determine the action to be performed by the vehicle based on at least one command included in the speech of the reference user so determined.
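Matching a derived position against stored user information can be sketched as a nearest-neighbor lookup. The sketch below is only an assumption about one possible realization: the `UserInfo` record, the use of a single arrival angle as the position, the stored angles, and the tolerance value are all hypothetical and do not come from the disclosure.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class UserInfo:
    """Hypothetical stored record: a user and the direction of that user's seat."""
    user_id: str
    seat_angle_deg: float


def find_reference_user(
    users: Sequence[UserInfo],
    estimated_angle_deg: float,
    tolerance_deg: float = 20.0,
) -> Optional[UserInfo]:
    """Return the stored user closest to the estimated direction, if close enough."""
    if not users:
        return None
    best = min(users, key=lambda u: abs(u.seat_angle_deg - estimated_angle_deg))
    if abs(best.seat_angle_deg - estimated_angle_deg) > tolerance_deg:
        return None
    return best


# Hypothetical stored user information for four occupants.
stored_users = [
    UserInfo("P1", -30.0),
    UserInfo("P2", 30.0),
    UserInfo("P3", -60.0),
    UserInfo("P4", 60.0),
]

# A direction derived from the speech signal S3 (assumed value).
reference = find_reference_user(stored_users, -55.0)
```

Returning `None` when no stored position lies within the tolerance is a design choice of this sketch; it avoids assigning the reference-user role to an occupant whose stored position does not plausibly match the derived one.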

The operation of determining a user's position by beamforming is not limited to the operation described above, and a user other than the user P3 (that is, P1, P2, or P4) may also be determined to be the reference user.

FIG. 5 is a diagram illustrating an operation of acquiring image information according to an embodiment, and FIG. 6 is a diagram for explaining an operation of deriving an input start signal based on lip movement according to an embodiment.

Referring to FIG. 5, the second sensor unit 110-2 may be implemented as a component capable of acquiring images, as described above. The second sensor unit 110-2 may acquire image information of at least one person riding in the vehicle. From the image information, the dialogue manager may obtain information including the user's gestures and facial expressions. For example, when the second sensor unit 110-2 acquires image information in which the user P3 makes a fanning gesture with a hand while speaking, it can be inferred that this user intends to control the air conditioner provided in the vehicle, so the dialogue system 100 may determine the user P3 to be the reference user on that basis.

FIG. 6 goes a step beyond FIG. 5 and illustrates the dialogue system 100 performing lip reading based on the acquired image. The shape of a user's lips changes as the user speaks. On this basis, it can be determined whether the user is speaking and whether the user is giving a command to the dialogue system 100.

Specifically, the second sensor unit 110-2 may acquire a facial image of the user, and the facial image may include the area around the lips. The dialogue system 100 may derive feature points C of the area around the lips from the acquired image, and may determine whether the user is speaking based on the movement of the extracted feature points. That is, whether a user is speaking can be determined from the speech information acquired by the first sensor unit 110-1, which includes the microphone, but it can also be determined from the image information acquired by the second sensor unit 110-2, and more specifically from the image of the area around the user's lips included in that image information. For example, when a user says "Listen up!", the dialogue system 100 may analyze the feature points of the area around the lips in the facial image included in the image information to determine whether the user is speaking, and on that basis determine that this user is the reference user.
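One simple way to turn feature-point movement into a speaking decision is to track how much the mouth opening changes across frames. The following Python sketch assumes that a separate landmark detector (not shown) has already produced, for each frame, the coordinates of one upper-lip and one lower-lip feature point; the dictionary keys, the pixel threshold, and the function names are hypothetical.

```python
import math


def mouth_opening(points):
    """Distance between the upper-lip and lower-lip feature points of one frame."""
    return math.dist(points["upper_lip"], points["lower_lip"])


def is_speaking(frames, min_variation=2.0):
    """Treat the user as speaking if the mouth opening varies enough across frames."""
    openings = [mouth_opening(f) for f in frames]
    if len(openings) < 2:
        return False
    return max(openings) - min(openings) >= min_variation


# Hypothetical feature points (pixel coordinates) over five frames.
talking_frames = [
    {"upper_lip": (50.0, 60.0), "lower_lip": (50.0, 64.0 + extra)}
    for extra in (0.0, 3.0, 6.0, 2.0, 5.0)
]
silent_frames = [
    {"upper_lip": (50.0, 60.0), "lower_lip": (50.0, 64.0)} for _ in range(5)
]

talking = is_speaking(talking_frames)
silent = is_speaking(silent_frames)
```

This only detects that the lips are moving. Recognizing which phrase was spoken, such as a specific input start signal, would require an actual lip-reading model, which is outside the scope of this sketch.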

The operations described with reference to FIGS. 5 and 6, in which the acquired image information is interpreted to determine the user, are merely one embodiment of the present invention, and there is no restriction on how the image information is interpreted to analyze whether a user is speaking.

FIG. 7 is a flowchart according to an embodiment.

Referring to FIG. 7, the dialogue system 100 may acquire speech information and image information (1001). Based on this information, it may determine whether at least one user has uttered an input start signal (1002). The user who uttered the input start signal may then be determined to be the reference user (1003). Thereafter, the action to be performed by the vehicle may be determined based on a command uttered by the reference user (1004).
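The four steps of FIG. 7 can be sketched in Python as a single loop over recognized utterances. This is a simplified illustration under stated assumptions, not the disclosed implementation: speech is represented as already-transcribed text, image information is omitted, and the start phrase, the command table, and the names `Utterance`, `START_PHRASES`, `ACTIONS`, and `process` are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical input start phrase and command-to-action table.
START_PHRASES = ("listen up",)
ACTIONS = {"turn on the air conditioner": "AIR_CONDITIONER_ON"}


@dataclass
class Utterance:
    """One recognized utterance and the user it was attributed to."""
    user_id: str
    text: str


def process(utterances):
    """Return (reference_user, action); either may be None if not found."""
    reference_user = None
    for u in utterances:                      # 1001: acquired speech information
        text = u.text.lower()
        # 1002: has someone uttered the input start signal?
        if reference_user is None and any(p in text for p in START_PHRASES):
            reference_user = u.user_id        # 1003: that user becomes the reference user
        if u.user_id != reference_user:
            continue                          # ignore speech from other users
        for phrase, action in ACTIONS.items():
            if phrase in text:                # 1004: command from the reference user
                return reference_user, action
    return reference_user, None


dialogue = [
    Utterance("P2", "It is hot today"),
    Utterance("P1", "Listen up!"),
    Utterance("P2", "Turn on the air conditioner"),
    Utterance("P1", "Turn on the air conditioner, please"),
]
result = process(dialogue)
```

In this assumed dialogue, the request from P2 is ignored because P1 uttered the start phrase and is therefore the reference user; only the later command from P1 selects an action.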

The disclosed embodiments may be implemented in the form of a recording medium that stores instructions executable by a computer. The instructions may be stored in the form of program code and, when executed by a processor, may generate program modules that perform the operations of the disclosed embodiments. The recording medium may be implemented as a computer-readable recording medium.

The computer-readable recording medium includes every kind of recording medium that stores instructions a computer can decode. Examples include ROM (Read Only Memory), RAM (Random Access Memory), magnetic tape, magnetic disks, flash memory, and optical data storage devices.

The disclosed embodiments have been described above with reference to the accompanying drawings. Those of ordinary skill in the art to which the present invention pertains will understand that the present invention may be practiced in forms other than the disclosed embodiments without changing its technical spirit or essential features. The disclosed embodiments are illustrative and should not be construed as limiting.

100: dialogue system
110: input processor
110-1: first sensor unit
110-2: second sensor unit
120: dialogue manager
130: result processor
140: storage unit
200: vehicle

Claims

1. A dialogue system comprising:
a storage unit that stores user information on at least one user;
a first sensor unit that acquires speech information of the at least one user;
a second sensor unit that acquires image information of the at least one user; and
a dialogue manager that determines, based on at least one of the user information, the speech information, and the image information, a reference user who utters an input start signal among the at least one user, and determines an action to be performed by a vehicle based on at least one command included in the speech of the reference user.

2. The dialogue system of claim 1,
wherein the storage unit stores reference position information of the at least one user, and
the dialogue manager derives current position information of the at least one user based on the speech information, and determines the reference user by matching the current position information against the reference position information.

3. The dialogue system of claim 1,
wherein the storage unit stores reference voice information of the at least one user, and
the dialogue manager determines the reference user by matching the speech information against the reference voice information.

4. The dialogue system of claim 1,
wherein the first sensor unit comprises at least one microphone that is positioned inside the vehicle and receives a beamforming signal, and
the dialogue manager derives current position information of the at least one user based on the beamforming signal output by the at least one microphone.

5. The dialogue system of claim 1,
wherein the first sensor unit acquires past speech information of the at least one user from before the time at which the input start signal is uttered, and
the dialogue manager generates and stores the user information based on the past speech information.

6. The dialogue system of claim 1,
wherein the dialogue manager derives position information of the at least one user based on the image information.

7. The dialogue system of claim 1,
wherein the second sensor unit acquires facial image information including a lip periphery of the at least one user, and
the dialogue manager determines the reference user by determining, based on lip movement of the at least one user, whether the input start signal has been uttered.

8. A dialogue processing method comprising:
storing user information on at least one user;
acquiring speech information of the at least one user;
acquiring image information of the at least one user;
determining, based on at least one of the user information, the speech information, and the image information, a reference user who utters an input start signal among the at least one user; and
determining an action to be performed by a vehicle based on at least one command included in the speech of the reference user.

9. The method of claim 8,
wherein storing the user information comprises storing reference position information of the at least one user, and
determining the reference user comprises deriving current position information of the at least one user based on the speech information, and determining the reference user by matching the current position information against the reference position information.

10. The method of claim 8,
wherein storing the user information comprises storing reference voice information of the at least one user, and
determining the reference user comprises determining the reference user by matching the speech information against the reference voice information.

11. The method of claim 8,
wherein acquiring the speech information of the at least one user comprises receiving a beamforming signal inside the vehicle, and
determining the reference user comprises deriving current position information of the at least one user based on the beamforming signal.

12. The method of claim 8,
wherein acquiring the speech information of the at least one user comprises acquiring past speech information of the at least one user from before the time at which the input start signal is uttered, and
storing the user information comprises generating and storing the user information based on the past speech information.

13. The method of claim 8,
further comprising deriving position information of the at least one user based on the image information.

14. The method of claim 8,
wherein acquiring the image information of the at least one user comprises acquiring facial image information including a lip periphery of the at least one user, and
determining the reference user comprises determining the reference user by determining, based on lip movement of the at least one user, whether the input start signal has been uttered.