KR101933822B1

KR101933822B1 - Intelligent speaker based on face recognition, method for providing active communication using the speaker, and computer readable medium for performing the method

Info

Publication number: KR101933822B1
Application number: KR1020180007673A
Authority: KR
Inventors: 김계영; 김혜민; 박선영; 한지연
Original assignee: 숭실대학교산학협력단
Priority date: 2018-01-22
Filing date: 2018-01-22
Publication date: 2019-03-29

Abstract

According to the present invention, an intelligent speaker based on face recognition comprises: an input unit for photographing a user detected in front of the speaker so as to generate a captured image; a control unit for analyzing the captured image to identify the user, analyzing voice data for each user to select a conversation sentence on a topic of interest of the user, and determining whether to output the conversation sentence actively; and an output unit for outputting the conversation sentence by voice. Accordingly, a conversation sentence can be output actively before the user calls the speaker.

Description

Intelligent speaker based on face recognition, method for providing active conversation using the same, and recording medium for performing the same

The present invention relates to a face recognition-based intelligent speaker, a method for providing active conversation using the speaker, and a recording medium for performing the method. More particularly, it relates to a face recognition-based intelligent speaker that identifies a user located in front of the speaker, dynamically selects a conversation topic for that user, and outputs it by voice, together with a method for providing active conversation using the speaker and a recording medium for performing the method.

Artificial intelligence is one of the fields drawing attention with the advent of the Fourth Industrial Revolution. Development so far has focused mainly on AI with near-expert intelligence in a specific domain, such as Google's AlphaGo, and on AI that carries out user commands, such as Apple's Siri. Most of these exist only as software. AI combined with hardware has also been developed, but it does not yet play a large part in everyday life. Among such products, the most widely commercialized is the AI speaker.

However, conventional commercial AI speakers can recognize a user only through voice recognition of a specific word, and when no such recognition occurs the device takes no action at all, which makes it passive. That is, because the AI speaker waits until the user calls it first, the user must go through a calling step every time to activate it, which is inconvenient.

Korean Patent No. 10-0656550
Korean Patent Publication No. 10-2017-0027706

One aspect of the present invention provides a face recognition-based intelligent speaker that recognizes a nearby user and actively outputs speech on a conversation topic set for that user even if the user does not call the speaker first, a method for providing active conversation using the speaker, and a recording medium for performing the method.

The technical problems addressed by the present invention are not limited to those mentioned above, and other technical problems not mentioned will be clearly understood by those skilled in the art from the description below.

A face recognition-based intelligent speaker according to an embodiment of the present invention includes: an input unit that photographs a user detected in front of the speaker to generate a captured image; a control unit that analyzes the captured image to identify the user, analyzes per-user voice data to select a conversation sentence on the user's topic of interest, and determines whether to output the conversation sentence actively; and an output unit that outputs the conversation sentence by voice.

The control unit may transcribe the user's voice data stored at an earlier point in time into text, extract at least one keyword from the transcribed voice data, set the user's topic of interest according to how many times each extracted keyword recurs, and select, from among a plurality of learned sentences, one conversation sentence related to the set topic of interest.

A plurality of topics of interest may be set by time or by place, and the control unit may select different conversation sentences depending on the time or place at which the user is detected.

When analysis of the captured image confirms that the faces of a first user and a second user are detected in it, the control unit may analyze the first user's voice data and the second user's voice data stored in the database unit and select a conversation sentence on a topic of interest common to the first user and the second user.

The control unit may analyze the second user's voice data from earlier points in time and, when the total conversation time is found to be at or below a preset threshold time, select a conversation sentence on the first user's topic of interest.

When analysis of the first user's voice data and the second user's voice data stored in the database unit finds no topic of interest common to the first user and the second user, the control unit may separately select a first conversation sentence for the first user and a second conversation sentence for the second user.

In addition, a method for providing active conversation using a face recognition-based intelligent speaker according to an embodiment of the present invention includes: photographing a user detected in front of the speaker to generate a captured image; analyzing the captured image to identify the user, analyzing per-user voice data to select a conversation sentence on the user's topic of interest, and determining whether to output the conversation sentence actively; and outputting the conversation sentence by voice.

Selecting the conversation sentence may be characterized in that, when analysis of the captured image confirms that the faces of a first user and a second user are detected in it, the first user's voice data and the second user's voice data stored in the database unit are analyzed and a conversation sentence on a topic of interest common to the first user and the second user is selected.

Determining whether to output the conversation sentence actively may be characterized in that, when the user's face is found to be detected continuously for at least a predetermined threshold time, the conversation sentence is output before the user calls the face recognition-based intelligent speaker.

According to the aspect of the present invention described above, when a user located in front of the speaker is detected, the speaker can output a conversation sentence matching the user's topic of interest and converse actively with the user even if the user does not call it. Removing the unnecessary step of calling the product shortens the time needed to issue a command and thereby improves user convenience.

FIG. 1 is a block diagram showing a schematic configuration of a face recognition-based intelligent speaker according to an embodiment of the present invention.
FIG. 2 is a diagram showing a detailed configuration of the control unit of FIG. 1.
FIG. 3 is a flowchart showing a schematic flow of a method for providing active conversation using the face recognition-based intelligent speaker of FIG. 1.

The detailed description of the present invention below refers to the accompanying drawings, which show, by way of illustration, specific embodiments in which the invention may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the invention. It should be understood that the various embodiments of the invention differ from one another but need not be mutually exclusive. For example, a particular shape, structure, or characteristic described herein in connection with one embodiment may be implemented in another embodiment without departing from the spirit and scope of the invention. It should also be understood that the position or arrangement of individual components within each disclosed embodiment may be changed without departing from the spirit and scope of the invention. The following detailed description is therefore not to be taken in a limiting sense, and the scope of the invention, if properly described, is limited only by the appended claims together with the full range of equivalents to which those claims are entitled. In the drawings, like reference numerals denote the same or similar functions across the several aspects.

Hereinafter, preferred embodiments of the present invention are described in more detail with reference to the drawings.

FIG. 1 is a block diagram showing a schematic configuration of a face recognition-based intelligent speaker 100 according to an embodiment of the present invention.

The face recognition-based intelligent speaker 100 according to the present invention recognizes a user's face and determines a conversation sentence suited to that user's topic of interest, which can shorten time spent on meaningless conversation. Depending on the situation, it can also output a conversation sentence actively before the user calls it, arousing the user's interest.

Specifically, the face recognition-based intelligent speaker 100 according to the present invention includes an input unit 110, an output unit 120, a database unit 130, and a control unit 140.

The input unit 110 may detect or receive data input to the face recognition-based intelligent speaker 100. In particular, the input unit 110 may detect voice data uttered by the user or detect the face of a user located near the face recognition-based intelligent speaker 100. To this end, the input unit 110 may include a microphone module, a camera module, and a sensor module. The input unit 110 may also be provided with at least one input button for controlling operation of the face recognition-based intelligent speaker 100.

The output unit 120 may output, by voice, the response data determined by the control unit 140 described later. To this end, the output unit 120 may include a speaker module that converts electrical signals into sound waves. The output unit 120 may further include an LED, a display panel, or the like for indicating the operating state of the face recognition-based intelligent speaker 100.

The database unit 130 may store data input to the face recognition-based intelligent speaker 100 or generated by it. Specifically, the database unit 130 may store a face image for each user so that images captured later can be compared against it. The database unit 130 may also manage a conversation log for each user so that the control unit 140 can analyze the log to determine each user's topic of interest.

The control unit 140 may control the overall operation of the face recognition-based intelligent speaker 100. Specifically, the control unit 140 may determine response data for voice data input through the input unit 110 and cause it to be output through the output unit 120. In particular, the control unit 140 may identify a user located in front of the speaker, set a conversation topic for that user, and judge whether the user wishes to converse with the face recognition-based intelligent speaker 100, so that a conversation sentence is output actively even if the user does not call the speaker. Details are described later with reference to FIG. 2.

In this way, the face recognition-based intelligent speaker 100 according to the present invention extends AI recognition beyond voice alone by giving the device a field of view through a camera so that face recognition becomes possible. It also collects the user's voice data and uses it to tailor itself to the user, so that the device can start a conversation actively, for example by asking an appropriate question, without the user calling it first. All of these functions are implemented so that they can be performed with readily available devices rather than special equipment.

FIG. 2 is a block diagram showing a detailed configuration of the control unit 140 of FIG. 1.

Specifically, the control unit 140 according to the present invention includes a face recognition unit 141, a conversation topic setting unit 142, and a conversation execution unit 143.

The face recognition unit 141 may detect a user located in front of the face recognition-based intelligent speaker 100 using an image generated by the camera module of the input unit 110. Specifically, the face recognition unit 141 may use sensing information generated by the sensor module of the input unit 110 to determine whether a person is located near the face recognition-based intelligent speaker 100. When it determines that an object has approached within a predetermined radius of the face recognition-based intelligent speaker 100, the face recognition unit 141 may activate the camera module of the input unit 110 so that a captured image is generated.

The face recognition unit 141 may determine whether at least one face object is detected in the generated captured image. The face recognition unit 141 may separate the objects contained in the captured image using any of various known image recognition techniques. The face recognition unit 141 may compare any one of the separated objects with the per-user face images stored in advance in the database unit 130 to determine whether a specific user's face is present in the captured image.

When a specific object in the captured image is found to match a previously stored face image, the face recognition unit 141 may look up the identification information associated with the matching face image to confirm which user has been detected.
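
The matching step described above can be illustrated with a minimal sketch. The patent only says that known image recognition techniques are used, so the technique here is a swapped-in assumption: each face is represented by a feature vector and compared by cosine similarity. The function names, the vectors, and the `min_similarity` threshold are all hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def identify_user(face_vector, registered, min_similarity=0.9):
    """Return the id of the best-matching registered user, or None.

    `registered` maps a user id to the feature vector of the stored
    face image (an assumed representation, not taken from the patent).
    """
    best_id, best_score = None, min_similarity
    for user_id, reference in registered.items():
        score = cosine_similarity(face_vector, reference)
        if score >= best_score:
            best_id, best_score = user_id, score
    return best_id

# Hypothetical registered users with toy 3-dimensional vectors.
registered_faces = {
    "user_a": [1.0, 0.0, 0.0],
    "user_b": [0.0, 1.0, 0.0],
}
```

Returning `None` for an unmatched face keeps unregistered passers-by from being treated as a known user.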

The conversation topic setting unit 142 may detect a question or spoken command from the user input through the input unit 110 and select a response sentence for it. The conversation topic setting unit 142 may also set a conversation topic for the identified user. It may query the database unit 130 and analyze the voice data stored for each user in order to set that user's topic of interest. For example, the conversation topic setting unit 142 may transcribe the voice data into text, extract at least one keyword from the transcribed voice data, and set the user's topic of interest according to how many times each extracted keyword recurs. The conversation topic setting unit 142 may then select, from among the plurality of learned sentences held in the database unit 130, a conversation sentence suited to the set topic of interest. In doing so, it may select the conversation sentence according to various criteria.
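
The keyword-counting idea in this paragraph can be sketched as follows. This is only an illustration under stated assumptions: the transcripts are taken to be already converted to text, and the keyword table, the `min_count` cutoff, and the learned-sentence dictionary are hypothetical and not defined by the patent.

```python
import re
from collections import Counter

# Hypothetical keyword-to-topic table; the patent does not define one.
TOPIC_KEYWORDS = {
    "weather": {"weather", "rain", "umbrella", "temperature"},
    "traffic": {"traffic", "bus", "subway", "commute"},
    "music": {"music", "song", "playlist"},
}

def extract_keywords(transcripts):
    """Count how often each known keyword occurs in the transcribed utterances."""
    known = set().union(*TOPIC_KEYWORDS.values())
    counts = Counter()
    for text in transcripts:
        for word in re.findall(r"[a-z]+", text.lower()):
            if word in known:
                counts[word] += 1
    return counts

def topic_of_interest(transcripts, min_count=2):
    """Return the topic whose keywords recur most, or None below min_count."""
    counts = extract_keywords(transcripts)
    scores = {
        topic: sum(counts[k] for k in keywords)
        for topic, keywords in TOPIC_KEYWORDS.items()
    }
    topic, score = max(scores.items(), key=lambda item: item[1])
    return topic if score >= min_count else None

def select_sentence(topic, learned_sentences):
    """Pick the first learned sentence filed under the topic, or None."""
    candidates = learned_sentences.get(topic, [])
    return candidates[0] if candidates else None

# Hypothetical stored utterances and learned sentences.
history = [
    "Will it rain today?",
    "What is the weather like",
    "Play a song",
    "Is rain expected tomorrow",
]
learned = {"weather": ["It looks like rain today, so take an umbrella."]}
```

Requiring a minimum count is a design choice made here so that a single stray mention does not become a topic of interest.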

As one example, the conversation topic setting unit 142 may divide the user's areas of interest by time or by place so that, even when the same user's face is recognized, a conversation sentence suited to the situation is output.

Specifically, when the face recognition-based intelligent speaker 100 is of a portable form, the conversation topic setting unit 142 may measure the current location using a communication unit (not shown) additionally provided in the face recognition-based intelligent speaker 100 and select a conversation sentence suited to the measured location. For example, when the current location of the face recognition-based intelligent speaker 100 is determined to be Seoul, the conversation topic setting unit 142 may look up weather information for Seoul and set a conversation sentence about it.

The conversation topic setting unit 142 may also check the time at which the user is detected and select different conversation sentences by time. For example, when the user's face is detected at a time set as the commute time, the conversation topic setting unit 142 may select a conversation sentence about today's weather or traffic conditions on the way to work. When the same user's face is detected in the afternoon, on the other hand, the conversation topic setting unit 142 may select a conversation sentence different from the one used in the morning. When the user's face is detected on a weekend, the conversation topic setting unit 142 may select a conversation sentence about recommended places for an outing.
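
A minimal sketch of the time-dependent choice described above follows. The commute window (07:00 to 09:00), the noon boundary for "afternoon", and the topic labels are assumptions made for illustration; the patent leaves these values open.

```python
from datetime import datetime, time

# Assumed commute window; the patent only says "a time set as the commute time".
COMMUTE_START = time(7, 0)
COMMUTE_END = time(9, 0)

def topic_for_time(detected_at: datetime) -> str:
    """Map the moment a user is detected to a (hypothetical) topic label."""
    # weekday() returns 5 for Saturday and 6 for Sunday.
    if detected_at.weekday() >= 5:
        return "outing_recommendation"
    if COMMUTE_START <= detected_at.time() < COMMUTE_END:
        return "weather_or_commute_traffic"
    if detected_at.time() >= time(12, 0):
        return "afternoon_topic"
    return "default_topic"
```

The weekend check comes first so that a Saturday morning detection is not mistaken for a commute.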

Meanwhile, when a plurality of faces are detected in the captured image, the conversation topic setting unit 142 may analyze the per-user voice data stored in the database unit 130 to determine the conversation sentence.

Specifically, when the faces of a first user and a second user are found to be detected in the captured image, the conversation topic setting unit 142 may query the database unit 130 and analyze the first user's voice data and the second user's voice data separately. If analysis of the second user's voice data shows that the total conversation time is at or below a preset threshold time, the conversation topic setting unit 142 may select only a conversation sentence for the first user. In other words, the conversation topic setting unit 142 may allow the face recognition-based intelligent speaker 100 to converse with the user who uses it frequently.

As another example, the conversation topic setting unit 142 may select a conversation sentence on an interest common to the first user and the second user. That is, if analysis of the first and second users' voice data shows that both users have conversed with the face recognition-based intelligent speaker 100 for at least the threshold time, the conversation topic setting unit 142 may select a conversation sentence on a topic of interest shared by the two users.

If no topic of interest common to the two users is found, the conversation topic setting unit 142 may select a conversation sentence for each user separately so that the two conversation sentences are output in sequence. In this process, the conversation topic setting unit 142 may further prepend to each conversation sentence a spoken form of address that identifies the user.
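
The three two-user cases above (second user rarely converses, shared topic exists, no shared topic) can be sketched as one decision function. This is an illustrative sketch, not the patent's implementation: the `UserProfile` structure, the 600-second threshold, and the assumption that the first user is the frequent user are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class UserProfile:
    name: str
    total_talk_seconds: float
    # Topics of interest, strongest first (assumed ordering).
    topics: list = field(default_factory=list)

def choose_topics(first, second, threshold_seconds=600.0):
    """Return (addressee, topic) pairs to speak, in output order."""
    # Case 1: the second user has barely used the speaker, so only the
    # first user's topic is chosen.
    if second.total_talk_seconds <= threshold_seconds:
        return [(first.name, first.topics[0])] if first.topics else []
    # Case 2: both users converse enough and share a topic of interest.
    common = [t for t in first.topics if t in second.topics]
    if common:
        return [(f"{first.name} and {second.name}", common[0])]
    # Case 3: no shared topic, so one sentence per user is output in
    # sequence, each tagged with the user's name as the form of address.
    return [(u.name, u.topics[0]) for u in (first, second) if u.topics]
```

Returning the addressee alongside the topic mirrors the idea of attaching a form of address to each sentence when two sentences are spoken in sequence.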

In this way, when a user is recognized through the camera module, the conversation topic setting unit 142 may analyze the user's past voice data to select a sentence to be output actively, or, when a question or spoken command from the user is detected, select a response sentence for it.

The conversation execution unit 143 may dynamically determine whether to output the selected conversation sentence. That is, the conversation execution unit 143 may judge whether to output the selected conversation sentence actively even if the user does not call the face recognition-based intelligent speaker 100.

In a typical situation, when a user is detected through the camera module, the conversation execution unit 143 may cause the conversation sentence selected by the conversation topic setting unit 142 to be output even if the user does not call the face recognition-based intelligent speaker 100. For example, when a specific user is detected while the speaker is in an inactive state, the conversation execution unit 143 may receive from the conversation topic setting unit 142 information, obtained by analyzing the user's voice data, about which topics mainly interest the user. The conversation execution unit 143 may then output the selected conversation sentence first, for example speech about today's weather, even if the user has not called the device.

Meanwhile, the conversation execution unit 143 may judge, depending on the situation, whether to output the selected conversation sentence actively. For example, the conversation execution unit 143 may cause the selected conversation sentence to be output only when the detected user's face remains detected continuously for at least a predetermined threshold time. This prevents unnecessary conversation sentences from being output when a user merely happens to pass by the face recognition-based intelligent speaker 100.
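
The continuous-detection condition can be sketched as a small gate that is fed the set of users visible in each frame. The class name, the 3-second default, and the use of plain numeric timestamps are assumptions made for illustration.

```python
class DwellGate:
    """Report users whose faces have stayed in view for a threshold time."""

    def __init__(self, threshold_seconds=3.0):
        self.threshold = threshold_seconds
        # user id -> timestamp at which the current unbroken detection began
        self._first_seen = {}

    def update(self, visible_user_ids, now):
        """Feed one frame; return ids seen continuously for >= threshold."""
        visible = set(visible_user_ids)
        # A user missing from this frame loses continuity and starts over.
        for user_id in list(self._first_seen):
            if user_id not in visible:
                del self._first_seen[user_id]
        ready = []
        for user_id in visible:
            started = self._first_seen.setdefault(user_id, now)
            if now - started >= self.threshold:
                ready.append(user_id)
        return sorted(ready)
```

A user who walks past and leaves the frame resets the timer, so only someone who stays in front of the speaker triggers proactive output.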

FIG. 3 is a flowchart showing a schematic flow of a method for providing active conversation using the face recognition-based intelligent speaker 100 of FIG. 1.

The face recognition-based intelligent speaker 100 may photograph the area in front of it to generate a captured image of the user and analyze the image to identify the user (310). Using the sensor module, the face recognition-based intelligent speaker 100 may activate the camera module when an object approaches within a predetermined distance and generate a captured image of a specific area. The face recognition-based intelligent speaker 100 may compare objects contained in the captured image with the previously stored per-user face images to determine whether a pre-registered user is present nearby.

The face recognition-based intelligent speaker 100 may analyze the voice data for the identified user to select a conversation sentence on the user's topic of interest and determine whether to output the conversation sentence actively (320).

As one example, the conversation sentence may be determined according to a topic of interest that is set differently depending on the time or place at which the user is detected. When a plurality of pre-registered users are detected in the captured image, a conversation sentence on a topic of interest common to those users may be selected, or a conversation sentence for the user who mainly uses the speaker may be selected.

Thereafter, the face recognition-based intelligent speaker 100 may output the selected conversation sentence by voice (330).

Here, the face recognition-based intelligent speaker 100 may dynamically determine whether to output the selected sentence. For example, the face recognition-based intelligent speaker 100 may cause the selected conversation sentence to be output when the detected user's face remains detected continuously for at least a predetermined threshold time. That is, even if the user does not call the face recognition-based intelligent speaker 100, the speaker according to the present invention actively outputs a conversation sentence to a nearby user depending on the situation, which not only improves ease of use but can also sustain the user's interest in the face recognition-based intelligent speaker 100.

The technique for providing active conversation using the face recognition-based intelligent speaker described above may be implemented as an application, or may be implemented in the form of program instructions that can be executed through various computer components and recorded on a computer-readable recording medium. The computer-readable recording medium may include program instructions, data files, data structures, and the like, alone or in combination.

The program instructions recorded on the computer-readable recording medium may be specially designed and configured for the present invention, or may be known and available to those skilled in the art of computer software.

Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specifically configured to store and execute program instructions, such as ROM, RAM, and flash memory.

Examples of program instructions include not only machine language code such as that generated by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware device may be configured to operate as one or more software modules in order to perform the processing according to the present invention, and vice versa.

Although the present invention has been described above with reference to embodiments, those skilled in the art will understand that various modifications and changes may be made to the present invention without departing from the spirit and scope of the invention as set forth in the claims below.

100: Intelligent speaker based on face recognition
110: Input unit
120: Output unit
130: Database unit
140: Control unit

Claims

An intelligent speaker based on face recognition, comprising:
an input unit for photographing a user sensed in front and generating a photographed image;
a control unit for analyzing the photographed image to identify users, analyzing voice data for each user to select a conversation sentence for a topic of interest of the user, and determining whether to actively output the conversation sentence; and
an output unit for outputting the conversation sentence by voice,
wherein the control unit,
when it is determined, as a result of analyzing the photographed image, that a face of a first user and a face of a second user are detected in the photographed image, analyzes the voice data of the first user and the voice data of the second user stored in a database unit and selects a conversation sentence for a topic of interest common to the first user and the second user, and
when it is determined, as a result of analyzing the voice data of the first user and the voice data of the second user stored in the database unit, that there is no topic of interest common to the first user and the second user, selects a first conversation sentence for the first user and a second conversation sentence for the second user so that the first conversation sentence and the second conversation sentence are output sequentially, further adds to the first conversation sentence a designation for distinguishing the first user from the second user, and further adds to the second conversation sentence a designation for distinguishing the second user from the first user.

The intelligent speaker based on face recognition according to claim 1,
wherein the control unit
converts voice data of the user stored at a previous point in time into text data, extracts at least one keyword from the text data, sets a topic of interest of the user according to the number of repeated occurrences of the extracted keyword, and selects any one conversation sentence related to the topic of interest.

The intelligent speaker based on face recognition according to claim 2,
wherein a plurality of the topics of interest are set for each time or place, and
wherein the control unit
selects different conversation sentences according to the time or place at which the user is sensed.

(Deleted)

The intelligent speaker based on face recognition according to claim 1,
wherein the control unit
analyzes voice data of the second user from a previous point in time and, if it is determined that the total talk time is below a preset threshold time, selects a conversation sentence for a topic of interest of the first user.

(Deleted)

A method for providing active dialogue using an intelligent speaker based on face recognition, the method comprising:
generating a photographed image by photographing a user sensed in front;
analyzing the photographed image to identify users, analyzing voice data for each user to select a conversation sentence for a topic of interest of the user, and determining whether to actively output the conversation sentence; and
outputting the conversation sentence by voice,
wherein selecting the conversation sentence comprises:
when it is determined, as a result of analyzing the photographed image, that a face of a first user and a face of a second user are detected in the photographed image, analyzing the voice data of the first user and the voice data of the second user stored in a database unit and selecting a conversation sentence for a topic of interest common to the first user and the second user; and
when it is determined, as a result of analyzing the voice data of the first user and the voice data of the second user stored in the database unit, that there is no topic of interest common to the first user and the second user, selecting a first conversation sentence for the first user and a second conversation sentence for the second user, respectively, wherein the first conversation sentence further includes a designation for distinguishing the first user from the second user, and the second conversation sentence further includes a designation for distinguishing the second user from the first user.

(Deleted)

The method of claim 7,
wherein determining whether to actively output the conversation sentence comprises:
when the face of the user is continuously detected for a predetermined threshold time or longer, controlling the conversation sentence to be output before the user calls the intelligent speaker based on face recognition.

A computer-readable recording medium on which is recorded a computer program for performing a method of providing an active dialogue using an intelligent speaker based on face recognition according to claim 7.