KR20190100703A

KR20190100703A - Speaker with motion recognition technology using artificial intelligence and control method therefore

Info

Publication number: KR20190100703A
Application number: KR1020180020591A
Authority: KR
Inventors: 배석형; 박성현; 박석준; 송다영; 한경진; 하회리; 임재환
Original assignee: 한국과학기술원
Priority date: 2018-02-21
Filing date: 2018-02-21
Publication date: 2019-08-29

Abstract

The present invention relates to an artificial intelligence speaker capable of movement using sound source position recognition technology. The speaker comprises: a voice input unit having at least one microphone for receiving an externally generated voice signal; a sound source tracking unit for tracking the sound source of the received voice signal through sound source position recognition technology using an Arduino; a motor driving unit for rotating the artificial intelligence speaker toward the position of the tracked sound source; and a control unit for receiving direction information from the sound source tracking unit and driving the motor so that the artificial intelligence speaker faces the sound source. The sound source tracking unit can set a threshold on voice loudness to distinguish the voice of the talker from other voices, and can extract the coordinates of the sound source of a voice signal exceeding the threshold so that the motor is driven through the control unit.

Description

Artificial intelligence speaker capable of movement using sound source position recognition technology, and control method therefor {SPEAKER WITH MOTION RECOGNITION TECHNOLOGY USING ARTIFICIAL INTELLIGENCE AND CONTROL METHOD THEREFORE}

The present invention relates to an artificial intelligence speaker and, more particularly, to an artificial intelligence speaker capable of movement that uses sound source position recognition technology to track the position of a sound source so that the speaker can turn in the direction from which a user speaks.

Artificial intelligence (AI) speakers have been developed with various functions, such as recognizing a user's voice to play songs easily or to control home appliances connected to a communication network. As shown in FIG. 1, AI speakers are sold as a product in the field of intelligent personal assistants (IPAs), and many global companies are striving to gain an edge in this field. Further, as shown in FIG. 2, the market research firm TrendForce forecast that the global voice recognition market would grow from about $2.6 billion in 2016 to about $16 billion in 2021, and the research and advisory firm Gartner forecast that the AI speaker market would grow rapidly from about $360 million in 2015 to about $2.1 billion in 2020.

The AI speaker market is thus expected to keep growing in the era of the fourth industrial revolution, but most conventional AI speakers merely output voice. In actual conversation, non-verbal elements such as actions and facial expressions can carry far more information than the verbal elements of speech, so users of conventional AI speakers, which mostly output only voice, find it hard to feel that they are communicating with the speaker on friendly terms. As shown in FIG. 3, the 7-38-55 rule published by Albert Mehrabian, a professor of psychology at the University of California, Los Angeles (UCLA), indicates that visual cues such as facial expressions and gestures (55%) influence communication far more than the verbal content of speech (7%). Because a conventional AI speaker that communicates by voice alone excludes movements such as facial expressions and gestures, the range of its possible interaction with humans is greatly reduced.

In addition, conventional AI speakers use the sound source position recognition function found in intelligent monitoring services, noise and abnormal-sound measuring instruments, or voice tracking cameras, as shown in FIG. 4, but they are limited to a scheme in which the user calls the speaker by name and states a predetermined function, such as music playback or a weather check, and the speaker then executes that function. As a result, many users of conventional AI speakers have evaluated them as "good at answering but not good at conversation." A conversation can be said to "go well" when communication between the two speakers flows smoothly. If the user and the AI speaker are regarded as two parties to a conversation, the user tries to converse, but in most cases the conventional AI speaker only carries out a simple answering task rather than communicating. That is, a conventional AI speaker relies only on hearing, one of the five human senses, and in many cases makes no use of another channel, movement, so interaction between the user and the AI speaker does not work well.

A prior patent document, International Patent Publication WO 2014/109422 A1, discloses a voice tracking device and a control method thereof, as shown on the right side of FIG. 4. However, this prior art discloses only a camera that tracks a sound source using a microphone and rotates toward the tracked sound source. It does not disclose a way for an AI speaker to produce actions and sounds suited to the situation in order to communicate better with the user.

In particular, an AI speaker often spends most of its time with the user, for example at home. A scheme in which sound is simply output from an electronic device such as a CD player, computer, or mobile phone and the user listens one-way, as with conventional AI speakers, therefore cannot satisfy the emotional needs of a user who wants the kind of familiarity a pet provides.

Meanwhile, some conventional AI speakers adjust a number of LED lights during music playback to convey the liveliness of the music to the user. This configuration also excludes varied movement of the speaker, so it cannot express the atmosphere of the music through rotation and rotation speed matched to the beat and mood, and it remains limited as a static mode of communication. Moreover, conventional AI speakers in most cases merely convey the meaning of sound as messenger text and cannot express the varied movements that would aid communication with the user.

An object of the present invention is to provide an AI speaker that, for effective communication, detects the user with sound source position recognition technology and then communicates with the user through movement and sound. In particular, the present invention uses sound source position recognition technology that can effectively track the position of a sound source from its voice signal, so that the AI speaker can turn toward the direction of the sound.

A further object is to provide an AI speaker that interacts much better with the user by performing gestures suited to various situations while facing the user, such as bowing its head in greeting, nodding or shaking its head when answering yes or no, and dancing while playing music.

To solve the above problems, an AI speaker capable of movement using sound source position recognition technology according to a first embodiment of the present invention may include: a voice input unit having at least one microphone for receiving an externally generated voice signal; a sound source tracking unit for tracking the sound source of the received voice signal through sound source position recognition technology using an Arduino; a motor driving unit for rotating the AI speaker toward the position of the tracked sound source; and a control unit for receiving direction information from the sound source tracking unit and driving the motor so that the AI speaker faces the sound source. The sound source tracking unit may set a threshold on voice loudness to distinguish the talker's voice from other voices, and may extract the coordinates of the sound source of a voice signal exceeding the threshold so that the motor is driven through the control unit.

In an AI speaker according to a second embodiment of the present invention, the sound source tracking unit may be configured to store the coordinates of voice signals at or above the threshold and to apply the most frequent azimuth and elevation values when the AI speaker performs a movement.

In an AI speaker according to another embodiment of the present invention, the sound source tracking unit may classify the received voice signal and selectively produce different actions and sounds according to the type of the classified voice signal.

In an AI speaker according to yet another embodiment of the present invention, the AI speaker may further include a rotatable arm, and the motor may consist of a first motor and a second motor. The first motor rotates the AI speaker about a vertical axis, and the second motor drives the movement of the arm.

A method of controlling an AI speaker capable of movement using sound source position recognition technology according to an embodiment of the present invention, the AI speaker including a voice input unit, a sound source tracking unit, a motor driving unit, and a control unit, may include: receiving an externally generated voice signal; tracking the sound source of the received voice signal; rotating the AI speaker toward the position of the tracked sound source; and confirming whether the voice is the talker's voice by setting a threshold on voice loudness to distinguish the user's voice from other voices and extracting the coordinates of the sound source of a voice signal exceeding the threshold.

In a control method according to another embodiment of the present invention, the step of confirming whether the voice is the talker's voice may further include storing sound coordinates at or above the threshold and setting the most frequent azimuth and elevation values to be applied when the AI speaker performs a movement.

A control method according to yet another embodiment of the present invention may further include a movement implementation step in which the received voice signal is classified so that the AI speaker selectively produces different actions and sounds according to the type of the classified voice signal.

The AI speaker according to the present invention can produce non-verbal elements such as actions and facial expressions to suit the situation, and can therefore communicate and interact efficiently with people. Using sound source position recognition technology that effectively tracks the position of a sound source from its voice signal, the present invention can turn the AI speaker toward the direction of the sound.

In addition, the present invention can implement various movements of the AI speaker. That is, while facing the user, the speaker can perform gestures suited to various situations, such as bowing its head in greeting, nodding or shaking its head when answering yes or no, and dancing while playing music, so that interaction between the user and the AI speaker works much better.

FIG. 1 is a diagram illustrating conventional AI speakers.
FIG. 2 is a graph showing forecasts for the global voice recognition market and the worldwide smart speaker market.
FIG. 3 is a graph illustrating Mehrabian's rule.
FIG. 4 is a flowchart schematically illustrating the process of a conventional intelligent monitoring service, together with a noise and abnormal-sound measuring instrument and a voice tracking camera.
FIG. 5 is a schematic block diagram of the AI speaker of the present invention.
FIG. 6 is a diagram schematically illustrating how the AI speaker of the present invention communicates with a user without movement and with movement.
FIG. 7 is a diagram illustrating the setup code that creates an environment in which the AI speaker of the present invention can operate.
FIG. 8 is a diagram illustrating the threshold code for filtering out unnecessary noise as far as possible in the AI speaker according to the first and second embodiments of the present invention.
FIG. 9 is a diagram illustrating the signal check code for checking whether a voice signal has been input to the AI speaker of the present invention and responding according to the situation.
FIG. 10 is a diagram schematically illustrating the movements of the AI speaker of the present invention.
FIG. 11 is a diagram schematically illustrating the expected effect of the AI speaker of the present invention playing with a companion animal.
FIG. 12 is a flowchart illustrating a method of controlling an AI speaker according to an embodiment of the present invention.

The foregoing objects, features, and advantages will become more apparent from the following embodiments described in conjunction with the accompanying drawings.

Specific structural and functional descriptions are given only to illustrate embodiments according to the concept of the present invention. Embodiments according to the concept of the present invention may be carried out in various forms and should not be construed as limited to the embodiments described in this specification.

Because embodiments according to the concept of the present invention can be modified in various ways and can take various forms, specific embodiments are illustrated in the drawings and described in detail in this specification. This is not intended to limit the embodiments to a particular disclosed form, and the invention should be understood to include all modifications, equivalents, and substitutes falling within its spirit and technical scope.

The terminology used in this specification is used only to describe particular embodiments and is not intended to limit the invention. Singular expressions include plural expressions unless the context clearly indicates otherwise. In this specification, terms such as "include" or "have" specify the presence of the stated features, numbers, steps, operations, components, parts, or combinations thereof, and should be understood not to exclude in advance the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Hereinafter, the present invention is described in detail by describing preferred embodiments with reference to the accompanying drawings. Like reference numerals in the drawings denote like elements.

As shown in FIG. 5, the voice tracking device 100 of the AI speaker 1 may include a voice input unit 110, a sound source tracking unit 120, a motor driving unit 130, a control unit 140, and an image capturing unit 150.

To evoke a sense of familiarity by resembling a human figure, the AI speaker 1 may have parts corresponding to a person's head, body, arms, and so on, each of which can move, so that the AI speaker 1 can move as a whole. For example, the AI speaker 1 may be provided with a rotatable arm; hardboard may be placed on the rotatable arm and a plastic hemisphere about 12 cm in diameter attached on top of it to form a face. To keep the internal frame from being exposed, hardboard may be wrapped around it to partly conceal the interior, and the whole may be covered with cloth. The AI speaker 1 may take various forms besides a person, such as a dog or a ghost, and because real movement includes gestures, fixed arms made of paper may be attached to both sides. In the part representing the face, expressions may be rendered with LEDs or a display that switches screens, and eye shapes may be attached so that the user has something to look at during conversation. The LEDs may also be configured to rotate through 360 degrees so that they move toward the direction in which the voice position was recognized. This allows a variety of expressions while also satisfying aesthetic requirements.

The voice input unit 110 receives externally generated voice signals and may include at least one microphone. The microphone may be provided with various devices for removing noise that arises with the external voice signal. Because the motor rotates the AI speaker 1 toward the position of the tracked sound source so that the speaker faces the source, the microphone may be mounted inside the AI speaker 1 facing its front, which is the side that will be turned toward the source. The microphone is preferably located in the head of the AI speaker, which the user looks at, but is not limited to that position. The microphone may also be built into a camera provided on the AI speaker, such as a webcam, so that voice recognition is available together with image recognition.

The sound source tracking unit 120 can track the sound source of a voice signal using the received signal. The sound source tracking unit 120 may use sound source position recognition technology based on an Arduino. Arduino refers to open-source single-board microcontroller boards together with their associated development tools and environment. For example, direction information may be received from a sound source tracking unit 120 that uses OpenCM 9.04, an Arduino-type board, to control the movement of the AI speaker. The sound source tracking unit 120 may also classify the received voice signal and drive the motor through the control unit 140 to produce the AI speaker's movement. Furthermore, because voice waveforms differ from person to person, the sound source tracking unit 120 may be configured to recognize which of the people who have registered their voices is speaking and to give an answer suited to that user. It may also be configured, using voice position recognition technology, so that voice recognition is performed only at the position from which the user's voice is heard.

Meanwhile, as shown in FIG. 6, two situations can be compared using a simple model of the AI speaker 1. In the situation on the left of FIG. 6, the speaker's function is executed without any movement when the user asks a question. In the situation on the right of FIG. 6, when the user asks a question and the speaker answers, the speaker additionally turns to face the user and bobs slightly. In both cases the speaker gives a spoken answer to a question such as "How is the weather today?", but even the simple movement shown on the right of FIG. 6 lets the user feel that communication with the AI speaker has improved.

FIGS. 7 to 9 show code, written in MATLAB, that lets the AI speaker 1 detect the user with sound source position recognition technology and then communicate with the user through movement and sound for effective communication.

FIG. 7 shows the code that creates an environment in which the AI speaker 1 can operate. The position of a sound is determined using a microphone array, and an environment is set up to receive the position coordinates over TCP. The code also configures the OpenCM environment for driving the two actuators, the image capturing unit 150 such as a webcam, and the voice input unit 110 such as a microphone, and it initializes the variables that are used later.

FIG. 8 shows the code for the first and second embodiments, which filter out unnecessary noise as far as possible.

In the first embodiment, the sound source tracking unit 120 sets a threshold on voice loudness to distinguish the talker's voice from other voices, extracts the coordinates of the sound source of a voice signal exceeding the threshold, and drives the motor through the control unit 140. That is, once the threshold is set, the coordinates of any sound received while the loudness does not exceed the threshold are ignored.
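The threshold rule of the first embodiment can be sketched as follows. This is a minimal illustration in Python rather than the MATLAB code of FIG. 8, which is not reproduced in the text; the `Detection` type, the function name, and the loudness units are assumptions made for the example.

```python
from typing import NamedTuple, Optional, Tuple


class Detection(NamedTuple):
    """One localization result (hypothetical structure for illustration)."""
    azimuth: float    # degrees
    elevation: float  # degrees
    loudness: float   # arbitrary units, assumed comparable to the threshold


def filter_by_threshold(det: Detection, threshold: float) -> Optional[Tuple[float, float]]:
    """Return (azimuth, elevation) only when loudness exceeds the threshold.

    Detections at or below the threshold are ignored, mirroring the rule
    that coordinates are discarded when the loudness does not exceed it.
    """
    if det.loudness > threshold:
        return (det.azimuth, det.elevation)
    return None
```

A caller would pass only the non-`None` results on to the motor control step.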

In the second embodiment, the sound source tracking unit 120 stores the coordinates of voice signals at or above the threshold and applies the most frequent azimuth and elevation values when the AI speaker performs a movement. Noise such as ambient sound is unlikely to recur at the same position: when the user speaks, coordinates for one particular position arrive repeatedly, whereas noise-like sounds are unlikely to produce repeated coordinate values. A loud noise occurring while the user is speaking, or before or after, may therefore cause the sound source tracking unit 120 of the first embodiment to accept its coordinate values. Even so, the most frequent coordinate value in the data stored according to the second embodiment hardly changes, so the AI speaker of the second embodiment can keep facing the user. The first and second embodiments may each be carried out without restriction on order.
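One way to pick the most frequent direction from stored coordinates is sketched below in Python. The binning step is an assumption added for the example, since raw floating-point coordinates rarely repeat exactly; the text does not say whether FIG. 8 bins values or whether azimuth and elevation are counted jointly or separately. Here they are counted jointly.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def most_frequent_direction(
    coords: Iterable[Tuple[float, float]],
    bin_deg: float = 5.0,  # assumed bin width in degrees
) -> Optional[Tuple[float, float]]:
    """Return the most frequent (azimuth, elevation) bin among stored coordinates.

    Each coordinate is snapped to the nearest multiple of bin_deg so that
    repeated detections of the same talker fall into the same bin.
    """
    counts = Counter(
        (round(az / bin_deg) * bin_deg, round(el / bin_deg) * bin_deg)
        for az, el in coords
    )
    if not counts:
        return None
    return counts.most_common(1)[0][0]
```

With this rule, a single loud noise from another direction adds only one count to its bin and does not displace the talker's direction.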

In summary, as shown in FIG. 12, the method of controlling the AI speaker 1 may include: receiving an externally generated voice signal (S1); tracking the sound source of the received voice signal (S2); rotating the AI speaker 1 toward the position of the tracked sound source (S3); and confirming whether the voice is the talker's voice by setting a threshold on voice loudness to distinguish the user's voice from other voices and extracting the coordinates of the sound source of a voice signal exceeding the threshold (S4). The step of confirming whether the voice is the talker's voice (S4) may further include storing sound coordinates at or above the threshold and setting the most frequent azimuth and elevation values to be applied when the AI speaker performs a movement.
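For the rotation in step S3, the signed angle through which the first motor should turn can be sketched as below. This is an illustrative Python helper, not code from the specification; the function name and the convention that headings are measured in degrees about the vertical axis are assumptions.

```python
def shortest_rotation(current_deg: float, target_deg: float) -> float:
    """Signed rotation in degrees, in the range [-180, 180), that turns the
    speaker from its current heading to the target azimuth by the shorter way.
    """
    return (target_deg - current_deg + 180.0) % 360.0 - 180.0
```

For example, turning from a heading of 350 degrees to a source at 10 degrees yields +20 rather than -340, so the speaker does not spin almost a full turn to face a talker slightly to one side.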

FIG. 9 shows code for a third embodiment, which determines whether a voice signal has been received so that the speaker can respond according to the situation.

In the third embodiment, the function signal_check in the code shown in FIG. 9 determines the signal. For example, the signal variable is set to 1 when the AI speaker 1 greets the user, to 2 when a yes-or-no answer is required, to 3 when the speaker photographs and recognizes the user (for instance, to rate the user's outfit), and to 4 when it plays a song and dances. When the speaker simply turns its head toward the position of a sound, the signal variable is 0. This allows the AI speaker 1 to produce actions and sounds suited to the situation. The AI speaker 1 can therefore also perform a variety of emoticon-like movements that aid communication with the user, which makes it feel friendly to the user.
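The signal values described above can be illustrated with a small dispatch table. This is only a sketch: the body of signal_check in FIG. 9 is not reproduced here, so the names `Signal`, `ACTIONS`, and `action_for`, the action descriptions, and the fallback to signal 0 for unknown values are all assumptions made for illustration.

```python
from enum import IntEnum


class Signal(IntEnum):
    """Signal values as described in the text (names are hypothetical)."""
    TURN_TO_SOUND = 0   # simply turn the head toward the sound
    GREETING = 1        # greet the user
    YES_NO = 2          # answer yes or no
    PHOTO = 3           # photograph and recognize the user
    MUSIC_DANCE = 4     # play a song and dance


# Hypothetical mapping from signal value to a behavior description.
ACTIONS = {
    Signal.TURN_TO_SOUND: "turn toward sound source",
    Signal.GREETING: "bow in greeting",
    Signal.YES_NO: "nod or shake head",
    Signal.PHOTO: "take photo and recognize user",
    Signal.MUSIC_DANCE: "play music and dance",
}


def action_for(signal_value: int) -> str:
    """Look up the behavior for a signal value.

    Unknown values fall back to TURN_TO_SOUND; this fallback is an
    assumption, not something stated in the source.
    """
    try:
        signal = Signal(signal_value)
    except ValueError:
        signal = Signal.TURN_TO_SOUND
    return ACTIONS[signal]
```

Keeping the values in an enumeration makes the mapping between a recognized situation and the resulting movement explicit, and new behaviors can be added without touching the lookup logic.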

That is, as shown in FIG. 12, the method of controlling the AI speaker may further include a motion implementing step (S5), in which the received voice signal is classified so that the AI speaker produces different actions and sounds according to the type of the classified voice signal.

Through these embodiments, as shown in FIG. 10, the AI speaker 1 can communicate with the user not only by voice but also through actions produced by software code and devices such as two motors. For example, it can perform gestures suited to various situations, such as bowing its head when greeting, nodding or shaking its head when answering yes or no, and dancing while playing music. Because it can perform all of these actions while facing the user, the interaction between the user and the AI speaker 1 is considerably improved. Furthermore, because the AI speaker 1 can face the position of the user's voice and perform actions such as nodding or shaking its head, it can convey emotions such as delight to the user more vividly, much as a pet welcomes an owner who has just come home. In music playback as well, rather than relying on static communication such as LED lighting, the AI speaker 1 can express the atmosphere of the music by rotating, and adjusting its rotation speed, to match the beat and the mood.

Meanwhile, the motor driving unit 130 can rotate the microphone toward the position of the sound source tracked by the sound source tracking unit 120. The motor driving unit 130 may be provided with at least one motor. The motor driving unit 130 may rotate the AI speaker 1 so that the microphone of the voice input unit 110 provided in the AI speaker 1 rotates with it toward the sound source. Alternatively, when the microphone is attached to the camera of the image capturing unit 150, the motor driving unit 130 may rotate the camera toward the sound source so that the microphone rotates with it. The motor driving unit 130 can rotate the microphone or the camera up, down, left, and right, and is not limited to these directions; movement in various other directions is also possible.

Furthermore, the motor of the motor driving unit 130 may consist of a first motor and a second motor. The first motor can rotate the AI speaker 1 about a vertical axis, and the second motor can drive the movement of a rotatable arm provided on the AI speaker 1. The AI speaker 1 can thus rotate its arm as well as its body, and can perform various movements for interacting with the user, such as greeting and dancing.
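One way to turn a tracked direction into commands for the two motors is sketched below. The source does not say how the motors are commanded, so everything here is an assumption: the function names, the choice to take the shortest rotation about the vertical axis for the first motor, the use of the elevation as the arm angle for the second motor, and the arm limits of -30 and 60 degrees.

```python
from typing import Tuple


def shortest_rotation(current_deg: float, target_deg: float) -> float:
    """Signed rotation in degrees, in the range (-180, 180], that turns
    the body from current_deg to target_deg about the vertical axis."""
    delta = (target_deg - current_deg) % 360.0
    if delta > 180.0:
        delta -= 360.0
    return delta


def clamp_arm_angle(
    elevation_deg: float,
    low: float = -30.0,   # hypothetical mechanical limit of the arm
    high: float = 60.0,   # hypothetical mechanical limit of the arm
) -> float:
    """Limit the requested arm angle to the assumed mechanical range."""
    return max(low, min(high, elevation_deg))


def motor_commands(
    current_yaw_deg: float, azimuth_deg: float, elevation_deg: float
) -> Tuple[float, float]:
    """Return (yaw rotation for the first motor, arm angle for the second)."""
    return (
        shortest_rotation(current_yaw_deg, azimuth_deg),
        clamp_arm_angle(elevation_deg),
    )
```

For example, a body facing 350 degrees that must face a source at 10 degrees turns by +20 degrees rather than by -340 degrees, which keeps the movement short.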

The control unit 140 drives the motor: it receives direction information from the sound source tracking unit, which uses Arduino-based sound source position recognition technology, and drives the motor so that the microphone faces the sound source. The control unit 140 may also be configured to control the overall operation of the voice tracking device 100. After receiving a voice signal through a microphone array, the control unit 140 can track the position of the sound source in real time using beamforming. The trackable range depends on how the microphone array is arranged: an array laid out on a plane can track two directions (the X and Y axes), while a three-dimensional arrangement can track three directions (the X, Y, and Z axes). A spherical array, in which planar arrays surround a sphere, can also be used to track the full 360 degrees with a planar rather than a three-dimensional technique. Furthermore, the AI speaker 1 may be combined with various Internet of Things (IoT) technologies that exchange sensor data with the Internet in real time, so that the control unit 140 receives situation-appropriate information over the Internet and the speaker can respond to the user with suitable answers and actions.

Meanwhile, the image capturing unit 150 captures images of the user and may include at least one camera, depending on the usage environment. The camera of the image capturing unit 150 may incorporate the microphone of the voice input unit 110, or the two may be separate. The camera of the image capturing unit 150 is not limited to taking still photographs and may be provided with various functions for communicating with the user, such as video recording. In addition, using the sound source position recognition technology and the movement of the AI speaker 1, the image capturing unit 150 can rotate the camera toward the direction from which a voice originates, take a photograph, and upload the photograph data to the Internet for analysis.

With these components, the AI speaker 1 can move in a manner suited to the situation with an emphasis on communication with the user, and can thus satisfy users who want to converse with an AI speaker in a friendly way. In addition, as shown in FIG. 11, it can be used to play with children or pets. An AI speaker 1 provided with various movements that respond to a child's reactions can become an excellent playmate for children if it, for example, tells stories or plays cham-cham-cham (a hand game). As for pets, cats, for example, often play by chasing a toy that a person waves. A moving AI speaker that recognizes the position of a cat's footsteps and moves accordingly can therefore also serve as a good plaything.

Although preferred embodiments of the present invention have been described above with reference to the accompanying drawings, the present invention is not limited to them, and a person of ordinary skill in the art to which the present invention pertains may make substitutions, modifications, and/or changes on the basis of this description without departing from the technical idea of the present invention. Therefore, the scope of the present invention should not be limited to the embodiments described above, but should be determined by the claims below and their equivalents.

1: artificial intelligence speaker 110: voice input unit
120: sound source tracking unit 130: motor driving unit
140: control unit 150: image capturing unit

Claims

An artificial intelligence speaker capable of movement using sound source position recognition technology, comprising:
a voice input unit 110 provided with at least one microphone for receiving an externally generated voice signal;
a sound source tracking unit 120 for tracking the sound source of the received voice signal through sound source position recognition technology using an Arduino;
a motor driving unit 130 for rotating the artificial intelligence speaker 1 toward the position of the tracked sound source; and
a control unit 140 for receiving direction information from the sound source tracking unit 120 and driving the motor so that the artificial intelligence speaker 1 faces the direction of the sound source,
wherein the sound source tracking unit 120 sets a threshold on voice loudness to distinguish the voice of the speaking user from other voices, extracts the coordinates of the sound source of a voice signal exceeding the threshold, and causes the motor to be driven through the control unit 140.

The artificial intelligence speaker of claim 1,
wherein the sound source tracking unit 120 stores the coordinates of voice signals at or above the threshold so that the most frequent azimuth and elevation values are applied when the artificial intelligence speaker 1 moves.

The artificial intelligence speaker of claim 1 or 2,
wherein the sound source tracking unit 120 classifies the received voice signal and selectively implements different actions and sounds according to the type of the classified voice signal.

The artificial intelligence speaker of claim 1 or 2,
wherein the artificial intelligence speaker 1 further includes a rotatable arm,
the motor consists of a first motor and a second motor,
the first motor rotates the artificial intelligence speaker 1 about a vertical axis, and
the second motor drives the movement of the arm.

A method of controlling an artificial intelligence speaker 1 using sound source position recognition technology, the artificial intelligence speaker 1 including a voice input unit 110, a sound source tracking unit 120, a motor driving unit 130, and a control unit 140, the method comprising:
receiving an externally generated voice signal (S1);
tracking the sound source of the received voice signal (S2);
rotating the artificial intelligence speaker 1 toward the position of the tracked sound source (S3); and
setting a threshold on voice loudness to distinguish the user's voice from other voices, and extracting the coordinates of the sound source of a voice signal exceeding the threshold to confirm whether the voice is that of the speaking user (S4).

The method of claim 5,
wherein the step (S4) of confirming whether the voice is that of the speaking user further includes storing sound coordinates at or above the threshold so that the most frequent azimuth and elevation values are applied when the artificial intelligence speaker 1 moves.

The method of claim 5,
further comprising a motion implementing step (S5) of classifying the received voice signal so that the artificial intelligence speaker 1 selectively implements different actions and sounds according to the type of the classified voice signal.