KR102594683B1

KR102594683B1 - Electronic device for speech recognition and method thereof

Info

Publication number: KR102594683B1
Application number: KR1020230010284A
Authority: KR
Inventors: 류희섭; 권남영; 박경미; 복찬식; 최찬희
Original assignee: Samsung Electronics Co., Ltd. (삼성전자주식회사)
Priority date: 2021-01-26
Filing date: 2023-01-26
Publication date: 2023-10-26
Also published as: KR20220020859A; KR20230153969A; KR102494051B1; KR20230020472A

Abstract

An electronic device and a voice recognition method thereof are provided. The voice recognition method of the electronic device includes: receiving a trigger voice; analyzing the trigger voice and storing characteristics of the user's voice for voice recognition; when a user voice is input, determining whether the user voice corresponds to the stored characteristics of the user's voice; and, if the user voice corresponds to those characteristics, performing voice recognition on the user voice. As a result, the user can easily use the voice recognition function of the electronic device.

Description

Electronic device and its voice recognition method { ELECTRONIC DEVICE FOR SPEECH RECOGNITION AND METHOD THEREOF }

The present invention relates to an electronic device and a voice recognition method thereof, and more specifically to an electronic device capable of performing voice recognition using a trigger voice, and a voice recognition method thereof.

Users want to use electronic devices more conveniently, and technology for easily controlling electronic devices has developed accordingly. Such technology began with controlling a device through an input unit attached to the device itself and progressed to using an external remote controller that allows control even from a short distance away. More recently, as display units with touch panels have become common in electronic devices, control by touch input has come into general use.

However, controlling an electronic device by touch is inconvenient in that the user must be within touching distance of the device. The need to control an electronic device from a short distance without an external device such as a remote controller has therefore grown.

Voice recognition technology was developed in response, but it has the problem of reacting not only to the user's voice but also to external noise (for example, the sound of a pet). To make the device respond only to the user's voice, a method was used in which the user presses a button on the electronic device or on a remote controller each time the user utters a voice command to control the device.

However, this method is inconvenient because the user must perform an action, such as pressing a button on the electronic device or on the remote controller, for every utterance.

Accordingly, a need has arisen for technology that lets the user easily use the voice recognition function even when external noise is present or a plurality of users are present.

The present invention was devised in response to the above need, and its object is to provide an electronic device, and a voice recognition method thereof, that allow a user to easily use a voice recognition function.

To achieve the above object, a voice recognition method of an electronic device according to an embodiment of the present invention may include: when a user voice is input, determining whether the input user voice corresponds to a trigger voice; when the user voice is determined to be the trigger voice, switching the mode of the electronic device to a voice recognition mode, analyzing the user voice, and storing characteristics of the user voice; when a control voice for controlling the electronic device is input, analyzing the control voice and comparing the characteristics of the analyzed control voice with the stored characteristics of the user voice; and performing a function corresponding to the control voice based on the comparison result.

The storing may include storing, as the characteristics of the user voice, at least one of the energy of the user voice, its frequency bandwidth, the reverberation value at the time the user voice was uttered, and the voice signal-to-noise ratio.

The performing may include performing the function corresponding to the control voice when data obtained by analyzing at least one of the energy of the control voice, its frequency bandwidth, the reverberation value at the time the control voice was uttered, and its voice signal-to-noise ratio falls within a preset range of the data obtained by analyzing at least one of the energy of the trigger voice, its frequency bandwidth, the reverberation value at the time the trigger voice was uttered, and its voice signal-to-noise ratio.
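The "preset range" test described above can be illustrated with a minimal sketch. The class name `VoiceFeatures`, the field names, the units, and the tolerance values are all assumptions made for illustration; the specification does not state how the range is defined or which values are used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceFeatures:
    """Hypothetical container for the four characteristics named in the text."""
    energy_db: float
    bandwidth_hz: float
    reverb_time_s: float
    snr_db: float


# Assumed tolerances; the specification only says "a preset range".
TOLERANCE = {
    "energy_db": 6.0,
    "bandwidth_hz": 500.0,
    "reverb_time_s": 0.2,
    "snr_db": 5.0,
}


def within_preset_range(control, trigger, tolerance=TOLERANCE, fields=None):
    """Return True if every selected feature of the control voice lies
    within the tolerance around the stored trigger-voice feature."""
    names = fields or tuple(tolerance)
    return all(
        abs(getattr(control, name) - getattr(trigger, name)) <= tolerance[name]
        for name in names
    )


trigger = VoiceFeatures(60.0, 3000.0, 0.40, 20.0)
same_user = VoiceFeatures(58.0, 3200.0, 0.45, 18.0)
other = VoiceFeatures(45.0, 3000.0, 0.40, 20.0)

print(within_preset_range(same_user, trigger))  # all features are close
print(within_preset_range(other, trigger))      # energy differs by 15 dB
```

Because the text says "at least one of", the `fields` argument lets the caller restrict the comparison to a subset of the characteristics, for example `fields=("snr_db",)`.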

The comparing may further include, when a plurality of user voices are input, analyzing the plurality of user voices and determining, as the control voice, at least one user voice whose analysis result matches the analysis result of the stored user voice; and the performing may include performing a function corresponding to the at least one control voice so determined.
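As a rough illustration of selecting the control voice among several simultaneous voices, the sketch below filters candidate utterances by how closely their analyzed features match the stored ones. The function name `select_control_voices`, the dictionary keys, and the tolerance values are hypothetical, and "matching" is assumed here to mean "within a per-feature tolerance".

```python
def select_control_voices(candidates, stored, tolerance):
    """Return the texts of the candidate voices whose features all lie
    within the given tolerance of the stored trigger-voice features.

    candidates: list of (text, features) pairs, features being a dict.
    stored:     dict of features saved when the trigger voice was analyzed.
    tolerance:  dict mapping a feature name to its allowed deviation.
    """
    def matches(features):
        return all(
            abs(features[name] - stored[name]) <= tolerance[name]
            for name in tolerance
        )

    return [text for text, features in candidates if matches(features)]


stored = {"energy_db": 60.0, "snr_db": 20.0}
tolerance = {"energy_db": 6.0, "snr_db": 5.0}
candidates = [
    ("channel thirteen", {"energy_db": 58.0, "snr_db": 19.0}),
    ("background chatter", {"energy_db": 75.0, "snr_db": 8.0}),
]

print(select_control_voices(candidates, stored, tolerance))
```

Only the first candidate is returned, since the second deviates from the stored energy and SNR by more than the assumed tolerances.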

The method may further include bypassing the control voice when the characteristics of the analyzed control voice do not correspond to the characteristics of the user voice.

Meanwhile, the method of claim 1 may further include, when the user voice is determined to be the trigger voice and the mode of the electronic device is switched to the voice recognition mode, displaying a UI indicating that the device is in the voice recognition mode.

The displaying may include displaying an indicator on the UI when the function corresponding to the control voice is performed according to the comparison result.

The trigger voice may be a voice preset to switch the mode of the electronic device to the voice recognition mode for performing voice recognition.

The user voice and the control voice may be received through a microphone included in at least one of an external device and the electronic device.

The method may further include terminating the voice recognition mode when the control voice is not input for a preset time.

Meanwhile, an electronic device according to an embodiment of the present invention may include: a voice receiving unit that receives a user voice; a voice signal analysis unit that analyzes the received voice; a storage unit that stores characteristics of the user voice; and a control unit that determines whether the input user voice corresponds to a trigger voice and, when the user voice is determined to be the trigger voice, switches the mode of the electronic device to a voice recognition mode, analyzes the user voice, and controls the storage unit to store the characteristics of the user voice. When a control voice for controlling the electronic device is input, the control unit may control the voice signal analysis unit to analyze the control voice, compare the characteristics of the analyzed control voice with the characteristics of the user voice, and perform a function corresponding to the control voice based on the comparison result.

The control unit may control the storage unit to store, as the characteristics of the user voice, at least one of the energy of the user voice, its frequency bandwidth, the reverberation value at the time the user voice was uttered, and the voice signal-to-noise ratio.

The control unit may perform control so that the function corresponding to the control voice is performed when data obtained by analyzing at least one of the energy of the control voice, its frequency bandwidth, the reverberation value at the time the control voice was uttered, and its voice signal-to-noise ratio falls within a preset range of the data obtained by analyzing at least one of the energy of the trigger voice, its frequency bandwidth, the reverberation value at the time the trigger voice was uttered, and its voice signal-to-noise ratio.

When a plurality of user voices are input through the voice receiving unit, the control unit may control the voice signal analysis unit to analyze the plurality of user voices, determine as the control voice at least one user voice whose analysis result matches the analysis result of the stored user voice, and perform control so that a function corresponding to the at least one control voice so determined is performed.

The control unit may perform control so that the control voice is bypassed when the characteristics of the analyzed control voice do not correspond to the characteristics of the user voice.

The electronic device may further include a display unit, and the control unit may control the display unit to display a UI indicating that the device is in the voice recognition mode when the user voice is determined to be the trigger voice and the mode of the electronic device is switched to the voice recognition mode.

The control unit may control the display unit to display an indicator on the UI when the function corresponding to the control voice is performed according to the comparison result.

The trigger voice may be a voice preset to switch the mode of the electronic device to the voice recognition mode for performing voice recognition.

The electronic device may further include a communication unit, and the control unit may control the communication unit to receive the input voice when the user voice and the control voice are input through an external device.

The control unit may perform control so that the voice recognition mode is terminated when the control voice is not input for a preset time.

According to various embodiments of the present invention, the electronic device determines whether a user voice was uttered intentionally by the user, so that the user can easily use the voice recognition function of the electronic device.

FIG. 1 is a diagram illustrating a method of inputting a trigger voice to an electronic device according to an embodiment of the present invention;
FIG. 2 is a diagram illustrating a method of inputting a trigger voice using an external device according to an embodiment of the present invention;
FIG. 3 is a block diagram briefly illustrating the configuration of an electronic device according to an embodiment of the present invention;
FIG. 4 is a block diagram illustrating the configuration of an electronic device in detail according to an embodiment of the present invention;
FIG. 5 is a flowchart illustrating a method of performing voice recognition by analyzing a user voice according to an embodiment of the present invention;
FIG. 6 is a flowchart illustrating a method of performing voice recognition by analyzing a user voice and terminating the voice recognition mode according to an embodiment of the present invention; and
FIGS. 7 to 10 are diagrams illustrating the display of a UI according to an embodiment of the present invention.

Hereinafter, various embodiments of the present invention are described in more detail with reference to the attached drawings. In describing the present invention, a detailed description of a related known function or configuration is omitted where it is judged that it would unnecessarily obscure the gist of the invention. The terms used below are defined in consideration of their functions in the present invention and may vary according to the intention or custom of users and operators. Their definitions should therefore be based on the content of this specification as a whole.

FIG. 1 is a diagram illustrating a user uttering a trigger voice in order to use the voice recognition function of the electronic device 100, according to an embodiment of the present invention. Although FIG. 1 shows a TV as an example of the electronic device 100, this is only one embodiment; the electronic device 100 may be implemented as any of various electronic devices with a voice recognition function, such as a mobile phone, tablet PC, digital camera, camcorder, laptop PC, or PDA.

The electronic device 100 may include a microphone. The electronic device 100 can therefore receive a user's voice uttered within a certain distance, and can analyze the received voice to determine whether it is a trigger voice. The trigger voice may be a preset short word of three or four syllables, such as "Hi TV."

If the received voice is determined to be the preset trigger voice, the electronic device 100 may switch its mode to the voice recognition mode. The electronic device 100 may also store the characteristics of the user's voice contained in the trigger voice.

Specifically, the characteristics of the user's voice may include the user's utterance condition and the utterance environment. The utterance condition may include the energy of the user's voice and the frequency band distribution of the user's voice. The utterance environment may include the reverberation value (RT) or the voice signal-to-noise ratio (SNR) at the time of the utterance.

That is, the characteristics of the user's voice may include the energy of the user's voice, the frequency band distribution of the user's voice, and the reverberation value (RT) or voice signal-to-noise ratio (SNR) at the time of the utterance.

For example, the characteristics of the user's voice may include the loudness of the voice of the user who uttered a trigger voice such as "Hi TV"; the frequency band distribution, which depends on the user's gender, age, and the like; the reverberation value of the user's voice, which varies with the location of the electronic device 100; and the voice signal-to-noise ratio (SNR), which varies with conditions such as a quiet room, a construction site outside, the presence of pets, or the number of people in the room.

When a user voice is input after the result of analyzing the trigger voice has been stored as the characteristics of the user's voice, the electronic device 100 analyzes the input user voice and determines whether it corresponds to the stored characteristics. That is, the electronic device 100 determines whether the characteristics of the input user voice are similar to the stored characteristics described above.

If the input user voice is determined to have characteristics similar to the stored characteristics, the electronic device 100 performs voice recognition on the user voice. For example, if the user utters "Hi TV" as the trigger voice and then utters "channel thirteen" with the same voice characteristics, the electronic device 100 may change the displayed channel to channel 13.
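The flow described so far (detect the trigger voice, store its characteristics, then accept only control voices with similar characteristics) can be sketched as a small state machine. This is an illustrative sketch only: the class name `VoiceRecognizer` and its parameters are hypothetical, utterances are assumed to arrive already transcribed as text, and a single feature (energy in dB) with an assumed tolerance stands in for the full set of characteristics.

```python
class VoiceRecognizer:
    """Toy model of the trigger / store / compare / execute flow."""

    def __init__(self, trigger_phrase="hi tv", tolerance_db=6.0):
        self.trigger_phrase = trigger_phrase
        self.tolerance_db = tolerance_db   # assumed "preset range"
        self.listening = False             # True once in voice recognition mode
        self.stored_energy_db = None       # feature saved from the trigger voice

    def on_utterance(self, text, energy_db):
        """Return the command to execute, or None if nothing should run."""
        if text.lower() == self.trigger_phrase:
            # Trigger voice: enter voice recognition mode and store the feature.
            self.listening = True
            self.stored_energy_db = energy_db
            return None
        if not self.listening:
            return None
        if abs(energy_db - self.stored_energy_db) > self.tolerance_db:
            # Characteristics do not match the stored ones: bypass the voice.
            return None
        return text


recognizer = VoiceRecognizer()
print(recognizer.on_utterance("channel thirteen", 60.0))  # ignored: no trigger yet
print(recognizer.on_utterance("Hi TV", 60.0))             # trigger: mode switches
print(recognizer.on_utterance("channel thirteen", 58.0))  # accepted as a command
print(recognizer.on_utterance("volume up", 80.0))         # bypassed: energy too far
```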

Meanwhile, FIG. 2 is a diagram illustrating a user uttering a trigger voice through a remote control 10 that includes a microphone, in order to use the voice recognition function of the electronic device 100, according to an embodiment of the present invention.

Specifically, when the user's voice is received through the remote control 10 and the voice recognition function is performed, the user may press the input button 20 on the remote control 10 for receiving voice and then utter a user voice. In this case, the first user voice received after the input button 20 is pressed may serve as the trigger voice. That is, even if a phrase such as "Hi TV" is preset as the trigger voice in the electronic device 100, a signal indicating that a voice is being input has been received through the input button 20, so the first user voice received after the input button 20 is pressed may serve as the trigger voice.

Accordingly, as shown in FIG. 2, when the user presses the input button 20 and the user voice "channel ten" is received, the electronic device 100 may recognize "channel ten" as the trigger voice. The electronic device 100 then switches to the voice recognition mode in response to the trigger voice "channel ten" and performs voice recognition. That is, the electronic device 100 may change the displayed channel to channel 10.
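The button-initiated case differs from the preset-phrase case in that the first utterance after the button press is both the trigger voice and a command. A minimal sketch of that behavior follows; the class name `RemoteTriggerSession` and its methods are hypothetical, and the comparison of later utterances against the stored characteristics is deliberately left out.

```python
class RemoteTriggerSession:
    """Treats the first utterance after a button press as the trigger voice."""

    def __init__(self):
        self.button_pressed = False
        self.stored_features = None  # characteristics saved from the trigger voice

    def press_button(self):
        # A new button press starts a new session with no stored features.
        self.button_pressed = True
        self.stored_features = None

    def on_utterance(self, text, features):
        """Return (is_trigger, command).

        The first utterance after the button press is stored as the trigger
        voice and is also returned as the command to execute. Later utterances
        would be compared with the stored features (not shown here).
        """
        if self.button_pressed and self.stored_features is None:
            self.stored_features = features
            return True, text
        return False, None


session = RemoteTriggerSession()
session.press_button()
print(session.on_utterance("channel ten", {"energy_db": 60.0}))
```

Here "channel ten" is stored as the trigger voice and executed at once, mirroring the example in which the displayed channel changes to channel 10.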

With the electronic device described above, a user can easily use the voice recognition function of the electronic device by uttering a trigger voice. The specific voice recognition method of the electronic device is described below.

FIG. 3 is a block diagram briefly illustrating the configuration of an electronic device according to an embodiment of the present invention. As shown in FIG. 3, the electronic device 100 includes a voice receiving unit 110, a voice signal analysis unit 120, a storage unit 130, and a control unit 140. The electronic device 100 may be a TV, but this is only one embodiment; it may be implemented as any of various electronic devices with a voice recognition function, such as a tablet PC, digital camera, camcorder, laptop PC, PDA, or mobile phone.

The voice receiving unit 110 is a component for receiving a user voice. The voice receiving unit 110 may include a microphone (not shown) and may receive the user voice through the microphone. The microphone may be included in the electronic device 100, or it may be included in an external device. For example, the external device may be a remote control, which receives the user voice through its microphone and transmits it to the voice receiving unit 110.

That is, the voice receiving unit 110 receives the user's voice input through the microphone and converts it into electrical voice data. The voice receiving unit 110 then transmits the processed voice data to the voice signal analysis unit 120.

The voice signal analysis unit 120 is a component for analyzing the voice received through the voice receiving unit 110. For example, the voice signal analysis unit 120 may analyze the received user voice to obtain its energy, its frequency band distribution, or its reverberation value (RT, reverberation time).

In addition, noise from the user's surroundings is often received together with the user voice. The voice signal analysis unit 120 may therefore analyze the user voice to obtain its signal-to-noise ratio (SNR).
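Two of the quantities named above, energy and SNR, can be estimated from raw samples in a few lines. The sketch below is one common textbook approach and not necessarily what the voice signal analysis unit 120 does: it assumes a separate noise-only segment is available and estimates the speech power by subtracting the noise power. The function names are hypothetical.

```python
import math


def mean_power(samples):
    """Average power of a block of samples (mean of squared amplitudes)."""
    return sum(s * s for s in samples) / len(samples)


def energy_db(samples):
    """Signal energy in dB relative to full scale 1.0, floored to avoid log(0)."""
    return 10.0 * math.log10(max(mean_power(samples), 1e-12))


def snr_db(speech_plus_noise, noise_only):
    """Estimate SNR in dB, assuming a noise-only segment is available.

    The speech power is approximated as (power of speech plus noise)
    minus (power of noise alone), floored at a small positive value.
    """
    noise_power = max(mean_power(noise_only), 1e-12)
    signal_power = max(mean_power(speech_plus_noise) - noise_power, 1e-12)
    return 10.0 * math.log10(signal_power / noise_power)


noise = [0.1, -0.1] * 100    # low-level background noise
speech = [1.0, -1.0] * 100   # loud utterance recorded over that noise
print(round(energy_db(speech), 2))
print(round(snr_db(speech, noise), 2))
```

Frequency band distribution and reverberation time need spectral analysis and a decay-curve estimate respectively, and are omitted from this sketch.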

Meanwhile, the storage unit 130 may store the various results produced by the voice signal analysis unit 120. Specifically, the storage unit 130 may store the analyzed characteristics of the user voice, that is, the energy of the received user voice, its frequency band distribution, its reverberation value, or its signal-to-noise ratio.

The storage unit 130 may also store various software modules and data for driving the electronic device 100. For example, the storage unit 130 may store software including a voice recognition module, a base module, a sensing module, a communication module, a presentation module, a web browser module, and a service module.

Meanwhile, the control unit 140 is a component for controlling the overall operation of the electronic device 100. In particular, the control unit 140 may determine whether the voice input through the voice receiving unit 110 corresponds to the trigger voice.

The trigger voice is a voice for switching the mode of the electronic device 100 to the voice recognition mode for performing voice recognition. Specifically, the trigger voice may be a word of three or four syllables preset in the electronic device 100. For example, the electronic device 100 may be initially configured with "Hi TV" as the trigger voice. That is, as described above, the trigger voice may be a user voice uttering the preset phrase and received through the voice receiving unit 110, or the first user voice input after an input button on an external device of the electronic device 100, such as a remote control, is pressed.

When the input voice is determined to correspond to the trigger voice, the control unit 140 may switch the mode of the electronic device 100 to the voice recognition mode. The control unit 140 may also have the voice signal analysis unit 120 analyze the trigger voice and control the storage unit 130 to store the resulting user voice characteristics.

The characteristics of the user voice may include the example analysis results of the voice signal analysis unit 120: the energy of the user voice, its frequency band distribution, its reverberation value, or its signal-to-noise ratio.

In addition, when a control voice for controlling the electronic device 100 is input, the control unit 140 may control the voice signal analysis unit 120 to analyze the control voice.

The control unit 140 may compare the characteristics of the analyzed control voice with the characteristics of the user voice stored in the storage unit 130, and may perform control so that a function corresponding to the control voice is performed based on the comparison result.

Specifically, the control unit 140 may perform control so that the function corresponding to the control voice is performed when data obtained by analyzing at least one of the energy of the control voice input through the voice receiving unit 110, its frequency bandwidth, the reverberation value at the time the control voice was uttered, or its voice signal-to-noise ratio falls within a preset range of the data obtained by analyzing at least one of the energy of the user voice corresponding to the trigger voice, its frequency bandwidth, the reverberation value at the time the trigger voice was uttered, or its voice signal-to-noise ratio.

The control unit 140 may also control the storage unit 130 to store, as the characteristics of the user voice, the user's utterance condition or utterance environment as analyzed by the voice signal analysis unit 120.

Meanwhile, the control unit 140 may perform control so that the voice recognition mode is terminated when a preset time has elapsed. That is, even when the user voice input through the voice receiving unit 110 has been determined to be the trigger voice and the voice recognition mode has been entered, the control unit 140 may terminate the voice recognition mode if the preset time elapses without a control voice being input.
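The timeout behavior can be sketched as a small timer object. The class name `RecognitionModeTimer`, the 10-second default, and the choice to restart the countdown on every accepted control voice are assumptions; the text only states that the mode ends when no control voice is input for a preset time. Timestamps are passed in explicitly (for example from `time.monotonic()`) so that the logic is easy to test.

```python
class RecognitionModeTimer:
    """Ends voice recognition mode after a period with no control voice."""

    def __init__(self, timeout_s=10.0):
        self.timeout_s = timeout_s   # assumed "preset time"
        self.last_activity = None    # None means not in voice recognition mode

    def enter_mode(self, now):
        """Call when the trigger voice is recognized."""
        self.last_activity = now

    def on_control_voice(self, now):
        """Call when a control voice is accepted; restarts the countdown."""
        if self.is_active(now):
            self.last_activity = now

    def is_active(self, now):
        """Return True while in voice recognition mode; end it on timeout."""
        if self.last_activity is None:
            return False
        if now - self.last_activity >= self.timeout_s:
            self.last_activity = None  # terminate voice recognition mode
            return False
        return True


timer = RecognitionModeTimer(timeout_s=10.0)
timer.enter_mode(now=0.0)
timer.on_control_voice(now=8.0)
print(timer.is_active(now=17.0))  # 9 s since the last control voice
print(timer.is_active(now=18.0))  # 10 s elapsed: mode ends
```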

In addition, when the characteristics of the control voice analyzed by the voice signal analysis unit 120 do not correspond to the user voice characteristics stored in the storage unit 130, the control unit 140 may bypass the control voice.

Meanwhile, according to FIG. 4, the electronic device 100 may include a voice receiver 110, a voice signal analysis unit 120, a storage unit 130, a control unit 140, an image receiver 150, an image processing unit 160, a display unit 170, an audio processing unit 180, an audio output unit 190, a communication unit 200, and an input unit 210. FIG. 4 comprehensively illustrates the various components, taking as an example the case where the electronic device 100 has various functions such as a voice recognition function, a communication function, a video playback function, and a display function. Accordingly, depending on the embodiment, some of the components shown in FIG. 4 may be omitted or changed, and other components may be added.

The voice receiver 110 is a component for receiving a user voice. The voice receiver 110 may include a microphone (not shown) and may receive the user voice through the microphone. The microphone may be included in the electronic device 100, or it may be included in an external device. For example, the external device may be a remote control, which receives the user voice through its microphone and transmits it to the voice receiver 110.

The trigger voice is a voice for switching the electronic device 100 to a voice recognition mode in which voice recognition is performed. Specifically, the trigger voice may be a word of three or four syllables preset in the electronic device 100. For example, "Hi TV" may be initially set as the trigger voice of the electronic device 100. That is, the trigger voice may be a user voice uttering the preset phrase, received through the voice receiver 110 as described above, or the first user voice received after an input button on an external device of the electronic device 100, such as a remote control, is pressed.

Specifically, when the preset phrase is "Hi TV" and "Hi TV" is input through the microphone, the control unit 140 may switch the electronic device 100 to the voice recognition mode and control the voice signal analysis unit 120 to analyze the input user voice "Hi TV". The microphone may be included in the electronic device 100 or in an external device of the electronic device 100, such as a remote control. In addition, when a user command pressing an input button on the remote control is input and a voice is then input for the first time, the control unit 140 may determine that first voice to be the trigger voice.

That is, when the input voice is determined to correspond to the preset trigger voice, the control unit 140 may switch the electronic device 100 to the voice recognition mode, analyze the trigger voice through the voice signal analysis unit 120, and control the storage unit 130 to store the user voice characteristics. For example, the control unit 140 may control the storage unit 130 to store the results of analyzing the energy, the frequency bandwidth, the reverberation value at the time of utterance, or the signal-to-noise ratio of the trigger voice.

After the switch to the voice recognition mode, when a control voice is input through the voice receiver 110, the control unit 140 may control the voice signal analysis unit 120 to analyze the control voice. The control voice may include any user voice capable of controlling a function of the electronic device 100. In addition, the control unit 140 may control the voice signal analysis unit 120 to recognize and analyze, as a control voice, a user voice having at least a preset energy.
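The energy gate mentioned above, in which only a user voice with at least a preset energy is treated as a control voice, might look like the following sketch. The mean-squared-amplitude definition of energy, the function names, and the threshold value are assumptions made for illustration; the text does not specify how energy is measured.

```python
def mean_energy(samples):
    """Mean squared amplitude of a sequence of PCM samples (0.0 if empty)."""
    if not samples:
        return 0.0
    return sum(s * s for s in samples) / len(samples)


def is_control_voice_candidate(samples, min_energy=0.01):
    """Treat an utterance as a control voice only if its energy reaches
    the preset threshold (min_energy is an assumed value)."""
    return mean_energy(samples) >= min_energy


loud = [0.5, -0.5, 0.5, -0.5]        # mean energy 0.25
quiet = [0.01, -0.01, 0.01, -0.01]   # mean energy of about 0.0001
```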

For example, when the control voice "channel thirteen" is received through the voice receiver 110 after the switch to the voice recognition mode, the control unit 140 may control the voice signal analysis unit 120 to analyze the energy, the frequency bandwidth, the reverberation value at the time of utterance, or the signal-to-noise ratio of the control voice "channel thirteen".

When the results of analyzing the energy, the frequency bandwidth, the reverberation value at the time of utterance, or the signal-to-noise ratio of the control voice "channel thirteen" fall within a preset range of the corresponding analysis results for the trigger voice stored in the storage unit 130, the control unit 140 may perform voice recognition on the control voice and change the channel of the image displayed by the electronic device 100 to channel 13.

Alternatively, when an input button on the external device is pressed and the user voice "channel thirteen" is received, the control unit 140 may determine "channel thirteen" to be the trigger voice. In the manner described above, the control unit 140 may then analyze "channel thirteen", store the analysis result as the characteristics of the user voice, and switch the electronic device 100 to the voice recognition mode. It may also perform voice recognition and change the channel of the image displayed by the electronic device 100 to channel 13.
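The two ways an utterance can become the trigger voice (uttering the preset phrase, or being the first voice after a button press) can be summarized in a small decision function. This is only a sketch: `handle_utterance`, its parameters, and the exact-string comparison against the trigger phrase are hypothetical simplifications.

```python
def handle_utterance(text, *, button_pressed, in_recognition_mode,
                     trigger_phrase="Hi TV"):
    """Return (is_trigger, treat_as_command) for one received utterance.

    is_trigger       -- analyze the voice and store its characteristics
    treat_as_command -- pass the voice on as a candidate control voice
    """
    if in_recognition_mode:
        # Already in voice recognition mode: a candidate control voice,
        # still subject to the comparison with the stored characteristics.
        return (False, True)
    if button_pressed:
        # First voice after the button press: it is the trigger voice and
        # is also recognized as a command (e.g. "channel thirteen").
        return (True, True)
    if text == trigger_phrase:
        # Preset phrase: switches the mode but carries no command itself.
        return (True, False)
    return (False, False)
```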

After the channel of the displayed image has been changed to channel 13, when the control voice "volume up" is received through the voice receiver 110, the control unit 140 analyzes the received control voice through the voice signal analysis unit 120 and determines whether the analysis result falls within the preset range of the stored characteristics of the user voice. If it does, the control unit 140 may perform voice recognition and control the electronic device 100 to raise the volume of the image being displayed.

Meanwhile, when a plurality of user voices are received through the voice receiver 110 after the switch to the voice recognition mode, the control unit 140 may control the voice signal analysis unit 120 to analyze each of the user voices. Among them, the user voice whose analysis result is similar to the characteristics of the user voice stored in the storage unit 130 may be determined to be the control voice. The control unit 140 may then perform voice recognition according to the determined control voice.

For example, suppose that first, second, and third users are speaking near the electronic device 100, and the first user utters the preset trigger voice "Hi TV". When the first user's voice is received through the voice receiver 110 and the control unit 140 switches the electronic device 100 to the voice recognition mode, the control unit 140 may control the voice signal analysis unit 120 to analyze the first user's trigger voice.

Specifically, the voice signal analysis unit 120 may analyze the energy of the first user's voice and its frequency band distribution. The voice signal analysis unit 120 may also analyze the reverberation value (RT) at the time the first user spoke and the signal-to-noise ratio (SNR) of the first user's voice. The control unit 140 may then control the storage unit 130 to store the analysis result for the first user's trigger voice as the characteristics of the user voice.

After the characteristics of the user voice derived from the trigger voice are stored, when the first, second, and third users utter "volume up", "end", and "channel thirteen", respectively, and each user's voice is input through the voice receiver 110, the control unit 140 may control the voice signal analysis unit 120 to analyze each input voice.

The voice signal analysis unit 120 may analyze at least one of the energy, the frequency band distribution, the reverberation value, and the signal-to-noise ratio of each user's voice. The control unit 140 may then compare the analysis result for each user's voice with the characteristics of the user voice stored in the storage unit 130. That is, the control unit 140 may determine that the analysis result for "volume up", uttered by the same user who uttered the trigger voice, is the most similar to the characteristics of the user voice stored in the storage unit 130. Accordingly, the control unit 140 may raise the volume of the electronic device 100 in response to "volume up".

In addition, the control unit 140 may bypass the utterances of the second and third users, whose analysis results differ from the characteristics of the user voice stored in the storage unit 130. Accordingly, although the electronic device 100 receives "end" and "channel thirteen" through the voice receiver 110, it does not perform the corresponding functions.
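The three-speaker example can be illustrated with a short selection routine: each utterance is scored against the stored characteristics, the closest one is taken as the control voice, and the rest are bypassed. The similarity measure (a sum of relative deviations), the `max_distance` cutoff, and all feature values below are assumptions; the text says only that the most similar voice is chosen.

```python
def relative_distance(candidate, stored):
    """Sum of relative deviations over the stored features.
    Stored values are assumed to be nonzero."""
    return sum(abs(candidate[k] - stored[k]) / abs(stored[k]) for k in stored)


def select_control_voice(utterances, stored, max_distance=0.2):
    """Return the text of the utterance most similar to the stored
    characteristics, or None if even the best one is too far away.
    All other utterances are bypassed."""
    best = min(utterances, key=lambda u: relative_distance(u[1], stored),
               default=None)
    if best is None or relative_distance(best[1], stored) > max_distance:
        return None
    return best[0]


# Characteristics stored from the first user's trigger voice (made-up values).
stored = {"energy": 1.0, "snr_db": 20.0}
utterances = [
    ("volume up", {"energy": 1.02, "snr_db": 19.5}),        # first user
    ("end", {"energy": 0.4, "snr_db": 12.0}),               # second user
    ("channel thirteen", {"energy": 1.6, "snr_db": 25.0}),  # third user
]
```

With these values, `select_control_voice(utterances, stored)` picks "volume up", so "end" and "channel thirteen" trigger no function.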

Meanwhile, the control unit 140 may perform voice recognition and then end voice recognition of the user voice once a preset time has elapsed. That is, even if no command to end voice recognition is input by the user, the control unit 140 may cause the electronic device 100 to end voice recognition if no user voice, including a control voice, is received through the voice receiver 110 for the preset time.
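The timeout behaviour can be sketched as a small session object that records the time of the last received voice and leaves the voice recognition mode once the preset time passes in silence. The class and method names and the explicit `now` argument (used instead of a real clock so the example stays deterministic) are assumptions for illustration.

```python
class RecognitionSession:
    """Tracks whether the voice recognition mode is still active."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s   # the "preset time", in seconds
        self.active = False
        self._last_voice_at = 0.0

    def on_trigger(self, now):
        """Trigger voice recognized: enter the voice recognition mode."""
        self.active = True
        self._last_voice_at = now

    def on_voice(self, now):
        """Any user voice received while active restarts the timer."""
        if self.active:
            self._last_voice_at = now

    def tick(self, now):
        """End the mode if the preset time has elapsed without a voice.
        Returns True while the mode is still active."""
        if self.active and now - self._last_voice_at >= self.timeout_s:
            self.active = False
        return self.active
```

For instance, with `timeout_s=10.0`, a trigger at t=0 and a voice at t=6 keep the session active at t=15, and it ends at t=16.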

In addition, when the characteristics of the control voice analyzed by the voice signal analysis unit 120 do not correspond to the user voice characteristics stored in the storage unit 130, the control unit 140 may bypass the control voice.

For example, the control unit 140 analyzes, through the voice signal analysis unit 120, the user voice determined to be the trigger voice, and controls the storage unit 130 to store the utterance energy and the signal-to-noise ratio of that voice as the characteristics of the user voice.

After the characteristics of the user voice are stored, when a control voice is received through the voice receiver 110, the control unit 140 determines whether the result of analyzing the received voice through the voice signal analysis unit 120 corresponds to the characteristics of the user voice stored in the storage unit 130. If the received voice differs from the trigger voice in utterance energy or signal-to-noise ratio and is therefore determined not to correspond to the stored characteristics, the control unit 140 may, as described above, bypass the received voice so that the electronic device 100 is not controlled by it.

Meanwhile, the image receiver 150 receives image data from various sources. For example, the image receiver 150 may receive broadcast data from an external broadcasting station, may receive image data in real time from an external server, and may receive image data stored in the internal storage unit 130.

The image processing unit 160 is a component that processes the image data received by the image receiver 150. The image processing unit 160 may perform various kinds of image processing on the image data, such as decoding, scaling, noise filtering, frame rate conversion, and resolution conversion.

The display unit 170 displays at least one of the video frames produced when the image processing unit 160 processes the image data received from the image receiver 150, and the various screens generated by the graphics processing unit 143.

In particular, the display unit 170 may display a UI indicating that the device is in the voice recognition mode. For example, when "Hi TV" is input through the voice receiver 110 and is determined to be the trigger voice after analysis by the voice signal analysis unit 120, the display unit 170 may display a UI indicating that the trigger voice has been recognized and that the electronic device 100 is in the voice recognition mode.

Specifically, as shown in FIG. 7, the display unit 170 may display text such as "You can say the following: channel nineteen, volume up" as examples of the voice recognition function. The display unit 170 may also display "What would you like to do?" as a phrase indicating that it is ready to receive the user voice.

In addition, when a control voice is input in the voice recognition mode and the function corresponding to the control voice is performed, the display unit 170 may display an indicator on the UI under the control of the control unit 140.

Specifically, while a control voice is input through the voice receiver 110 in the voice recognition mode and is being analyzed by the voice signal analysis unit 120, the display unit 170 may, to indicate that the control voice is being analyzed, display a preset color (for example, white) on part of the display unit 170 or display an indicator that blinks in a preset color (for example, white). The indicator described above is only one embodiment, and the display unit 170 may display various types of indicators on the UI indicating the voice recognition mode.

For example, as shown in FIG. 8, the display unit 170 may display the indicator on a microphone-shaped icon included in the UI. To indicate that a voice is being recognized, the display unit 170 may also display a phrase such as "Recognizing voice."

Meanwhile, the UI indicating the voice recognition mode may be displayed on part of the display unit 170. For example, as shown in FIG. 9, even when a trigger voice and a control voice are input, the display unit 170 may continue to display the content it is showing while displaying the UI indicating the voice recognition mode on part of the display unit 170.

Likewise, when a control voice is input and analyzed, the display unit 170 may, as shown in FIG. 10, continue to display the content it is showing while displaying the indicator on the UI indicating the voice recognition mode on part of the display unit 170.

The audio processing unit 180 is a component that processes audio data. The audio processing unit 180 may perform various kinds of processing on the audio data, such as decoding, amplification, and noise filtering. The audio data processed by the audio processing unit 180 may be output to the audio output unit 190.

The audio output unit 190 is a component that outputs not only the audio data processed by the audio processing unit 180 but also various notification sounds and voice messages. The audio output unit 190 may be implemented as a speaker, but this is only one embodiment, and it may instead be implemented as an audio terminal.

The communication unit 200 is a component that communicates with various types of external devices according to various communication methods. The communication unit 200 may include various communication modules such as a USB module, a Wi-Fi module, a Bluetooth module, and an NFC module. The Wi-Fi module, the Bluetooth module, and the NFC module communicate using the Wi-Fi, Bluetooth, and NFC methods, respectively. Among these, the NFC module is a module that operates in the NFC (Near Field Communication) method, which uses the 13.56 MHz band among various RF-ID frequency bands such as 135 kHz, 13.56 MHz, 433 MHz, 860 to 960 MHz, and 2.45 GHz. When the Wi-Fi module or the Bluetooth module is used, connection information such as an SSID and a session key is exchanged first, a communication connection is established using that information, and various information can then be transmitted and received.

In addition, when a user voice or a control voice is input through an external device, the communication unit 200 may receive the input voice. For example, when the user voice is input through a microphone included in the remote control, the electronic device 100 may receive the input voice through the communication unit 200.

The input unit 210 receives user commands for controlling the overall operation of the electronic device 100. The input unit 210 may be implemented as a remote control including four directional keys (up, down, left, and right) and a confirmation key, but this is only one embodiment, and it may be implemented by various input devices such as a touch screen, a mouse, or a pointing device.

In addition, when the input unit 210 is implemented as a remote control, the remote control may include an input button for receiving voice. The user may press the input button on the remote control and then speak. The first user voice received after the input button is pressed may serve as the trigger voice.

Meanwhile, as shown in FIG. 4, the control unit 140 includes a RAM 141, a ROM 142, a graphics processing unit 143, a main CPU 144, first to n-th interfaces 145-1 to 145-n, and a bus 146. The RAM 141, the ROM 142, the graphics processing unit 143, the main CPU 144, and the first to n-th interfaces 145-1 to 145-n may be connected to one another through the bus 146.

The ROM 142 stores an instruction set for booting the system, among other things. When a turn-on command is input and power is supplied, the main CPU 144 copies the O/S stored in the storage unit 130 to the RAM 141 according to the instructions stored in the ROM 142, and executes the O/S to boot the system. When booting is complete, the main CPU 144 copies the various application programs stored in the storage unit 130 to the RAM 141 and executes the copied application programs to perform various operations.

The graphics processing unit 143 uses a calculation unit (not shown) and a rendering unit (not shown) to generate a screen including various objects such as icons, images, and text. Using the control command received from the input unit 210, the calculation unit computes attribute values such as the coordinates, shape, size, and color with which each object is to be displayed according to the layout of the screen. The rendering unit generates screens of various layouts including the objects, based on the attribute values computed by the calculation unit. The screen generated by the rendering unit is displayed within the display area of the display unit 170. In particular, the graphics processing unit 143 may generate a UI indicating that the trigger voice has been recognized and that a user voice can be received.

The main CPU 144 accesses the storage unit 130 and performs booting using the O/S stored in the storage unit 130. The main CPU 144 also performs various operations using the programs, content, and data stored in the storage unit 130.

In addition, the first to n-th interfaces 145-1 to 145-n are connected to the various components described above.

Meanwhile, FIG. 5 is a flowchart illustrating a method of performing voice recognition by analyzing a user voice, according to an embodiment of the present invention.

First, the electronic device 100 receives a user voice (S500). The electronic device 100 may receive the user voice through a microphone included in the electronic device 100, or through a microphone included in an external device such as a remote control.

The electronic device 100 determines whether the input user voice corresponds to the trigger voice (S510). The trigger voice may be a voice for switching the electronic device to a voice recognition mode in which voice recognition is performed. The trigger voice may be preset in the electronic device 100, or may be set by the user to a specific phrase of a certain length. For example, when the trigger voice is preset to "Hi, TV", the electronic device 100 determines whether the received user voice is "Hi, TV". When the received user voice is determined to be "Hi, TV" (S510-Y), the electronic device 100 switches to the voice recognition mode (S520).

Meanwhile, when the user voice is received through an external device such as a remote control, the electronic device 100 may recognize as the trigger voice the first user voice received after an input button on the external device is pressed. That is, even when the trigger voice is preset to "Hi TV", if "channel thirteen" is the first voice input after the input button on the external device is pressed, the electronic device 100 may recognize "channel thirteen" itself as the trigger voice.

The electronic device 100 then analyzes the user voice determined to be the trigger voice and stores the characteristics of the user voice (S530). Specifically, the characteristics of the user voice may include the user's speech condition and the speech environment. The speech condition may include the energy of the user voice and the frequency band distribution of the user voice. The speech environment may include the reverberation value (RT) at the time of utterance or the signal-to-noise ratio (SNR). That is, the characteristics of the user voice may include the energy of the user voice, the frequency band distribution of the user voice, the reverberation value (RT) at the time of utterance, or the signal-to-noise ratio (SNR).
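The grouping of the stored characteristics into a speech condition and a speech environment can be pictured as nested records, as in the sketch below. The type names, field names, and sample values are hypothetical, and `snr_db` uses the standard decibel definition of a power ratio rather than any method stated in the text.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeechCondition:
    energy: float             # energy of the user voice
    band_distribution: tuple  # assumed form: share of energy per frequency band


@dataclass(frozen=True)
class SpeechEnvironment:
    reverberation: float      # reverberation value (RT) at the time of utterance
    snr_db: float             # signal-to-noise ratio (SNR) in decibels


@dataclass(frozen=True)
class UserVoiceCharacteristics:
    condition: SpeechCondition
    environment: SpeechEnvironment


def snr_db(signal_power, noise_power):
    """SNR in decibels from mean signal and noise power (both must be > 0)."""
    return 10.0 * math.log10(signal_power / noise_power)


# Characteristics stored in step S530 for one trigger voice (made-up values).
profile = UserVoiceCharacteristics(
    condition=SpeechCondition(energy=0.25, band_distribution=(0.6, 0.3, 0.1)),
    environment=SpeechEnvironment(reverberation=0.4, snr_db=snr_db(0.25, 0.0025)),
)
```

Here the signal power is 100 times the noise power, so the stored SNR is about 20 dB.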

When a control voice for controlling the electronic device 100 is input (S540-Y), the electronic device 100 analyzes the control voice and compares its characteristics with the characteristics of the user voice (S550). The control voice may include any user voice capable of controlling a function of the electronic device 100. In addition, the electronic device 100 may recognize and analyze, as a control voice, a user voice having at least a preset energy.

The electronic device 100 then performs a function corresponding to the control voice based on the comparison result (S560). Specifically, the electronic device 100 analyzes at least one of the energy of the control voice, its frequency bandwidth, the reverberation value at the time the control voice is uttered, or the signal-to-noise ratio. If the resulting data falls within a preset range of the corresponding data obtained from the user voice that served as the trigger voice (its energy, frequency bandwidth, reverberation value at the time of utterance, or signal-to-noise ratio), the electronic device 100 performs the function corresponding to the control voice.

As an example, consider a case in which the input user voice "Hi TV" is determined to be the trigger voice, the mode of the electronic device 100 is changed to the voice recognition mode, the energy of "Hi TV" and the reverberation value at the time "Hi TV" was uttered are stored as the user voice characteristics, and the electronic device 100 then receives the control voice "channel thirteen".

The electronic device 100 analyzes the energy of the input control voice "channel thirteen" and the reverberation value at the time the control voice is uttered. If the analyzed data falls within a 10% error range of the stored user voice characteristic data of the trigger voice, the electronic device 100 may determine that the user who uttered the trigger voice and the user who uttered the control voice are the same. The electronic device 100 may also determine that the input control voice was uttered intentionally by the user to control the electronic device 100. Accordingly, the electronic device 100 may change the displayed image to channel thirteen.

Meanwhile, the 10% error range for the compared data is only one embodiment, and the range may differ depending on the initial settings and environment of the electronic device 100.
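The range check in this example can be sketched as below. This is an illustrative reading, not the patented implementation: the patent does not define how the error range is computed, so the sketch assumes a relative difference measured against the stored value, and it assumes that every stored characteristic must pass. The names `within_tolerance` and `matches_profile` and the sample values are hypothetical.

```python
from typing import Mapping


def within_tolerance(stored: float, observed: float,
                     tolerance: float = 0.10) -> bool:
    """True if `observed` differs from `stored` by at most `tolerance` (relative)."""
    return abs(observed - stored) <= tolerance * abs(stored)


def matches_profile(stored: Mapping[str, float],
                    observed: Mapping[str, float],
                    tolerance: float = 0.10) -> bool:
    """Compare every stored characteristic with the control voice's value.

    A characteristic missing from `observed` counts as a mismatch.
    """
    return all(
        name in observed and within_tolerance(value, observed[name], tolerance)
        for name, value in stored.items()
    )


# Characteristics stored for the trigger voice "Hi TV" (made-up numbers).
trigger = {"energy": 0.25, "reverberation": 0.40}

# Control voice close to the trigger voice: the function would be performed.
same_speaker = {"energy": 0.26, "reverberation": 0.42}

# Control voice with much higher energy: the voice would be bypassed.
other_source = {"energy": 0.40, "reverberation": 0.42}
```

Since the tolerance is a parameter, a device with different initial settings or a noisier environment could pass a value other than 0.10.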

In addition, the utterance energy and the reverberation value are only examples of the characteristics of the user voice. Any value that varies with the user uttering the voice and with the utterance environment, such as the frequency band distribution of the voice, the reverberation value (RT) at the time of utterance, or the signal-to-noise ratio (SNR), may of course be analyzed and stored as a user voice recognition condition.

Meanwhile, FIG. 6 is a flowchart illustrating a method of analyzing a user voice, performing voice recognition, and terminating the voice recognition mode, according to an embodiment of the present invention.

First, the electronic device 100 receives a user voice (S600). The electronic device 100 may receive the user voice through a microphone included in the electronic device 100. It may also receive the user voice through a microphone included in an external device such as a remote control.

The electronic device 100 determines whether the input user voice corresponds to the trigger voice (S610). The trigger voice may be a voice for switching the mode of the electronic device to a voice recognition mode in which voice recognition is performed. The trigger voice may be preset in the electronic device 100, or may be set by the user to a specific phrase of a certain length. For example, when the trigger voice is preset to "Hi, TV", the electronic device 100 determines whether the received user voice is "Hi, TV". When the received user voice is determined to be "Hi, TV" (S610-Y), the electronic device 100 changes the mode of the electronic device to the voice recognition mode (S620).

The electronic device 100 then analyzes the user voice determined to be the trigger voice and stores the characteristics of the user voice (S630). Specifically, the characteristics of the user voice may include the user's utterance condition and the utterance environment. The utterance condition may include the energy of the user voice and the frequency band distribution of the user voice. The utterance environment may include the reverberation value (RT) or the signal-to-noise ratio (SNR) at the time the user voice is uttered. That is, the characteristics of the user voice may include the energy of the user voice, the frequency band distribution of the user voice, the reverberation value (RT) at the time of utterance, or the signal-to-noise ratio (SNR).

When a control voice for controlling the electronic device 100 is input (S640-Y), the electronic device 100 analyzes the control voice and compares the characteristics of the control voice with the stored characteristics of the user voice (S650). The control voice may include any user voice capable of controlling a function of the electronic device 100. In addition, the electronic device 100 may recognize and analyze a user voice having at least a preset energy as a control voice.

If the analyzed characteristics of the control voice fall within the preset range of the stored characteristics of the user voice (S660-Y), the electronic device 100 performs the function corresponding to the control voice (S670).

If, on the other hand, the analyzed characteristics of the control voice do not fall within the preset range of the stored characteristics of the user voice (S660-N), the electronic device 100 bypasses the control voice (S680).

For example, if the data obtained by analyzing the energy of the input control voice "channel thirteen" and the reverberation value at the time it was uttered does not fall within a 10% error range of the data obtained by analyzing the energy of the trigger voice "Hi TV" and the reverberation value at the time it was uttered, the electronic device 100 may bypass the input control voice "channel thirteen". Accordingly, the electronic device 100 may leave the channel of the displayed image unchanged.

Meanwhile, when a preset time elapses (S690-Y), the electronic device 100 terminates the voice recognition mode (S700). That is, after performing the function corresponding to the control voice, the electronic device 100 automatically terminates the voice recognition mode if no control voice is input for the preset time, even without a user command to terminate the voice recognition mode. Accordingly, after the voice recognition mode ends, the electronic device 100 may not perform its functions in response to an input user voice until an input user voice is again determined to be the trigger voice.
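The flow of FIG. 6, including the automatic timeout, can be sketched as a small state machine. This is a minimal sketch under stated assumptions: the class name `VoiceRecognitionSession`, the 10-second default timeout, the returned status strings, and the rule that a bypassed voice does not restart the timer are all hypothetical and are not specified by the patent. Timestamps are passed in explicitly so the behavior can be followed without a real clock.

```python
class VoiceRecognitionSession:
    """Tracks whether the device is in voice recognition mode (illustrative)."""

    def __init__(self, trigger_phrase: str = "Hi TV", timeout_s: float = 10.0):
        self._trigger = self._normalize(trigger_phrase)
        self._timeout_s = timeout_s      # hypothetical preset time
        self.active = False              # True while in voice recognition mode
        self._last_activity = 0.0        # time of trigger or last executed command

    @staticmethod
    def _normalize(text: str) -> str:
        return " ".join(text.replace(",", " ").lower().split())

    def _expire_if_idle(self, now: float) -> None:
        # S690/S700: leave the mode once the preset time has passed with no command.
        if self.active and now - self._last_activity >= self._timeout_s:
            self.active = False

    def on_voice(self, text: str, now: float, matches_profile: bool = True) -> str:
        """Handle one utterance received at time `now` (seconds)."""
        self._expire_if_idle(now)
        if not self.active:
            # S610/S620: only the trigger voice can switch the mode on.
            if self._normalize(text) == self._trigger:
                self.active = True
                self._last_activity = now
                return "mode_on"
            return "ignored"
        if not matches_profile:
            # S660-N/S680: characteristics outside the preset range.
            return "bypassed"
        # S660-Y/S670: perform the function and restart the idle timer.
        self._last_activity = now
        return "executed"
```

A short trace: "channel thirteen" before any trigger is ignored, "Hi TV" switches the mode on, a matching "channel thirteen" is executed, and a command arriving after the idle period is ignored until "Hi TV" is spoken again.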

According to the voice recognition method described above, the user can easily and accurately control the functions of the electronic device simply by uttering a voice, without any separate manipulation.

The voice recognition method of an electronic device according to the various embodiments described above may be coded as software and stored in a non-transitory readable medium. Such a non-transitory readable medium may be mounted and used in various devices.

As an example, program code may be stored in and provided on a non-transitory readable medium for performing the steps of: when a user voice is input, determining whether the input user voice corresponds to a trigger voice; when the user voice is determined to be the trigger voice, changing the mode of the electronic device to a voice recognition mode and analyzing the user voice to store the characteristics of the user voice; when a control voice for controlling the electronic device is input, analyzing the control voice and comparing the analyzed characteristics of the control voice with the characteristics of the user voice; and performing a function corresponding to the control voice based on the comparison result. In addition, the power saving processing method described in the various embodiments above may be coded as a program and stored in a non-transitory readable medium.

A non-transitory readable medium is not a medium that stores data for a short moment, such as a register, cache, or memory, but a medium that stores data semi-permanently and can be read by a device. Specific examples include a CD, DVD, hard disk, Blu-ray disc, USB storage, memory card, and ROM.

Although preferred embodiments of the present invention have been shown and described above, the present invention is not limited to the specific embodiments described. Various modifications can of course be made by those of ordinary skill in the art to which the invention pertains without departing from the gist of the present invention as claimed in the claims, and such modifications should not be understood separately from the technical idea or outlook of the present invention.

100: electronic device 110: voice receiving unit
120: voice signal analysis unit 130: storage unit
140: control unit 150: video receiving unit
160: image processing unit 170: display unit
180: audio processing unit 190: audio output unit
200: communication unit 210: input unit

Claims

A display device comprising:
a video receiver;
a display unit;
a voice input receiver;
a communication unit; and
a control unit configured to:
control the display unit to output broadcast content received through the video receiver,
while the received broadcast content is being output on the display unit, receive a first voice input through the voice input receiver without receiving, from an external device connected through the communication unit, a signal corresponding to reception of a voice input,
when the received first voice input corresponds to a preset trigger word, perform a voice recognition function corresponding to the received first voice input and control the display unit to output a first UI corresponding to the performance of the voice recognition function together with the broadcast content being output,
if a second voice input corresponding to function control of the display device is received within a preset time after the voice recognition function is performed, control the function of the display device corresponding to the received second voice input, and
if the second voice input corresponding to function control of the display device is not received within the preset time after the voice recognition function is performed, terminate the voice recognition function corresponding to the received first voice input without a user command corresponding to termination of the performed voice recognition function,
wherein the preset trigger word can be changed according to user selection.

The display device of claim 1,
wherein the control unit
changes the state of the display device to a voice recognition state capable of performing the voice recognition function when the received first voice input corresponds to a trigger word that is preset or selected by the user.

The display device of claim 1,
wherein the first UI corresponding to the performance of the voice recognition function
comprises a UI indicating that the display device is in a voice recognition state in which the voice recognition function is performed and in which the second voice input can be received.

The display device of claim 1,
wherein the control unit,
when the second voice input corresponding to function control of the display device is received, controls the display unit to display a second UI indicating that the received second voice input is being processed together with the broadcast content being output.

The display device of claim 4,
wherein the first UI and the second UI are each output to a partial area of the display unit.

The display device of claim 1,
wherein the first UI corresponding to the performance of the voice recognition function
comprises a plurality of texts corresponding to a plurality of voice inputs that can be input to control functions of the display device in response to the performed voice recognition function.

The display device of claim 1,
wherein the control unit,
if a third voice input is not received within a preset time after the function of the display device corresponding to the received second voice input is controlled, terminates the voice recognition function corresponding to the first voice input without a user command corresponding to termination of the performed voice recognition function.

The display device of claim 7,
wherein the control unit,
if the voice recognition function is terminated because the third voice input is not received within the preset time, does not perform a function corresponding to a voice input received through the voice input receiver until a fourth voice input corresponding to the preset trigger word is received through the voice input receiver.

The display device of claim 1,
wherein the control unit
identifies whether the received first voice input corresponds to the preset trigger word.

The display device of claim 9,
wherein the voice input receiver includes a microphone of the display device, and
wherein the control unit,
when a signal corresponding to reception of a voice input through a microphone of the external device is received through the communication unit, does not identify whether the first voice input received through the microphone of the display device corresponds to the preset trigger word.

A method of controlling a display device including a voice input receiver, a communication unit, and a display unit, the method comprising:
outputting broadcast content to the display unit;
while the broadcast content is being output to the display unit, receiving a first voice input through the voice input receiver without receiving, from an external device connected through the communication unit, a signal corresponding to reception of a voice input;
when the received first voice input corresponds to a preset trigger word, performing a voice recognition function corresponding to the received first voice input and outputting, to the display unit, a first UI corresponding to the performance of the voice recognition function together with the broadcast content being output;
if a second voice input corresponding to function control of the display device is received within a preset time after the voice recognition function is performed, controlling a function of the display device corresponding to the received second voice input; and
if the second voice input corresponding to function control of the display device is not received within the preset time after the voice recognition function is performed, terminating the voice recognition function corresponding to the received first voice input without a user command corresponding to termination of the performed voice recognition function,
wherein the preset trigger word can be changed according to user selection.

The control method of claim 11,
further comprising changing the state of the display device to a voice recognition state capable of performing the voice recognition function when the received first voice input corresponds to a trigger word that is preset or selected by the user.

The control method of claim 11,
wherein the first UI corresponding to the performance of the voice recognition function
comprises a UI indicating that the display device is in a voice recognition state in which the voice recognition function is performed and in which the second voice input can be received.

The control method of claim 11,
further comprising, when the second voice input corresponding to function control of the display device is received, outputting to the display unit a second UI indicating that the received second voice input is being processed together with the broadcast content being output.

The control method of claim 14,
wherein the first UI and the second UI are each output to a partial area of the display unit.

The control method of claim 11,
wherein the first UI corresponding to the performance of the voice recognition function
comprises a plurality of texts corresponding to a plurality of voice inputs that can be input to control functions of the display device in response to the performed voice recognition function.

The control method of claim 11,
wherein the terminating comprises,
if a third voice input is not received within a preset time after the function of the display device corresponding to the received second voice input is controlled, terminating the voice recognition function corresponding to the first voice input without a user command corresponding to termination of the performed voice recognition function.

The control method of claim 17,
wherein, if the voice recognition function is terminated because the third voice input is not received within the preset time, a function corresponding to a voice input received through the voice input receiver is not performed until a fourth voice input corresponding to the preset trigger word is received through the voice input receiver.

The control method of claim 11,
further comprising identifying whether the received first voice input corresponds to the preset trigger word.

The control method of claim 19,
wherein the voice input receiver includes a microphone of the display device, and
wherein, when a signal corresponding to reception of a voice input through a microphone of the external device is received through the communication unit, whether the first voice input received through the microphone of the display device corresponds to the preset trigger word is not identified.