KR20210025348A

KR20210025348A - Electronic device and Method for controlling the electronic device thereof

Info

Publication number: KR20210025348A
Application number: KR1020190105272A
Authority: KR
Inventors: 이재철; 김해종
Original assignee: 삼성전자주식회사
Priority date: 2019-08-27
Filing date: 2019-08-27
Publication date: 2021-03-09
Also published as: WO2021040201A1

Abstract

Disclosed is an electronic device. The electronic device includes: an acoustic amplification unit; a speaker; a microphone; a microphone acoustic amplification unit; and a processor which amplifies an audio signal through the acoustic amplification unit, outputs the amplified audio signal through the speaker, and, when the audio signal output through the speaker and a user's voice signal are input to the microphone, amplifies the signal input to the microphone through the microphone acoustic amplification unit, acquires a voice signal by performing acoustic echo cancellation on the amplified signal, and performs preprocessing for voice recognition on the acquired voice signal. The processor determines a gain of the microphone acoustic amplification unit on the basis of the gain of the acoustic amplification unit, and amplifies the signal input to the microphone through the microphone acoustic amplification unit on the basis of the determined gain. According to the present invention, voice recognition is improved by preventing the audio signal that is output from the speaker and input to the microphone, or the user's voice signal, from being clipped.

Description

Electronic device and method for controlling the electronic device thereof

The present disclosure relates to an electronic device and a method for controlling the same, and more particularly, to an electronic device that provides a response to a user's voice and a method for controlling the same.

With the development of electronic technology, electronic devices have recently come to receive a user's voice and provide a response to it. For example, in response to a user's voice asking about the weather, an electronic device may provide the user with information about the current weather.

Meanwhile, an electronic device with a built-in speaker performs acoustic echo cancellation (AEC) by comparing the audio signal output from the speaker with the audio signal input to the microphone, in order to improve the speech recognition rate. If the acoustic echo cancellation is not performed effectively, the recognition rate for the user's voice drops. The recognition rate also drops when the user's voice input to the microphone is too loud or too quiet.

Accordingly, there is a need for a more effective approach to speech recognition.

The present disclosure was devised to solve the above problems. An object of the present disclosure is to provide an electronic device that amplifies the signal input to a microphone for use in acoustic echo cancellation and adjusts the gain used for that amplification, and a method for controlling the same.

An electronic device according to an embodiment of the present disclosure includes a sound amplifying unit, a speaker, a microphone, a microphone sound amplifying unit, and a processor. The processor amplifies an audio signal through the sound amplifying unit, outputs the amplified audio signal through the speaker, and, when the audio signal output through the speaker and a user's voice signal are input to the microphone, amplifies the signal input to the microphone through the microphone sound amplifying unit, obtains the voice signal by performing acoustic echo cancellation (AEC) on the amplified signal, and performs preprocessing for speech recognition on the obtained voice signal. The processor may determine a gain of the microphone sound amplifying unit based on a gain of the sound amplifying unit, and amplify the input signal through the microphone sound amplifying unit based on the determined gain.

Here, the processor may adjust the gain of the microphone sound amplifying unit to be inversely proportional to the gain of the sound amplifying unit.

In addition, when a user command for adjusting the volume of the electronic device is received, the processor may adjust the gain of the sound amplifying unit based on the user command, and adjust the gain of the microphone sound amplifying unit based on the adjusted gain of the sound amplifying unit.

Here, when the gain of the sound amplifying unit is increased, the processor may decrease the gain of the microphone sound amplifying unit.
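
The inverse relationship between the two gains can be illustrated with a minimal sketch. It assumes linear (not dB) gains and a constant gain product; the function name, the product value, and the clamping limits are hypothetical and are not taken from the disclosure.

```python
def mic_gain_inverse(speaker_gain, product=8.0, min_gain=0.5, max_gain=16.0):
    """Return a microphone gain inversely proportional to the speaker gain.

    Assumption: both gains are linear factors and their product is held
    constant. `product`, `min_gain`, and `max_gain` are illustrative values.
    """
    if speaker_gain <= 0:
        raise ValueError("speaker_gain must be positive")
    # Inverse proportion, clamped to the range the amplifier supports.
    return max(min_gain, min(max_gain, product / speaker_gain))


# Doubling the speaker gain halves the microphone gain.
low_volume = mic_gain_inverse(2.0)
high_volume = mic_gain_inverse(4.0)
```

With these illustrative values, raising the speaker gain from 2.0 to 4.0 lowers the microphone gain from 4.0 to 2.0, which matches the behavior described above.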

In addition, the processor may determine whether clipping has occurred in the obtained voice signal, and adjust the gain of the microphone sound amplifying unit based on whether clipping has occurred.

Here, the processor may decrease the gain of the microphone sound amplifying unit when clipping has occurred in the obtained voice signal, and maintain the gain of the microphone sound amplifying unit when clipping has not occurred in the obtained voice signal.
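
The clipping rule can be sketched as follows. The disclosure does not say how clipping is detected or by how much the gain is reduced, so the detector (a run of consecutive samples at full scale), the normalized sample range of [-1, 1], and the 3 dB step are all assumptions made for illustration.

```python
def is_clipped(samples, full_scale=1.0, tolerance=1e-3, min_run=2):
    """Return True if at least `min_run` consecutive samples sit at full scale."""
    run = 0
    for s in samples:
        if abs(s) >= full_scale - tolerance:
            run += 1
            if run >= min_run:
                return True
        else:
            run = 0
    return False


def adjust_mic_gain_db(gain_db, samples, step_db=3.0, min_gain_db=0.0):
    """Decrease the gain if the signal clipped; otherwise keep it unchanged."""
    if is_clipped(samples):
        return max(min_gain_db, gain_db - step_db)
    return gain_db
```

Requiring a run of saturated samples rather than a single one is a design choice of this sketch: it avoids reacting to an isolated peak that merely touches full scale.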

In addition, the processor may determine the average level of the obtained voice signal and, when the average level is less than or equal to a preset level, increase the gain of the microphone sound amplifying unit so that the average level of the obtained voice signal becomes higher than the preset level.
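
A minimal sketch of the average-level rule follows. It assumes that "average level" means the RMS level in dB relative to full scale and that samples are normalized to [-1, 1]; the target level, the margin, and the gain ceiling are hypothetical values not given in the disclosure.

```python
import math


def rms_dbfs(samples):
    """Return the RMS level of `samples` in dBFS (-inf for silence)."""
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")
    return 10.0 * math.log10(mean_square)


def raise_gain_if_quiet(gain_db, samples, target_dbfs=-30.0,
                        margin_db=1.0, max_gain_db=40.0):
    """Raise the gain so a too-quiet signal ends up above the target level."""
    level = rms_dbfs(samples)
    # Silence carries no level information; a loud-enough signal needs no boost.
    if level == float("-inf") or level > target_dbfs:
        return gain_db
    boost_db = (target_dbfs - level) + margin_db
    return min(max_gain_db, gain_db + boost_db)
```

For a signal at about -40 dBFS and a -30 dBFS target, the sketch adds roughly 11 dB (the 10 dB shortfall plus the 1 dB margin), so that the amplified level lands above the preset level rather than exactly on it.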

Meanwhile, a method for controlling an electronic device according to an embodiment of the present disclosure includes: amplifying an audio signal based on a first gain; outputting the amplified audio signal; when the output audio signal and a user's voice signal are input, amplifying the input signal based on a second gain; and obtaining a voice signal by performing acoustic echo cancellation (AEC) on the amplified signal and performing preprocessing for speech recognition on the voice signal. The amplifying of the input signal may include determining the second gain based on the first gain and amplifying the input signal based on the determined second gain.

Here, the amplifying of the input signal may include adjusting the second gain to be inversely proportional to the first gain.

In addition, the amplifying of the audio signal may include, when a user command for adjusting the volume of the electronic device is received, adjusting the first gain based on the user command, and the amplifying of the input signal may include adjusting the second gain based on the adjusted first gain.

Here, the amplifying of the input signal may include decreasing the second gain when the first gain is increased.

In addition, the control method according to an embodiment of the present disclosure may further include determining whether clipping has occurred in the obtained voice signal, and adjusting the second gain based on whether clipping has occurred.

Here, the adjusting may include decreasing the second gain when clipping has occurred in the obtained voice signal, and maintaining the second gain when clipping has not occurred in the obtained voice signal.

In addition, the control method according to an embodiment of the present disclosure may further include determining the average level of the obtained voice signal and, when the average level is less than or equal to a preset level, increasing the second gain so that the average level of the obtained voice signal becomes higher than the preset level.

According to various embodiments of the present disclosure, the audio signal that is output from the speaker and input to the microphone, or the user's voice signal, is kept from being clipped, so that speech recognition performance can be improved.

FIG. 1 is a diagram illustrating use of an artificial intelligence agent system according to an embodiment of the present disclosure;
FIG. 2 is a block diagram illustrating a configuration of an electronic device according to an embodiment of the present disclosure;
FIG. 3 is a block diagram illustrating components that perform preprocessing for speech recognition according to an embodiment of the present disclosure;
FIG. 4 is a flowchart illustrating a method of adjusting a gain of a microphone sound amplifying unit according to an embodiment of the present disclosure;
FIG. 5 is a block diagram illustrating a detailed configuration of an electronic device according to an embodiment of the present disclosure; and
FIG. 6 is a flowchart illustrating a method of controlling an electronic device according to an embodiment of the present disclosure.

Hereinafter, various embodiments of the present disclosure are described with reference to the accompanying drawings. However, this is not intended to limit the technology described in the present disclosure to specific embodiments, and should be understood to include various modifications, equivalents, and/or alternatives of the embodiments of the present disclosure. In the description of the drawings, similar reference numerals may be used for similar elements.

In the present disclosure, expressions such as "have," "may have," "include," or "may include" indicate the presence of the corresponding feature (e.g., a numerical value, a function, an operation, or a component such as a part) and do not exclude the presence of additional features.

In the present disclosure, expressions such as "A or B," "at least one of A or/and B," or "one or more of A or/and B" may include all possible combinations of the items listed together. For example, "A or B," "at least one of A and B," or "at least one of A or B" may refer to all of the following cases: (1) including at least one A, (2) including at least one B, or (3) including both at least one A and at least one B.

Expressions such as "first," "second," "1st," or "2nd" used in the present disclosure may modify various elements regardless of order and/or importance, and are used only to distinguish one element from another without limiting those elements.

When an element (e.g., a first element) is referred to as being "(operatively or communicatively) coupled with/to" or "connected to" another element (e.g., a second element), it should be understood that the element may be connected to the other element directly or via yet another element (e.g., a third element). In contrast, when an element (e.g., a first element) is referred to as being "directly coupled" or "directly connected" to another element (e.g., a second element), it may be understood that no other element (e.g., a third element) exists between them.

The expression "configured to (or set to)" used in the present disclosure may be used interchangeably with, for example, "suitable for," "having the capacity to," "designed to," "adapted to," "made to," or "capable of," depending on the situation. The term "configured to (or set to)" does not necessarily mean only "specifically designed to" in hardware. Instead, in some situations, the expression "a device configured to" may mean that the device "can" perform an operation together with other devices or parts. For example, the phrase "a processor configured (or set) to perform A, B, and C" may mean a dedicated processor (e.g., an embedded processor) for performing the corresponding operations, or a generic-purpose processor (e.g., a CPU or an application processor) capable of performing the corresponding operations by executing one or more software programs stored in a memory device.

Hereinafter, the present disclosure is described in detail with reference to the drawings.

As shown in FIG. 1, the artificial intelligence agent system may include an electronic device 100 and a response providing server 10. Although FIG. 1 shows the electronic device 100 as a speaker-type device, this is only an example.

An electronic device according to various embodiments of the present disclosure may include, for example, at least one of a television, a smartphone, a tablet PC, a mobile phone, a video phone, an e-book reader, a desktop PC, a laptop PC, a netbook computer, a workstation, a server, a PDA, a portable multimedia player (PMP), an MP3 player, a medical device, a camera, or a wearable device. The wearable device may include at least one of an accessory type (e.g., a watch, ring, bracelet, anklet, necklace, glasses, contact lenses, or head-mounted device (HMD)), a fabric- or clothing-integrated type (e.g., electronic clothing), a body-attached type (e.g., a skin pad or tattoo), or a bio-implantable circuit. In some embodiments, the electronic device may include, for example, at least one of a digital video disk (DVD) player, an audio system, a refrigerator, an air conditioner, a vacuum cleaner, an oven, a microwave oven, a washing machine, an air purifier, a set-top box, a home automation control panel, a security control panel, a media box, a game console, an electronic dictionary, an electronic key, a camcorder, or an electronic picture frame.

In another embodiment, the electronic device may include at least one of various medical devices (e.g., various portable medical measuring devices (a blood glucose meter, heart rate monitor, blood pressure monitor, or thermometer), magnetic resonance angiography (MRA), magnetic resonance imaging (MRI), computed tomography (CT), an imaging device, or an ultrasound device), a navigation device, a global navigation satellite system (GNSS), an event data recorder (EDR), a flight data recorder (FDR), an automobile infotainment device, marine electronic equipment (e.g., a marine navigation device or gyrocompass), avionics, a security device, a vehicle head unit, an industrial or household robot, a drone, an ATM of a financial institution, a point of sales (POS) terminal of a store, or an Internet of Things device (e.g., a light bulb, various sensors, a sprinkler device, a fire alarm, a thermostat, a street light, a toaster, exercise equipment, a hot water tank, a heater, or a boiler).

As such, an electronic device according to various embodiments of the present disclosure may be implemented as various types of electronic devices.

Meanwhile, the electronic device 100 may provide the user with a response to the user's voice using an artificial intelligence agent program.

In this case, before receiving the user's voice, the electronic device 100 may receive a user's voice that includes a trigger word for activating the artificial intelligence agent program. For example, the electronic device 100 may receive a user's voice that includes a trigger word such as "Bixby". When a user's voice including the trigger word is input, the electronic device 100 may execute or activate the artificial intelligence agent program and wait for input of the user's voice. The artificial intelligence agent program may include a dialogue system capable of processing the user's voice and responses in natural language. As an alternative to the trigger word for activating the artificial intelligence agent program, the electronic device 100 may also receive the user's voice after a specific button provided on the electronic device 100 is selected.

Thereafter, the electronic device 100 may receive the user's voice. For example, as shown in FIG. 1, the electronic device 100 may receive the user's voice "How is the weather today?" In this case, the electronic device 100 may determine keywords such as "today" and "weather" from "How is the weather today?" and provide the keywords to the response providing server 10.

The response providing server 10 may provide a response to the user's voice based on the keywords received from the electronic device 100. For example, the response providing server 10 may provide the response "temperature 22°C" to the electronic device 100. In this case, the response providing server 10 may provide a response containing text, but this is only an example, and it may also provide a response in natural language form.

The electronic device 100 may output the response. In this case, the electronic device 100 may use the dialogue system to process the response into natural language and output it. For example, the electronic device 100 may provide the natural language response "Today's temperature is 22°C."

Meanwhile, the electronic device 100 may use an artificial intelligence agent to provide a response to the user's voice as described above. Here, the artificial intelligence agent is a dedicated program for providing artificial intelligence (AI)-based services (e.g., a speech recognition service, an assistant service, a translation service, or a search service), and may be executed by an existing general-purpose processor (e.g., a CPU) or by a separate AI-dedicated processor (e.g., a GPU). In particular, the artificial intelligence agent may control various modules (e.g., the dialogue system).

As described above, according to an embodiment of the present disclosure, the electronic device 100 may receive a user's voice and provide a response to it.

Meanwhile, the electronic device 100 may include a microphone for receiving the user's voice. The audio signal input through the microphone may be amplified by the microphone sound amplifying unit of the electronic device 100.

In this case, the microphone sound amplifying unit amplifies the audio signal according to its gain. When the level of the amplified audio signal falls outside a certain level range, the portion outside that range is clipped and the clipped audio signal is output. When clipping occurs in an audio signal in this way, the clipped audio signal differs from the original audio signal in the clipped portion, so the user's voice may not be recognized accurately.
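
The distortion introduced by clipping can be seen in a small numerical sketch. It assumes a normalized amplifier output range of [-1, 1] and a hypothetical test tone; the function name and all values are illustrative only.

```python
import math


def amplify_and_clip(samples, gain, limit=1.0):
    """Amplify each sample, then hard-clip it to the range [-limit, limit]."""
    return [max(-limit, min(limit, gain * s)) for s in samples]


# One period of a sine tone with amplitude 0.5 (16 samples).
tone = [0.5 * math.sin(2.0 * math.pi * k / 16) for k in range(16)]

ideal = [4.0 * s for s in tone]          # what an unlimited amplifier would give
clipped = amplify_and_clip(tone, 4.0)    # what the limited amplifier outputs

# Largest deviation between the ideal and the clipped waveform.
max_error = max(abs(a - b) for a, b in zip(ideal, clipped))
```

With a gain of 4, the peaks of the ideal signal reach 2.0 but the output is held at 1.0, so the waveform handed to the later processing stages no longer matches the original in the clipped portion. With a gain of 1, the same tone passes through unchanged.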

In addition, when the user's voice input through the microphone is too quiet, the user's voice may not be recognized accurately even if the audio signal is amplified.

Accordingly, to resolve these problems, the electronic device 100 according to an embodiment of the present disclosure may adjust the gain of the microphone sound amplifying unit, as described in more detail below.

FIG. 2 is a block diagram illustrating a configuration of an electronic device according to an embodiment of the present disclosure.

Referring to FIG. 2, the electronic device 100 may include a sound amplifying unit (amplifier) 110, a speaker 120, a microphone 130, a microphone sound amplifying unit 140, and a processor 150.

The sound amplifying unit 110 may amplify an audio signal. Specifically, the sound amplifying unit 110 may amplify an audio signal to be output through the speaker 120 and transmit the amplified audio signal to the speaker 120.

The degree to which the audio signal is amplified may be determined by the gain of the sound amplifying unit 110.

Meanwhile, when the electronic device 100 provides audio content, the audio signal may be the audio signal of that audio content; when the electronic device 100 provides a spoken response to the user's voice, the audio signal may be the audio signal of that spoken response.

The speaker 120 may output an audio signal. Specifically, the speaker 120 may output the audio signal input from the sound amplifying unit 110.

The microphone 130 receives an audio signal. Specifically, the microphone 130 may receive the audio signal output through the speaker 120 and the user's voice signal.

The microphone sound amplifying unit 140 may amplify the signal input to the microphone 130. The degree to which the signal input to the microphone 130 is amplified may be determined by the gain of the microphone sound amplifying unit 140. The gain of the microphone sound amplifying unit 140 is first set to an initial value and may subsequently be adjusted by the processor 150.

The processor 150 may control the overall operations and functions of the electronic device 100.

Specifically, the processor 150 may amplify an audio signal through the sound amplifying unit 110 and output the amplified audio signal through the speaker 120. When the audio signal output through the speaker 120 and the user's voice signal are input to the microphone 130, the processor 150 may amplify the signal input to the microphone 130 through the microphone sound amplifying unit 140.

The processor 150 may then obtain a voice signal by performing acoustic echo cancellation (AEC) on the signal amplified by the microphone sound amplifying unit 140. Here, the voice signal may be a voice signal amplified by the microphone sound amplifying unit 140.

Specifically, when the audio signal output through the speaker 120 is input to the microphone 130 as an echo signal, the echo signal may lower the recognition rate for the user's voice. Accordingly, to prevent performance degradation caused by the echo signal, the processor 150 may perform acoustic echo cancellation on the signal amplified by the microphone sound amplifying unit 140, thereby removing the echo signal and obtaining the voice signal.
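
The disclosure does not specify which echo cancellation algorithm is used. As an illustration only, the sketch below substitutes a textbook normalized least-mean-squares (NLMS) adaptive filter, which estimates the echo from the speaker reference signal and subtracts it from the microphone signal. The filter length, step size, synthetic echo path, and all names are assumptions.

```python
import random


def nlms_echo_cancel(reference, mic, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptive estimate of the speaker echo from the mic signal."""
    weights = [0.0] * taps   # current estimate of the echo path
    history = [0.0] * taps   # most recent reference samples, newest first
    residual = []
    for x, d in zip(reference, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - echo_estimate            # mic signal with the echo removed
        norm = sum(h * h for h in history) + eps
        weights = [w + mu * error * h / norm for w, h in zip(weights, history)]
        residual.append(error)
    return residual


def power(samples):
    """Mean squared value of the samples."""
    return sum(s * s for s in samples) / len(samples)


# Synthetic check: the mic hears only a two-tap echo of the speaker signal.
rng = random.Random(0)
reference = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
mic = [0.6 * reference[n] + (0.3 * reference[n - 1] if n > 0 else 0.0)
       for n in range(2000)]
residual = nlms_echo_cancel(reference, mic)
```

Once the filter has converged, the residual power is a small fraction of the echo power. An adaptive filter of this kind models the echo path as linear, so a microphone signal that has already been clipped no longer fits the model, which is consistent with the motivation given above for keeping the amplified signal within range. Handling of simultaneous user speech (double-talk) is outside the scope of this sketch.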

Meanwhile, the processor 150 may determine the gain of the microphone sound amplifying unit 140 based on the gain of the sound amplifying unit 110, and amplify the signal input to the microphone 130 through the microphone sound amplifying unit 140 based on the determined gain.

In this case, the processor 150 may adjust the gain of the microphone sound amplifying unit 140 to be inversely proportional to the gain of the sound amplifying unit 110.

Specifically, the audio signal output through the speaker 120 may be input to the microphone 130. A high gain of the sound amplifying unit 110 means that the level of the audio signal output through the speaker 120 is high, and when a high-level audio signal is input to the microphone 130, clipping is more likely to occur in that audio signal.

Accordingly, to prevent the audio signal output through the speaker 120 from being clipped when it is input to the microphone 130, the processor 150 may adjust the gain of the microphone sound amplifying unit 140 to be inversely proportional to the gain of the sound amplifying unit 110.

For example, when a user command for adjusting the volume of the electronic device 100 is received, the processor 150 may adjust the gain of the acoustic amplification unit 110 based on the user command. Here, the user command may include a command that selects a volume control button provided on the electronic device 100 or a volume control button provided on a remote control for controlling the electronic device 100. The user command may also include the user's voice for adjusting the volume.

Specifically, the processor 150 may increase the gain of the acoustic amplification unit 110 when a user command for increasing the volume of the electronic device 100 is received, and may decrease the gain of the acoustic amplification unit 110 when a user command for decreasing the volume of the electronic device 100 is received.

Accordingly, the processor 150 may cause the audio signal to be output through the speaker 120 at the volume corresponding to the user command.

Meanwhile, the processor 150 may adjust the gain of the microphone acoustic amplification unit 140 based on the gain of the acoustic amplification unit 110.

For example, when the gain of the acoustic amplification unit 110 is increased, the processor 150 may decrease the gain of the microphone acoustic amplification unit 140.

Specifically, when the gain of the acoustic amplification unit 110 is greater than a preset value, the processor 150 may reduce the gain of the microphone acoustic amplification unit 140 from its initial value in inverse proportion to the degree by which the gain of the acoustic amplification unit 110 exceeds the preset value.

That is, the further the gain of the acoustic amplification unit 110 rises above the preset value, the smaller the value, relative to the initial value, to which the processor 150 may adjust the gain of the microphone acoustic amplification unit 140.

Here, the preset value may be the maximum gain value of the acoustic amplification unit 110 at which clipping does not occur, or a value close to it, when the audio signal output through the speaker 120 is input to the microphone 130 and amplified by the microphone acoustic amplification unit 140 with its gain set to the initial value.

Meanwhile, when the gain of the acoustic amplification unit 110 is smaller than the preset value, the processor 150 may set the gain of the microphone acoustic amplification unit 140 to the initial value.

In this way, the processor 150 may adjust the gain of the microphone acoustic amplification unit 140 according to the gain of the acoustic amplification unit 110.
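
The gain rule described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the function name, the use of linear (not decibel) gains, and the specific formula `initial * preset / speaker_gain` are assumptions chosen so that the product of the two gains stays constant once the preset value is exceeded. The disclosure leaves the case of a gain exactly equal to the preset value unspecified; the sketch returns the initial value there.

```python
def mic_gain_for_speaker_gain(speaker_gain, preset_gain, initial_mic_gain):
    """Return a microphone-amplifier gain for a given speaker-amplifier gain.

    All gains are linear factors (an assumption of this sketch).
    """
    if speaker_gain <= preset_gain:
        # At or below the preset value: keep the initial microphone gain.
        return initial_mic_gain
    # Above the preset value: shrink the microphone gain in inverse
    # proportion to how far the speaker gain exceeds the preset value.
    return initial_mic_gain * preset_gain / speaker_gain
```

With a preset value of 2.0 and an initial microphone gain of 8.0, a speaker gain of 4.0 yields a microphone gain of 4.0, while any speaker gain up to 2.0 leaves the microphone gain at 8.0.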

Accordingly, according to an embodiment of the present disclosure, when the audio signal output through the speaker 120 is input to the microphone 130, that audio signal is prevented from being clipped, so that acoustic echo cancellation can be performed optimally.

That is, when the audio signal input to the microphone 130 is clipped, a difference arises between the audio signal output through the speaker 120 and the clipped audio signal, and as a result the echo signal cannot be removed effectively.
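
The effect of clipping on echo removal can be illustrated numerically. The sketch below is an assumption-laden toy, not the disclosed method: it models the echo as a scaled copy of a sine-wave reference, fits a single best gain by least squares, and compares the energy left over with and without clipping. The function names and signal parameters are hypothetical.

```python
import math

def clip(x, limit=1.0):
    """Hard-limit a sample to the range [-limit, limit]."""
    return max(-limit, min(limit, x))

def linear_residual_energy(reference, observed):
    """Energy left after subtracting the best scaled copy of the reference."""
    num = sum(r * o for r, o in zip(reference, observed))
    den = sum(r * r for r in reference)
    gain = num / den  # least-squares single-gain fit
    return sum((o - gain * r) ** 2 for r, o in zip(reference, observed))

# Reference: five cycles of a sine wave over 200 samples.
reference = [math.sin(2 * math.pi * 5 * n / 200) for n in range(200)]
unclipped = [0.8 * r for r in reference]        # echo stays within range
clipped = [clip(3.0 * r) for r in reference]    # echo exceeds range and clips

residual_unclipped = linear_residual_energy(reference, unclipped)
residual_clipped = linear_residual_energy(reference, clipped)
```

The unclipped echo is removed almost exactly, whereas the clipped echo leaves a substantial residual because clipping is a nonlinear distortion that no scaled copy of the reference can match.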

Accordingly, in an embodiment of the present disclosure, the gain of the microphone acoustic amplification unit 140 may be adjusted according to the gain of the acoustic amplification unit 110 in order to prevent the audio signal input through the microphone 130 from being clipped. As a result, acoustic echo can be removed effectively and speech recognition performance can be improved.

Meanwhile, the processor 150 may determine whether clipping has occurred in the acquired voice signal, and may adjust the gain of the microphone acoustic amplification unit 140 based on whether clipping has occurred.

Specifically, when clipping has occurred in the acquired voice signal, the processor 150 may reduce the gain of the microphone acoustic amplification unit 140.

For example, when the user speaks too loudly or too close to the electronic device 100, a high-level voice signal may be input to the microphone 130, and clipping may occur when that voice signal is amplified.

Accordingly, the processor 150 may determine whether clipping has occurred in the voice signal and, if so, reduce the gain of the microphone acoustic amplification unit 140. That is, when the voice signal is clipped, the processor 150 may adjust the gain of the microphone acoustic amplification unit 140 to a value smaller than the currently set value. In this case, the processor 150 may reduce the gain of the microphone acoustic amplification unit 140 far enough that clipping no longer occurs in the voice signal output from the microphone acoustic amplification unit 140.

However, when clipping has not occurred in the acquired voice signal, the processor 150 may maintain the gain of the microphone acoustic amplification unit 140. That is, when the voice signal is not clipped, the processor 150 may keep the gain of the microphone acoustic amplification unit 140 at the currently set value.

In this case, the processor 150 may continuously monitor whether the voice signal is clipped and perform the above-described operations accordingly.

Meanwhile, the processor 150 may determine the average level of the acquired voice signal and, when the determined average level is less than or equal to a preset level, may increase the gain of the microphone acoustic amplification unit 140 so that the average level of the acquired voice signal becomes higher than the preset level.

In this case, the processor 150 may determine whether the average level of the voice signal is less than or equal to the preset level when the voice signal is not clipped.

Accordingly, when the average level of the voice signal is higher than the preset level, the processor 150 may keep the gain of the microphone acoustic amplification unit 140 at the currently set value.

However, when the average level of the voice signal is less than or equal to the preset level, the processor 150 may increase the gain of the microphone acoustic amplification unit 140 so that the average level of the audio signal output from the microphone acoustic amplification unit 140 becomes higher than the preset level.

That is, when the user speaks too quietly or too far from the electronic device 100, the level of the voice signal input to the microphone 130 is low, so effective speech recognition may not be possible even after the voice signal is amplified by the microphone acoustic amplification unit 140.

Accordingly, when the average level of the voice signal is less than or equal to the preset level, the processor 150 may increase the gain of the microphone acoustic amplification unit 140 so that more accurate speech recognition can be performed.

In this case, the processor 150 may continuously monitor the average level of the voice signal and perform the above-described operations by comparing the average level with the preset level.
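
One way to choose the increased gain is sketched below. It is an illustration under stated assumptions rather than the disclosed implementation: the average level is taken as the mean absolute sample value, the samples are measured after amplification, the amplifier is linear and not clipping, and the `margin` factor and both function names are hypothetical. A silent block is left unchanged because no finite gain could raise its level.

```python
def average_level(samples):
    """Mean absolute value of a block of samples (an assumed level measure)."""
    return sum(abs(s) for s in samples) / len(samples)

def raise_gain_if_quiet(samples, current_gain, preset_level, margin=1.1):
    """Return a gain that lifts the average level just above preset_level."""
    avg = average_level(samples)
    if avg > preset_level or avg == 0.0:
        # Loud enough already, or pure silence: keep the current gain.
        return current_gain
    # Scaling the gain by k scales the average level by k, so this
    # targets an average level of margin * preset_level.
    return current_gain * margin * preset_level / avg
```

For a block with an average level of 0.1, a preset level of 0.2, and a current gain of 2.0, the returned gain is approximately 4.4, which would place the average level at about 0.22.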

As described above, the processor 150 may perform acoustic echo cancellation on the signal amplified by the microphone acoustic amplification unit 140. The processor 150 may then perform preprocessing for speech recognition on the voice signal.

To this end, various modules may be stored in the memory of the electronic device 100. For example, as shown in FIG. 3, the memory may include an acoustic echo cancellation module 31 (Acoustic Echo Cancellation, AEC), a sound source direction measurement module 32 (Sound Source Localization, SSL), a beamforming module 33 (Beam Forming, BF), a sound source separation module 34 (Source Separation, SS), and a noise suppression and speech enhancement module 35 (Noise Suppression, NS, and Speech Enhancement, SE). Here, the components shown in FIG. 3 may be implemented in software to perform their respective functions.

In this case, the processor 150 may perform acoustic echo cancellation and preprocessing through these modules.

First, the acoustic echo cancellation module 31 is a module for performing acoustic echo cancellation. Specifically, the acoustic echo cancellation module 31 may set the audio signal to be output through the speaker 120 as reference data (echo reference), determine through frequency analysis that a signal having frequency characteristics similar to the reference data, among the audio signals input through the microphone 130, is an echo signal, and remove or attenuate that signal.
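
A reference-based echo canceller can be sketched as follows. Note that this sketch swaps in a time-domain normalized least-mean-squares (NLMS) adaptive filter, a common textbook technique, in place of the frequency-analysis approach the disclosure describes; the function name, tap count, step size, and the synthetic two-sample echo path are all assumptions for illustration.

```python
import random

def nlms_echo_cancel(mic, reference, taps=8, step=0.5, eps=1e-8):
    """Subtract an adaptively estimated echo of `reference` from `mic`."""
    weights = [0.0] * taps
    history = [0.0] * taps  # recent reference samples, newest first
    output = []
    for m, r in zip(mic, reference):
        history = [r] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = m - echo_estimate  # echo-reduced output sample
        norm = sum(h * h for h in history) + eps
        # NLMS update: move the weights along the normalized error gradient.
        weights = [w + step * error * h / norm
                   for w, h in zip(weights, history)]
        output.append(error)
    return output

# Synthetic check: the "microphone" hears only a delayed, attenuated reference.
rng = random.Random(0)
reference = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
echo = [0.0, 0.0] + [0.6 * r for r in reference[:-2]]
residual = nlms_echo_cancel(echo, reference)
```

After the filter has adapted, the residual is far weaker than the echo itself. This relies on the echo being a linear function of the reference, which is why clipping before the canceller is harmful.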

The sound source direction measurement module 32 is a module for measuring the direction in which a sound source exists, that is, the direction of the user who utters the voice.

In this case, the sound source direction measurement module 32 may measure the direction of the sound source through various methods. For example, the sound source direction measurement module 32 may measure the direction of the sound source using the time difference of arrival (TDOA) or a similar method.
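
A TDOA-based direction estimate for a two-microphone array can be sketched as follows. The brute-force cross-correlation, the far-field angle formula, the sign convention, and all names and parameter values (16 kHz sample rate, 0.1 m spacing, 343 m/s speed of sound) are assumptions for illustration; the disclosure does not specify the array geometry or the estimator.

```python
import math
import random

def estimate_delay(sig_a, sig_b, max_lag):
    """Lag in samples by which sig_b trails sig_a (peak of cross-correlation)."""
    best_lag, best_score = 0, float("-inf")
    n = len(sig_a)
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += sig_a[i] * sig_b[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

def arrival_angle_deg(lag, sample_rate, mic_spacing_m, speed_of_sound=343.0):
    """Far-field arrival angle relative to broadside, from a sample lag."""
    ratio = lag / sample_rate * speed_of_sound / mic_spacing_m
    ratio = max(-1.0, min(1.0, ratio))  # guard against rounding past +/-1
    return math.degrees(math.asin(ratio))

# Synthetic check: the second microphone hears the same noise 3 samples later.
rng = random.Random(1)
mic_a = [rng.uniform(-1.0, 1.0) for _ in range(256)]
mic_b = [0.0, 0.0, 0.0] + mic_a[:-3]
lag = estimate_delay(mic_a, mic_b, max_lag=8)
angle = arrival_angle_deg(lag, sample_rate=16000, mic_spacing_m=0.1)
```

In this synthetic case the estimated lag is 3 samples, which corresponds to an arrival angle of roughly 40 degrees from broadside under the assumed geometry.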

The beamforming module 33 is a module for performing beamforming of the microphone 130 according to the location of the sound source. Specifically, the beamforming module 33 may acquire only the audio signal received from the direction in which the sound source exists, and may exclude signals received from other directions.
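
As an illustration of beamforming toward a known source direction, the sketch below uses a delay-and-sum beamformer, one of the simplest techniques. It is an assumption, not the disclosed method: the function name, the integer per-channel delays, and the zero padding at the edges are hypothetical, and delay-and-sum attenuates off-axis signals rather than excluding them entirely.

```python
def delay_and_sum(channels, delays):
    """Align each channel by its integer delay (in samples) and average.

    delays[k] is how many samples channel k trails the reference channel
    for the target direction; reading ahead by that amount aligns it.
    """
    n = len(channels[0])
    out = []
    for t in range(n):
        acc = 0.0
        for ch, d in zip(channels, delays):
            idx = t + d
            acc += ch[idx] if 0 <= idx < n else 0.0  # zero-pad past the ends
        out.append(acc / len(channels))
    return out
```

For example, if the second microphone receives the target two samples after the first, calling `delay_and_sum([mic0, mic1], [0, 2])` adds the two copies of the target coherently, while sound arriving with a different inter-microphone delay is added out of step and partially cancels.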

The sound source separation module 34 is a module for separating sound sources. Specifically, the sound source separation module 34 may separate the sound source from the received audio signal by inverting the process by which noise and the voice signal were mixed.

The noise suppression and speech enhancement module 35 is a module for removing noise. In this case, the noise suppression and speech enhancement module 35 may remove stationary noise from the sound source.

Meanwhile, the preprocessing steps described above are only an example, and depending on the embodiment, some steps may be omitted or other steps may be additionally performed.

Meanwhile, the processor 150 may perform speech recognition on the preprocessed voice signal.

Specifically, the processor 150 may convert the user's voice included in the preprocessed voice signal into text, and may determine the intent and the entity of the user's voice based on the speech recognition result. The processor 150 may then obtain a keyword based on the natural language understanding result and obtain a response to the user's voice using the keyword.

For example, the processor 150 may transmit the keyword to the response providing server (10 in FIG. 1). The response providing server 10 may then provide a response to the user's voice based on the received keyword. In this case, the response providing server 10 may provide the response in text form, but this is only one embodiment, and it may also provide the response in natural language form.

The response providing server 10 may transmit the response to the user's voice to the electronic device 100.

In this case, the processor 150 may output the response. Here, the processor 150 may process the response into natural language using a dialogue system and output speech in natural language form through the speaker 120. However, this is only an example, and the processor 150 may also output the response through a display.

Meanwhile, the preceding example describes speech recognition of the user's voice as being performed in the electronic device 100, but this is only an example. That is, the processor 150 may transmit the preprocessed audio signal to a separate server (not shown) (for example, the response providing server 10).

In this case, the response providing server 10 may perform speech recognition based on the preprocessed audio signal, obtain a response to the user's voice, and transmit it to the electronic device 100. The processor 150 may then output the response.

In addition, the preceding example describes acoustic echo cancellation as being performed, but this is only an example.

FIG. 4 is a flowchart illustrating a method of adjusting the gain of the microphone acoustic amplification unit according to an embodiment of the present disclosure.

First, the processor 150 may output an audio signal through the speaker 120 (S410).

The processor 150 may then receive, through the microphone 130, the user's voice signal and the audio signal output from the speaker 120.

At this time, the processor 150 may determine whether a user voice including a trigger word has been input through the microphone 130.

When a user voice including the trigger word is input (S420-Y), the processor 150 may adjust the gain of the microphone acoustic amplification unit 140 (S430).

Specifically, when the gain of the acoustic amplification unit 110 is greater than a preset value, the processor 150 may reduce the gain of the microphone acoustic amplification unit 140 from its initial value in inverse proportion to the degree by which the gain of the acoustic amplification unit 110 exceeds the preset value. However, when the gain of the acoustic amplification unit 110 is smaller than the preset value, the processor 150 may set the gain of the microphone acoustic amplification unit 140 to the initial value.

The processor 150 may then perform acoustic echo cancellation on the audio signal output from the microphone acoustic amplification unit 140, removing from it the audio signal output from the speaker 120 and obtaining the voice signal.

Thereafter, the processor 150 may analyze the voice signal to determine its average volume and peak volume (S440).

In this case, the processor 150 may determine, based on the peak volume, whether clipping has occurred in the voice signal (S450).

When it is determined that clipping has occurred in the voice signal (S450-Y), the processor 150 may reduce the gain of the microphone acoustic amplification unit 140 (S460). In this case, the processor 150 may reduce the gain of the microphone acoustic amplification unit 140 so that clipping does not occur in the voice signal output from the microphone acoustic amplification unit 140.

Meanwhile, when it is determined that clipping has not occurred in the voice signal (S450-N), the processor 150 may compare the average volume of the voice signal with a preset level (Vth) (S470).

When the average volume of the voice signal is less than or equal to the preset level, the processor 150 may increase the gain of the microphone acoustic amplification unit 140 (S480). In this case, the processor 150 may increase the gain of the microphone acoustic amplification unit 140 so that the average volume of the voice signal output from the microphone acoustic amplification unit 140 becomes higher than the preset level.

Meanwhile, when the average volume of the voice signal is greater than the preset level, the processor 150 may maintain the current gain value of the microphone acoustic amplification unit 140 (S490).
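
The decision flow of steps S440 to S490 can be sketched as a single function that is called once per block of voice samples. This is an illustration under stated assumptions, not the disclosed implementation: clipping is detected when the peak reaches an assumed full-scale value, the gain moves by fixed multiplicative steps (so repeated calls are needed to converge), and all names and default values are hypothetical. The sketch also assumes it is called only on blocks that contain speech, as in the flowchart where the adjustment follows detection of the trigger word.

```python
def adjust_mic_gain(samples, gain, full_scale=1.0, preset_level=0.1,
                    down_step=0.8, up_step=1.25):
    """Return (new_gain, action) for one block of the voice signal."""
    # S440: measure the peak and average volume of the block.
    peak = max(abs(s) for s in samples)
    average = sum(abs(s) for s in samples) / len(samples)

    # S450: a peak at full scale is treated as clipping.
    if peak >= full_scale:
        return gain * down_step, "decrease"   # S460

    # S470: compare the average volume with the preset level (Vth).
    if average <= preset_level:
        return gain * up_step, "increase"     # S480

    return gain, "hold"                       # S490
```

Calling the function on successive blocks corresponds to the continuous monitoring described earlier: the gain keeps stepping down while clipping persists and keeps stepping up while the average volume remains at or below the preset level.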

FIG. 5 is a block diagram illustrating a detailed configuration of an electronic device according to an embodiment of the present disclosure.

Referring to FIG. 5, the electronic device 100 may include an acoustic amplification unit 110, a speaker 120, a microphone 130, a microphone acoustic amplification unit 140, a processor 150, a memory 160, a display 170, a communication interface 180, and an input interface 190. In this case, the processor 150 may be electrically connected to these components to control the overall operations and functions of the electronic device 100. Meanwhile, the acoustic amplification unit 110, the speaker 120, the microphone 130, the microphone acoustic amplification unit 140, and the processor 150 shown in FIG. 5 have been described with reference to FIG. 2, and redundant descriptions are omitted.

The processor 150 may include one or more of a central processing unit, an application processor, and a communication processor. The processor 150 may control the electronic device 100. For example, the processor 150 may control at least one component of the electronic device 100 and may execute operations or data processing.

The memory 160 may store instructions or data related to at least one other component of the electronic device 100. The memory 160 may be implemented as a nonvolatile memory, a volatile memory, a flash memory, a hard disk drive (HDD), a solid state drive (SSD), or the like. The memory 160 is accessed by the processor 150, and the processor 150 may read, write, modify, delete, and update data in it.

The display 170 may display various screens. For example, the display 170 may display various screens related to the operation of the electronic device 100.

In this case, the display 170 may be combined with the touch panel 191 and implemented as a touch screen having a layered structure. The touch screen may have not only a display function but also a function of detecting the touch input position, the touched area, and even the touch input pressure, and may detect a proximity touch as well as a real touch.

The communication interface 180 is a component for communicating with external electronic devices and servers, and may include circuitry. In this case, a communication connection between the communication interface 180 and an external electronic device may include communication via a third device (for example, a repeater, a hub, an access point, a server, or a gateway).

For example, the communication interface 180 may include components for wireless communication such as LTE and WiFi (wireless fidelity), and for wired communication such as USB (universal serial bus) and HDMI (high definition multimedia interface). The network over which the wireless or wired communication is performed may include at least one telecommunication network, for example, a computer network (such as a LAN or WAN), the Internet, or a telephone network.

In addition, the communication interface 180 may communicate with an external server to provide an artificial intelligence agent service.

The input interface 190 is a component for receiving various user commands, and may include circuitry. In this case, the input interface 190 may transfer an input user command to the processor 150. The input interface 190 may include, for example, a touch panel 191 or a key 192. The touch panel 191 may use, for example, at least one of a capacitive, resistive, infrared, or ultrasonic method. The touch panel 191 may further include a control circuit. The touch panel 191 may further include a tactile layer to provide a tactile response to the user. The key 192 may include, for example, a physical button, an optical key, or a keypad.

Meanwhile, according to an embodiment of the present disclosure, the processor 150 may provide a response to the user's voice.

Specifically, the processor 150 may adjust the gain of the microphone acoustic amplification unit 140 according to the gain of the acoustic amplification unit 110. For example, the processor 150 may adjust the gain of the microphone acoustic amplification unit 140 to be inversely proportional to the gain of the acoustic amplification unit 110.

Accordingly, when audio signals such as the audio signal output from the speaker 120 and the voice signal uttered by the user are input to the microphone 130, the audio signal that was output from the speaker 120 and input to the microphone 130 can be prevented from being clipped. As a result, acoustic echo cancellation can be performed effectively on that audio signal, which can improve the recognition accuracy for the user's voice.

The processor 150 may perform acoustic echo cancellation, which removes the audio signal output from the speaker 120 from the audio signal input to the microphone 130. As a result, the voice signal remains from among the audio signals input to the microphone 130.

In this case, the processor 150 may determine whether clipping has occurred in the voice signal and, if so, reduce the gain of the microphone acoustic amplification unit 140.

However, when clipping has not occurred, the processor 150 may determine whether the average level of the voice signal is less than or equal to a preset level.

When the average level of the voice signal is less than or equal to the preset level, the processor 150 may increase the gain of the microphone acoustic amplification unit 140. In this case, the processor 150 may increase the gain of the microphone acoustic amplification unit 140 so that the average level becomes higher than the preset level.

Meanwhile, when the average level of the voice signal is greater than the preset level, the processor 150 may maintain the gain of the microphone acoustic amplification unit 140.

이후, 프로세서(150)는 마이크 음향 증폭부(140)에서 출력된 음성 신호에 대해, 음성 인식을 위한 전처리를 수행할 수 있다.Thereafter, the processor 150 may perform pre-processing for voice recognition on the voice signal output from the microphone sound amplifying unit 140.

예를 들어, 프로세서(150)는 음원 반향 측정, 빔포밍, 음원 분리, 잡음 억제 및 음성 개선 등의 전처리를 수행할 수 있다.For example, the processor 150 may perform preprocessing such as sound source echo measurement, beamforming, sound source separation, noise suppression, and voice improvement.

그리고, 프로세서(150)는 전처리된 음성 신호에 대한 음성 인식을 수행할 수 있다. In addition, the processor 150 may perform speech recognition on the preprocessed speech signal.

이 경우, 프로세서(150)는 인공지능 에이전트 프로그램을 이용하여 사용자 음성에 대한 응답을 제공할 수 있다.In this case, the processor 150 may provide a response to the user's voice using an artificial intelligence agent program.

구체적으로, 프로세서(150)는 음성 신호에 대한 음성 인식을 수행하여 사용자 음성을 텍스트로 변환하고, 음성 인식 결과에 기초하여 사용자 음성에 대한 도메인(domain), 의도(intend) 및 의도를 파악하는데 필요한 파라미터(parameter)(또는, 슬롯(slot)) 등을 파악할 수 있다. 그리고, 프로세서(150)는 사용자 의도에 따라 검색 등을 수행하여 사용자 음성에 대한 응답을 획득하고, 획득된 응답을 자연어로 처리하고, 이를 출력하여 사용자 음성에 대한 응답을 제공할 수 있다.Specifically, the processor 150 converts the user's speech into text by performing speech recognition on the speech signal, and is necessary to identify the domain, intention, and intention of the user's speech based on the speech recognition result. A parameter (or slot), etc. can be identified. Further, the processor 150 may perform a search or the like according to the user's intention to obtain a response to the user's voice, process the obtained response in natural language, and output the result to provide a response to the user's voice.

For example, for a user voice intended to inquire about the weather, the electronic device 100 may search for the current weather to obtain a response to the user voice, process the response into natural language, convert the resulting natural language into speech through text-to-speech (TTS), and output the speech through the speaker 130 of the electronic device 100.

Accordingly, the conversation system can provide a response to the user's voice, so that the user can hold a conversation with the electronic device 100.
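As an illustration of the domain, intent, and slot identification described above, the following sketch shows one possible shape for the result of interpreting recognized text. The `Interpretation` type, the `interpret` function, and the keyword rules are hypothetical. A toy keyword matcher stands in for whatever language-understanding model an actual implementation would use, which the disclosure does not specify.

```python
from dataclasses import dataclass, field


@dataclass
class Interpretation:
    """Hypothetical container for the result of understanding one utterance."""
    domain: str
    intent: str
    slots: dict = field(default_factory=dict)


def interpret(text):
    """Toy keyword matcher standing in for a real language-understanding model."""
    lowered = text.lower()
    if "weather" in lowered:
        # The "date" slot is a parameter needed to carry out the weather query.
        slots = {"date": "tomorrow" if "tomorrow" in lowered else "today"}
        return Interpretation("weather", "query_weather", slots)
    return Interpretation("unknown", "unknown")
```

For the weather example above, the recognized text would map to the `weather` domain and a weather-query intent, and the slots would then drive the search that produces the response.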

Meanwhile, voice recognition for the user's voice may be performed by a separate server rather than by the electronic device 100.

In this case, the processor 150 may transmit the pre-processed voice signal to the server through the communication interface 180. The server may perform voice recognition on the voice signal received from the electronic device 100 and provide a corresponding response to the electronic device 100. The server may provide a response including text, but this is only an example, and the server may also provide a response in natural-language form.

Accordingly, the processor 150 may output the response through the speaker 130. In this case, the processor 150 may process the response into natural language using the conversation system and output it. However, this is only an example, and the processor 150 may also display the response through the display 170.

FIG. 6 is a flowchart illustrating a method of controlling an electronic device according to an embodiment of the present disclosure.

First, an audio signal is amplified based on a first gain (S610). In this case, the electronic device may amplify the audio signal using the sound amplifying unit. The degree to which the audio signal is amplified may be determined according to the first gain.

Thereafter, the amplified audio signal is output (S620). In this case, the electronic device may output the amplified audio signal through the speaker.

When the output audio signal and the user's voice signal are input, the input signal is amplified based on a second gain (S630).

In this case, the electronic device may receive the output audio signal and the user's voice signal through the microphone, and amplify the input signal through the microphone sound amplifying unit. The degree to which the input signal is amplified may be determined according to the second gain.

Then, acoustic echo cancellation is performed on the amplified signal to obtain the voice signal, and pre-processing for voice recognition is performed on the voice signal (S650).
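The disclosure does not specify which acoustic echo cancellation algorithm is used. As one common choice, the following sketch uses a normalized least-mean-squares (NLMS) adaptive filter, which is a technique substituted here purely for illustration. The function name and the `taps`, `mu`, and `eps` parameters are assumptions. The audio signal sent to the speaker serves as the reference, and the filter subtracts its estimate of the echo from the amplified microphone signal.

```python
def nlms_echo_cancel(mic, reference, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptive estimate of the echo of `reference` from `mic`.

    mic:       amplified microphone samples (echo plus user's voice)
    reference: samples of the audio signal that was sent to the speaker
    Returns the residual signal, i.e. the estimate of the user's voice.
    """
    weights = [0.0] * taps   # adaptive estimate of the echo path
    history = [0.0] * taps   # most recent reference samples, newest first
    residual = []
    for m, r in zip(mic, reference):
        history = [r] + history[:-1]
        echo_estimate = sum(w * x for w, x in zip(weights, history))
        error = m - echo_estimate
        # Normalizing by the reference energy keeps the step size stable.
        energy = sum(x * x for x in history) + eps
        weights = [w + mu * error * x / energy
                   for w, x in zip(weights, history)]
        residual.append(error)
    return residual
```

If the microphone signal has already clipped, the echo is no longer a linear function of the reference, and a linear filter of this kind cannot remove it fully. This is consistent with the motivation for controlling the second gain before echo cancellation.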

Meanwhile, step S630 may include determining the second gain based on the first gain and amplifying the input signal based on the determined second gain.

In this case, step S630 may include adjusting the second gain to be inversely proportional to the first gain.

Also, step S610 may include, when a user command for adjusting the volume of the electronic device is received, adjusting the first gain based on the user command, and step S630 may include adjusting the second gain based on the adjusted first gain.

Meanwhile, step S630 may include decreasing the second gain when the first gain is increased.
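The relationship between the two gains can be sketched as follows. The disclosure states only that the second gain is adjusted to be inversely proportional to the first gain. The constant `k`, the clamping limits, the use of linear (not decibel) gain values, and the `GainController` class are assumptions made for this example.

```python
def mic_gain_from_speaker_gain(speaker_gain, k=4.0, min_gain=0.25, max_gain=16.0):
    """Return a second (microphone) gain inversely proportional to the first gain.

    Gains are linear factors; k and the clamping limits are hypothetical.
    """
    if speaker_gain <= 0:
        raise ValueError("speaker_gain must be positive")
    return max(min_gain, min(max_gain, k / speaker_gain))


class GainController:
    """Keeps the second gain in step with the first gain on volume changes."""

    def __init__(self, speaker_gain=1.0, k=4.0):
        self.k = k
        self.speaker_gain = speaker_gain
        self.mic_gain = mic_gain_from_speaker_gain(speaker_gain, k)

    def on_volume_command(self, new_speaker_gain):
        # A volume command sets the first gain; the second gain follows it.
        self.speaker_gain = new_speaker_gain
        self.mic_gain = mic_gain_from_speaker_gain(new_speaker_gain, self.k)
        return self.mic_gain
```

With these assumed values, doubling the first gain from 1.0 to 2.0 halves the second gain from 4.0 to 2.0, so the echo picked up by the microphone stays at roughly the same amplified level.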

Meanwhile, whether clipping has occurred in the obtained voice signal may be determined, and the second gain may be adjusted based on whether clipping has occurred.

Specifically, when clipping has occurred in the obtained voice signal, the second gain may be decreased, and when clipping has not occurred in the obtained voice signal, the second gain may be maintained.
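A minimal sketch of the clipping check follows. The disclosure does not describe how clipping is detected or by how much the gain is decreased, so the detection rule (a run of consecutive samples at or near full scale), the function names, and the values of `threshold`, `min_run`, `step_db`, and `min_gain_db` are assumptions.

```python
def is_clipped(samples, full_scale=1.0, threshold=0.999, min_run=3):
    """Return True if at least min_run consecutive samples sit at full scale."""
    run = 0
    for s in samples:
        if abs(s) >= threshold * full_scale:
            run += 1
            if run >= min_run:
                return True
        else:
            run = 0
    return False


def adjust_mic_gain_for_clipping(samples, mic_gain_db, step_db=3.0, min_gain_db=0.0):
    """Lower the second gain by step_db when clipping is detected; otherwise keep it."""
    if is_clipped(samples):
        return max(mic_gain_db - step_db, min_gain_db)
    return mic_gain_db
```

Requiring a short run of saturated samples, instead of a single peak, is one way to avoid reacting to an isolated sample that merely touches full scale.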

Meanwhile, an average level of the obtained voice signal may be determined, and when the average level is lower than a preset level, the second gain may be increased so that the average level of the obtained voice signal becomes higher than the preset level.

A specific method of adjusting the gain of the microphone sound amplifying unit has been described above.

Meanwhile, the term "unit" or "module" as used in the present disclosure includes a unit composed of hardware, software, or firmware, and may be used interchangeably with terms such as logic, logic block, component, or circuit. A "unit" or "module" may be an integrally formed component, or a minimum unit or part thereof that performs one or more functions. For example, a module may be implemented as an application-specific integrated circuit (ASIC).

Various embodiments of the present disclosure may be implemented as software including instructions stored in a machine-readable (e.g., computer-readable) storage medium. The machine is a device capable of calling the stored instructions from the storage medium and operating according to the called instructions, and may include an electronic device according to the disclosed embodiments (e.g., the electronic device 100). When an instruction is executed by a processor, the processor may perform the function corresponding to the instruction directly, or by using other components under the control of the processor. An instruction may include code generated or executed by a compiler or an interpreter. The machine-readable storage medium may be provided in the form of a non-transitory storage medium. Here, "non-transitory" means only that the storage medium does not include a signal and is tangible; it does not distinguish whether data is stored in the storage medium semi-permanently or temporarily.

According to an embodiment, the method according to the various embodiments disclosed herein may be provided as part of a computer program product. The computer program product may be traded as a commodity between a seller and a buyer. The computer program product may be distributed in the form of a machine-readable storage medium (e.g., compact disc read only memory (CD-ROM)) or online through an application store. In the case of online distribution, at least a portion of the computer program product may be at least temporarily stored, or temporarily generated, in a storage medium such as a memory of a manufacturer's server, an application store's server, or a relay server.

Each of the components (e.g., a module or a program) according to the various embodiments may consist of a single entity or a plurality of entities, and some of the sub-components described above may be omitted, or other sub-components may be further included in the various embodiments. Alternatively or additionally, some components (e.g., modules or programs) may be integrated into a single entity that performs the same or similar functions as those performed by each corresponding component before integration. Operations performed by a module, a program, or another component according to the various embodiments may be executed sequentially, in parallel, repetitively, or heuristically, or at least some operations may be executed in a different order or omitted, or other operations may be added.

100: electronic device 110: sound amplifying unit
120: speaker 130: microphone
140: microphone sound amplifying unit 150: processor

Claims

An electronic device comprising:
a sound amplifying unit;
a speaker;
a microphone;
a microphone sound amplifying unit; and
a processor configured to amplify an audio signal through the sound amplifying unit, output the amplified audio signal through the speaker, and, when the audio signal output through the speaker and a user's voice signal are input to the microphone, amplify the signal input to the microphone through the microphone sound amplifying unit, perform acoustic echo cancellation (AEC) on the amplified signal to obtain the voice signal, and perform pre-processing for voice recognition on the obtained voice signal,
wherein the processor is configured to determine a gain of the microphone sound amplifying unit based on a gain of the sound amplifying unit, and to amplify the input signal through the microphone sound amplifying unit based on the determined gain.

The electronic device of claim 1,
wherein the processor is configured to adjust the gain of the microphone sound amplifying unit to be inversely proportional to the gain of the sound amplifying unit.

The electronic device of claim 2,
wherein the processor is configured to, when a user command for adjusting the volume of the electronic device is received, adjust the gain of the sound amplifying unit based on the user command, and adjust the gain of the microphone sound amplifying unit based on the adjusted gain of the sound amplifying unit.

The electronic device of claim 3,
wherein the processor is configured to decrease the gain of the microphone sound amplifying unit when the gain of the sound amplifying unit is increased.

The electronic device of claim 1,
wherein the processor is configured to determine whether clipping has occurred in the obtained voice signal, and to adjust the gain of the microphone sound amplifying unit based on whether clipping has occurred.

The electronic device of claim 5,
wherein the processor is configured to decrease the gain of the microphone sound amplifying unit when clipping has occurred in the obtained voice signal, and to maintain the gain of the microphone sound amplifying unit when clipping has not occurred in the obtained voice signal.

The electronic device of claim 1,
wherein the processor is configured to determine an average level of the obtained voice signal and, when the average level is less than or equal to a preset level, to increase the gain of the microphone sound amplifying unit so that the average level of the obtained voice signal becomes higher than the preset level.

A method of controlling an electronic device, the method comprising:
amplifying an audio signal based on a first gain;
outputting the amplified audio signal;
when the output audio signal and a user's voice signal are input, amplifying the input signal based on a second gain; and
performing acoustic echo cancellation (AEC) on the amplified signal to obtain the voice signal, and performing pre-processing for voice recognition on the voice signal,
wherein the amplifying of the input signal comprises determining the second gain based on the first gain and amplifying the input signal based on the determined second gain.

The method of claim 8,
wherein the amplifying of the input signal comprises adjusting the second gain to be inversely proportional to the first gain.

The method of claim 9,
wherein the amplifying of the audio signal comprises, when a user command for adjusting the volume of the electronic device is received, adjusting the first gain based on the user command, and
wherein the amplifying of the input signal comprises adjusting the second gain based on the adjusted first gain.

The method of claim 10,
wherein the amplifying of the input signal comprises decreasing the second gain when the first gain is increased.

The method of claim 8, further comprising:
determining whether clipping has occurred in the obtained voice signal, and adjusting the second gain based on whether clipping has occurred.

The method of claim 12,
wherein the adjusting comprises decreasing the second gain when clipping has occurred in the obtained voice signal, and maintaining the second gain when clipping has not occurred in the obtained voice signal.

The method of claim 8, further comprising:
determining an average level of the obtained voice signal and, when the average level is less than or equal to a preset level, increasing the second gain so that the average level of the obtained voice signal becomes higher than the preset level.