KR20240038532A

KR20240038532A - Method for operating singing mode and electronic device performing the same

Info

Publication number: KR20240038532A
Application number: KR1020220131592A
Authority: KR
Inventors: 이철민
Original assignee: 삼성전자주식회사
Priority date: 2022-09-16
Filing date: 2022-10-13
Publication date: 2024-03-25

Abstract

A method of operating a singing mode and an electronic device performing the same are disclosed. A wireless audio device (102; 202; 302) according to an embodiment may include a memory (141; 531; 532) containing instructions, and a processor (131; 521; 522) electrically connected to the memory (141; 531; 532) and configured to execute the instructions. When the instructions are executed by the processor (131; 521; 522), the processor (131; 521; 522) may perform a plurality of operations. The plurality of operations may include determining an operation mode of the wireless audio device (102; 202; 302) as one of a singing mode and a conversation mode based on an analysis result of an audio signal, and controlling an output signal of the wireless audio device (102; 202; 302) according to the determined mode.

Description

Singing mode operation method and electronic device performing the same {METHOD FOR OPERATING SINGING MODE AND ELECTRONIC DEVICE PERFORMING THE SAME}

Embodiments of the present invention relate to a method of operating a singing mode and an electronic device performing the same.

Wireless audio devices such as earbuds are widely used. A wireless audio device may be wirelessly connected to an electronic device such as a mobile phone and may output audio data received from the mobile phone. Because the wireless audio device is connected to the electronic device wirelessly, user convenience may increase. With this increased convenience, the time for which a user wears the wireless audio device may also increase.

A wireless audio device may be worn on the user's ears. While wearing the wireless audio device, the user may have difficulty hearing external sounds. The wireless audio device may output ambient sound so that the wearer can hear external sounds. For example, the wireless audio device may provide ambient sound to the user by outputting, in real time, the sound received by its microphone.

A wireless audio device (102; 202; 302) according to an embodiment may include a memory (141; 531; 532) containing instructions, and a processor (131; 521; 522) electrically connected to the memory (141; 531; 532) and configured to execute the instructions. When the instructions are executed by the processor (131; 521; 522), the processor (131; 521; 522) may perform a plurality of operations. The plurality of operations may include detecting an audio signal. The plurality of operations may include determining an operation mode of the wireless audio device (102; 202; 302) as one of a singing mode and a conversation mode based on an analysis result of the audio signal. The plurality of operations may include controlling an output signal of the wireless audio device (102; 202; 302) according to the determined mode. The conversation mode may be a mode for outputting at least part of the ambient sound included in the audio signal, and the singing mode may be a mode for outputting at least part of the ambient sound and media included in the audio signal.
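The difference between the two modes can be illustrated with a small mixing sketch. This is only an illustration under stated assumptions: the function name `mix_output`, the string mode labels, the per-sample list representation, and the gain parameters are hypothetical and do not come from the disclosure, which only states which components each mode outputs.

```python
from typing import List, Sequence


def mix_output(mode: str,
               ambient: Sequence[float],
               media: Sequence[float],
               ambient_gain: float = 1.0,
               media_gain: float = 1.0) -> List[float]:
    """Return output samples for the given mode (hypothetical sketch).

    conversation: only the (scaled) ambient sound is output.
    singing:      the ambient sound and the media are both output, summed.
    """
    if mode == "conversation":
        return [ambient_gain * a for a in ambient]
    if mode == "singing":
        if len(ambient) != len(media):
            raise ValueError("ambient and media frames must have equal length")
        return [ambient_gain * a + media_gain * m
                for a, m in zip(ambient, media)]
    raise ValueError(f"unknown mode: {mode!r}")
```

The gains stand in for "at least part of" each component; a real device would likely apply filtering rather than a single scalar.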

A wireless audio device (102; 202; 302) according to an embodiment may include a memory (141; 531; 532) containing instructions, and a processor (131; 521; 522) electrically connected to the memory (141; 531; 532) and configured to execute the instructions. When the instructions are executed by the processor (131; 521; 522), the processor (131; 521; 522) may perform a plurality of operations. The plurality of operations may include detecting an audio signal. The plurality of operations may include determining the operation mode of the wireless audio device (102; 202; 302) for the audio signal to be a singing mode. The plurality of operations may include controlling an output signal of the wireless audio device (102; 202; 302) according to the singing mode. The singing mode may be a mode for outputting at least part of the ambient sound and media included in the audio signal.

A wireless audio device (102; 202; 302) according to an embodiment may include a memory (141; 531; 532) containing instructions, and a processor (131; 521; 522) electrically connected to the memory (141; 531; 532) and configured to execute the instructions. When the instructions are executed by the processor (131; 521; 522), the processor (131; 521; 522) may perform a plurality of operations. The plurality of operations may include detecting an audio signal. The plurality of operations may include determining, based on an analysis result of the audio signal, the operation mode of the wireless audio device (102; 202; 302) for the audio signal as either a singing mode or a conversation mode. The plurality of operations may include outputting at least part of the ambient sound included in the audio signal when the determined mode is the conversation mode. The plurality of operations may include outputting at least part of the ambient sound and media included in the audio signal when the determined mode is the singing mode. The plurality of operations may include deactivating the singing mode when, in the singing mode, no singing voice is detected in the ambient sound for at least a specified time.
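The timeout-based deactivation described above can be sketched as a small state machine. The class name `ModeController`, the 5-second default, the per-frame `update` call, and the choice to fall back to conversation mode after deactivation are all assumptions made for illustration; the disclosure only says the singing mode is deactivated once no singing voice has been detected for a specified time.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    CONVERSATION = "conversation"
    SINGING = "singing"


@dataclass
class ModeController:
    """Hypothetical controller that leaves singing mode after a quiet period."""
    timeout_s: float = 5.0           # assumed value for the "specified time"
    mode: Mode = Mode.CONVERSATION
    quiet_s: float = 0.0             # time since a singing voice was last detected

    def update(self, singing_detected: bool, frame_s: float) -> Mode:
        """Advance by one analysis frame of length frame_s seconds."""
        if singing_detected:
            # A detected singing voice (re)activates singing mode and resets the timer.
            self.mode = Mode.SINGING
            self.quiet_s = 0.0
        elif self.mode is Mode.SINGING:
            # No singing voice: accumulate quiet time and deactivate on timeout.
            self.quiet_s += frame_s
            if self.quiet_s >= self.timeout_s:
                # Assumption: deactivation falls back to conversation mode.
                self.mode = Mode.CONVERSATION
        return self.mode
```

Calling `update` once per analysis frame keeps the timer independent of how the singing-voice detector itself is implemented.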

FIG. 1 is a block diagram showing an integrated intelligence system according to an embodiment.
FIG. 2 is a block diagram showing an integrated intelligence system according to an embodiment.
FIG. 3 illustrates a communication environment between a wireless audio device and an electronic device according to an embodiment.
FIG. 4 shows a block diagram of an electronic device and wireless audio devices according to an embodiment.
FIG. 5 shows a front view and a rear view of a first wireless audio device according to an embodiment.
FIG. 6 shows a block diagram of a wireless audio device according to an embodiment.
FIG. 7 is a block diagram showing the configuration of a wireless audio device according to an embodiment.
FIG. 8 is a flowchart illustrating an operation in which a wireless audio device controls an output signal according to an embodiment.
FIG. 9 is a flowchart illustrating an operation in which a wireless audio device controls an output signal according to one of a singing mode and a conversation mode, according to an embodiment.
FIG. 10 is a schematic diagram of a similarity determination module according to an embodiment.
FIG. 11 is a schematic diagram of a singing mode module according to an embodiment.
FIGS. 12A and 12B are examples of screens output on a display of an electronic device according to an embodiment.

Hereinafter, embodiments are described in detail with reference to the accompanying drawings. In the description referring to the drawings, identical components are given the same reference numerals regardless of the figure in which they appear, and redundant descriptions of them are omitted.

FIG. 1 is a block diagram showing an integrated intelligence system according to an embodiment.

Referring to FIG. 1, the integrated intelligence system of an embodiment may include a first electronic device 101 (e.g., a user terminal), a second electronic device 102 (e.g., an earbud or any device including a microphone), an intelligent server 100, and a service server 103.

According to the illustrated embodiment, the first electronic device 101 may include a communication interface 110, an input/output (I/O) interface 120, at least one processor 130, and/or a memory 140. The components listed above may be operatively or electrically connected to each other.

In one embodiment, the communication interface 110 may be connected to an external device (e.g., the intelligent server 100 or the service server 103) through a first network 199 (e.g., any network including a cellular network and/or a wireless local area network (WLAN)) to transmit and receive data. The communication interface 110 may support data transmission and reception with an external device (e.g., the second electronic device 102) through a second network 198 (e.g., a short-range wireless communication network).

In one embodiment, the I/O interface 120 may use an input/output device (not shown) (e.g., a microphone, a speaker, and/or a display) to receive a user input, process the received user input, and/or output a result processed by the processor 130.

In one embodiment, the processor 130 may be electrically connected to the communication interface 110, the I/O interface 120, and/or the memory 140 to perform designated operations. The processor 130 may perform a designated operation by executing a program (or one or more instructions) stored in the memory 140. For example, the processor 130 may receive a user's voice input (e.g., a user utterance) through the I/O interface 120. For example, the processor 130 may receive, through the communication interface 110, a user's voice input that was received by the second electronic device 102. The processor 130 may transmit the voice input received through the communication interface 110 to the intelligent server 100.

In one embodiment, the processor 130 may receive a result corresponding to the voice input from the intelligent server 100. For example, the processor 130 may receive, from the intelligent server 100, a plan corresponding to the voice input and/or a result calculated using the plan. The processor 130 may receive, from the intelligent server 100, a request for information (e.g., a parameter) needed to generate the plan corresponding to the voice input. In response to the request, the processor 130 may transmit the needed information to the intelligent server 100.

In one embodiment, the processor 130 may output the result of executing the operations specified by the plan through the I/O interface 120 as visual, tactile, and/or voice output. For example, the processor 130 may sequentially display the execution results of a plurality of operations on the display. As another example, the processor 130 may display only some of the results of executing the plurality of operations (e.g., the result of the last operation) on the display. The processor 130 may provide feedback through the second electronic device 102 by transmitting the execution result, or part of it, to the second electronic device 102 through the second network 198.

In one embodiment, the processor 130 may recognize a voice input that performs a limited function. For example, in response to a designated voice input (e.g., "wake up!"), the processor 130 may execute an intelligent app (or voice recognition app) for processing voice input. The processor 130 may provide a voice recognition service through the intelligent app (or application program). The processor 130 may transmit a voice input to the intelligent server 100 through the intelligent app and receive a result corresponding to the voice input from the intelligent server 100.

According to one example, the second electronic device 102 may include a communication interface 111, an input/output (I/O) interface 121, at least one processor 131, and/or a memory 141. The components listed above may be operatively or electrically connected to each other. In one example, the second electronic device 102 may be a group of electronic devices that form one set (e.g., a left earbud and a right earbud).

In one embodiment, the communication interface 111 may support a connection with an external device (e.g., the first electronic device 101) through the second network 198. The I/O interface 121 may use an input/output device (not shown) (e.g., at least one microphone, at least one speaker, and/or a button) to receive a user input, process the received user input, and/or output a result processed by the processor 131.

In one embodiment, the processor 131 may be electrically connected to the communication interface 111, the I/O interface 121, and/or the memory 141 to perform designated operations. The processor 131 may perform a designated operation by executing a program (or one or more instructions) stored in the memory 141. For example, the processor 131 may receive a user's voice input (e.g., a user utterance) through the I/O interface 121. In one example, the processor 131 may perform voice activity detection (VAD) using at least one sensor (not shown) of the second electronic device 102. The processor 131 may detect an utterance of the user of the second electronic device 102 using an acceleration sensor (not shown) and/or a microphone.
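One simple way to picture sensor-assisted voice activity detection is an energy threshold applied to both the microphone frame and the accelerometer frame. The sketch below is an assumption for illustration only: the disclosure does not specify the VAD algorithm, and the function names, threshold values, and the rule of requiring both signals to be active are hypothetical.

```python
import math
from typing import Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square level of one frame; 0.0 for an empty frame."""
    if not samples:
        return 0.0
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def wearer_is_speaking(mic_frame: Sequence[float],
                       accel_frame: Sequence[float],
                       mic_threshold: float = 0.05,
                       accel_threshold: float = 0.02) -> bool:
    """Hypothetical VAD rule: both the microphone and the acceleration
    sensor must show energy above their (assumed) thresholds.

    Requiring accelerometer energy is one way to separate the wearer's own
    voice, which vibrates the earbud, from nearby talkers.
    """
    return rms(mic_frame) >= mic_threshold and rms(accel_frame) >= accel_threshold
```

A device that uses only one of the two sensors would drop the corresponding condition.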

In one embodiment, the processor 131 may use the communication interface 111 to transmit the received voice input to the first electronic device 101 through the second network 198.

In one embodiment, the processor 131 may receive a result corresponding to the voice input from the first electronic device 101 through the second network 198. For example, the processor 131 may receive, from the first electronic device 101, data (e.g., text data) representing the result corresponding to the voice input. The processor 131 may output the received result through the I/O interface 121.

In one embodiment, the processor 131 may recognize a voice input that performs a limited function. For example, in response to a designated voice input (e.g., "wake up!"), the processor 131 may request the first electronic device 101 to execute an intelligent app (or voice recognition app) for processing voice input.

The intelligent server 100 of an embodiment may receive a user's voice input from the first electronic device 101 through the first network 199. The intelligent server 100 may convert audio data corresponding to the received voice input into text data. Based on the text data, the intelligent server 100 may generate at least one plan for performing a task corresponding to the user's voice input. The intelligent server 100 may transmit the generated plan, or a result obtained according to the generated plan, to the first electronic device 101 through the first network 199.

The intelligent server 100 of an embodiment may include a front end 160, a natural language platform 150, a capsule database 190, an execution engine 170, and/or an end user interface 180.

In one embodiment, the front end 160 may receive, from the first electronic device 101, a voice input that was received by the first electronic device 101. The front end 160 may transmit a response corresponding to the voice input to the first electronic device 101.

In one embodiment, the natural language platform 150 may include an automatic speech recognition module (ASR module) 151, a natural language understanding module (NLU module) 153, a planner module 155, a natural language generator module (NLG module) 157, and/or a text-to-speech module (TTS module) 159.

In one embodiment, the automatic speech recognition module 151 may convert a voice input received from the first electronic device 101 into text data. The natural language understanding module 153 may determine the user's intent and/or parameters based on the text data of the voice input.

In one embodiment, the planner module 155 may generate a plan using the intent and parameters determined by the natural language understanding module 153. Based on the determined intent, the planner module 155 may determine a plurality of domains required to perform a task. The planner module 155 may determine a plurality of operations included in each of the domains determined based on the intent. The planner module 155 may determine the parameters required to execute the determined operations, or the result values output by executing them. The parameters and the result values may be defined as concepts of a specified format (or class). Accordingly, the plan may include a plurality of operations and/or a plurality of concepts determined by the user's intent. The planner module 155 may determine the relationships between the operations and the concepts stepwise (or hierarchically). For example, the planner module 155 may determine the execution order of the operations determined from the user's intent based on the plurality of concepts (e.g., the parameters required to execute the operations and the results output by executing them). The planner module 155 may generate a plan that includes association information (e.g., an ontology) between the operations and the concepts. The planner module 155 may generate the plan using information (e.g., at least one capsule) stored in the capsule database 190, which stores a set of relationships between concepts and operations.
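Deriving an execution order from the concepts that operations consume and produce can be pictured as a topological sort. The following is a minimal sketch under that assumption; the disclosure does not name an algorithm, and the function `execution_order`, the dictionary layout, and the example operation and concept names are hypothetical. It uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set, Tuple

# operation name -> (concepts it needs as parameters, concepts it outputs)
Operations = Dict[str, Tuple[Set[str], Set[str]]]


def execution_order(operations: Operations) -> List[str]:
    """Order operations so each runs after the operations producing its inputs."""
    # Map every output concept to the operation that produces it.
    producer = {concept: name
                for name, (_, outputs) in operations.items()
                for concept in outputs}
    # An operation depends on the producers of its input concepts. Inputs with
    # no producer are treated as parameters supplied by the user's utterance.
    graph = {name: {producer[c] for c in inputs if c in producer}
             for name, (inputs, _) in operations.items()}
    return list(TopologicalSorter(graph).static_order())


# Hypothetical plan for a restaurant-booking request.
plan = {
    "notify_user": ({"reservation"}, set()),
    "book_table": ({"restaurant", "time"}, {"reservation"}),
    "find_restaurant": ({"cuisine"}, {"restaurant"}),
}
ordered = execution_order(plan)
```

Here `ordered` lists `find_restaurant` before `book_table` before `notify_user`, because each operation's input concept is the previous operation's result. A cyclic dependency would raise `graphlib.CycleError`.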

In one embodiment, the planner module 155 may generate a plan based on an artificial intelligence (AI) system. For example, the AI system may be a rule-based system, a neural network-based system (e.g., a feedforward neural network (FNN) and/or a recurrent neural network (RNN)), a combination of the foregoing, or a different AI system. The planner module 155 may select a plan corresponding to a user request from a set of predefined plans, or may generate a plan in real time in response to the user request.

In one embodiment, the natural language generator module 157 may convert designated information into text form. The information converted into text form may take the form of a natural language utterance. The text-to-speech module 159 may convert information in text form into information in voice form.

In one embodiment, the capsule database 190 may store information about the relationships between a plurality of concepts and operations corresponding to a plurality of domains (e.g., applications). The capsule database 190 may store at least one capsule 191, 193 in the form of a concept action network (CAN). For example, the capsule database 190 may store, in CAN form, operations for processing a task corresponding to the user's voice input and the parameters required for those operations. A capsule may include a plurality of action objects (or action information) and/or concept objects (or concept information) included in a plan.
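As a rough mental model, a capsule can be thought of as a per-domain bundle of action objects and concept objects. The data layout below is purely a hypothetical sketch: the class and field names (`ActionObject`, `ConceptObject`, `Capsule`, `CapsuleDatabase`, `required`, `produces`) are illustrative assumptions and do not reflect the actual CAN storage format, which the disclosure does not detail.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class ConceptObject:
    name: str        # e.g. "hotel_list"
    type_name: str   # the concept's specified format (or class)


@dataclass(frozen=True)
class ActionObject:
    name: str
    required: tuple[str, ...]   # concept names needed as parameters
    produces: str               # concept name output by the action


@dataclass
class Capsule:
    domain: str                 # e.g. an application
    actions: list[ActionObject] = field(default_factory=list)
    concepts: list[ConceptObject] = field(default_factory=list)


class CapsuleDatabase:
    """Hypothetical in-memory store keyed by domain."""

    def __init__(self) -> None:
        self._capsules: dict[str, Capsule] = {}

    def add(self, capsule: Capsule) -> None:
        self._capsules[capsule.domain] = capsule

    def actions_producing(self, concept_name: str) -> list[str]:
        """Return 'domain.action' names whose output is the given concept."""
        return [f"{capsule.domain}.{action.name}"
                for capsule in self._capsules.values()
                for action in capsule.actions
                if action.produces == concept_name]
```

A planner could query such a store for the actions that produce a needed concept and chain them into a plan.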

In one embodiment, the execution engine 170 may calculate a result using the generated plan. The end user interface 180 may transmit the calculated result to the first electronic device 101.
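The server-side modules described so far can be read as stages of one pipeline. The sketch below wires hypothetical stand-ins for the ASR, NLU, planner, execution engine, NLG, and TTS stages together; the class name `AssistantPipeline`, the callable signatures, and the choice to pass the execution result through NLG and TTS are assumptions for illustration and are not stated as a fixed flow in the disclosure.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class AssistantPipeline:
    """Hypothetical composition of the natural language platform stages."""
    asr: Callable[[bytes], str]                  # audio data -> text data
    nlu: Callable[[str], tuple[str, dict]]       # text -> (intent, parameters)
    planner: Callable[[str, dict], list[str]]    # intent, parameters -> plan
    execute: Callable[[list[str]], str]          # plan -> result
    nlg: Callable[[str], str]                    # result -> natural language text
    tts: Callable[[str], bytes]                  # text -> voice data

    def handle(self, audio: bytes) -> bytes:
        text = self.asr(audio)
        intent, params = self.nlu(text)
        plan = self.planner(intent, params)
        result = self.execute(plan)
        return self.tts(self.nlg(result))


# Trivial stand-in stages, only to show how data flows between them.
pipeline = AssistantPipeline(
    asr=lambda audio: audio.decode(),
    nlu=lambda text: ("echo", {"text": text}),
    planner=lambda intent, params: [f"{intent}:{params['text']}"],
    execute=lambda plan: plan[-1],
    nlg=lambda result: f"Done: {result}",
    tts=lambda sentence: sentence.encode(),
)
reply = pipeline.handle(b"hi")
```

Injecting each stage as a callable mirrors the point made in the description that some or all stages may run on the server or on the first electronic device.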

According to one embodiment, some functions (e.g., the natural language platform 150) or all functions of the intelligent server 100 may be implemented in the first electronic device 101. For example, the first electronic device 101 may include a natural language platform separate from the intelligent server 100, or may directly perform at least some of the operations of the natural language platform 150 of the intelligent server 100 (e.g., the automatic speech recognition module 151, the natural language understanding module 153, the planner module 155, the natural language generator module 157, and/or the text-to-speech module 159).

The service server 103 of an embodiment may provide a designated service (e.g., food ordering or hotel reservation) to the first electronic device 101. The service server 103 may be a server operated by a third party. The service server 103 may communicate with the intelligent server 100 and/or the first electronic device 101 through the first network 199. The service server 103 may communicate with the intelligent server 100 through a separate connection. The service server 103 may provide the intelligent server 100 with information for generating a plan corresponding to the voice input received by the first electronic device 101 (e.g., action information and/or concept information for providing the designated service). The provided information may be stored in the capsule database 190. The service server 103 may provide the intelligent server 100 with result information according to the plan, received from the first electronic device 101.

FIG. 2 is a block diagram showing an integrated intelligence system according to an embodiment.

Referring to FIG. 2, the integrated intelligence system may include a first electronic device 201 (e.g., the first electronic device 101 of FIG. 1), a second electronic device 202 (e.g., the second electronic device 102 of FIG. 1), and an intelligent server 200 (e.g., the intelligent server 100 of FIG. 1). The first electronic device 201 and the intelligent server 200 may be connected to each other through a network to transmit and receive data. The first electronic device 201 and the second electronic device 202 may be connected to each other through a short-range network to transmit and receive data. According to one embodiment, the integrated intelligence system may consist of a single device or a plurality of devices. For example, the devices may each include components with substantially the same or similar functions, and a component of one device may be replaced by a component of another device.

According to an embodiment, the intelligent server 200 may include all or at least some of the components of the intelligent server 100 shown in FIG. 1. For example, the intelligent server 200 may include the natural language platform 150 and/or the capsule database 190 of the intelligent server 100 of FIG. 1. However, the configuration of the intelligent server 200 is not limited to that shown in FIG. 2: at least some components of the natural language platform 250 (e.g., the automatic speech recognition module 251, the natural language understanding module 253, the planner module 255, the natural language generation module 257, and/or the text-to-speech conversion module 259) may be omitted, and some components of the intelligent server 100 of FIG. 1 (e.g., the front end 160, the execution engine 170, and/or the end user interface 180) may be further included.

According to an embodiment, the first electronic device 201 may include a natural language platform 260 and/or a capsule database 280. The natural language platform 260 may include an automatic speech recognition (ASR) module 261, a natural language understanding (NLU) module 263, a planner module 265, a natural language generator (NLG) module 267, and/or a text-to-speech (TTS) module 269. The automatic speech recognition module 261, the natural language understanding module 263, the planner module 265, the natural language generation module 267, and the text-to-speech conversion module 269 may perform functions substantially the same as or similar to those of the automatic speech recognition module 151, the natural language understanding module 153, the planner module 155, the natural language generation module 157, and the text-to-speech conversion module 159 of FIG. 1, respectively.

According to an embodiment, the capsule database 280 may perform functions substantially the same as or similar to those of the capsule databases 190 and 290 of the intelligent servers 100 and 200. The capsule database 280 may store information about the relationships between a plurality of operations and a plurality of concepts included in a plan generated by the planner module 265. For example, the capsule database 280 may store at least one capsule 281, 283.

According to an embodiment, the first electronic device 201 (e.g., the natural language platform 260 and/or the capsule database 280) and the intelligent server 200 (e.g., the natural language platform 250 and/or the capsule database 290) may perform at least one function (or operation) in conjunction with each other, or may each independently perform at least one function (or operation). For example, the first electronic device 201 may perform speech recognition by itself without transmitting a received user voice input to the intelligent server 200. As an example, the first electronic device 201 may convert the received voice input into text data through the automatic speech recognition module 261. The first electronic device 201 may transmit the converted text data to the intelligent server 200. The intelligent server 200 may determine the user's intent and/or parameters from the text data through the natural language understanding module 253. The intelligent server 200 may generate a plan through the planner module 255 based on the determined intent and parameters and transmit the plan to the first electronic device 201, or may transmit the determined intent and parameters to the first electronic device 201 so that the plan is generated through the planner module 265 of the first electronic device 201. The planner module 265 of the first electronic device 201 may generate at least one plan for performing a task corresponding to the voice input using information stored in the capsule database 280.

As an example, the first electronic device 201 may convert the received voice input into text data through the automatic speech recognition module 261 and determine the user's intent and/or parameters based on the text data through the natural language understanding module 263. The first electronic device 201 may generate a plan through the planner module 265 based on the determined intent and parameters, or may transmit the determined intent and parameters to the intelligent server 200 so that the plan is generated through the planner module 255 of the intelligent server 200. For example, when the first electronic device 201 does not include the planner module 265 and/or the capsule database 280, the first electronic device 201 may generate the plan through the intelligent server 200.

As an example, the first electronic device 201 may detect an utterance pattern that is difficult for the automatic speech recognition module 261 or the natural language understanding module 263 to learn, and may transmit a voice input corresponding to the detected utterance pattern to the intelligent server 200 so that it is processed by the automatic speech recognition module 251 or the natural language understanding module 253 of the intelligent server 200.
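The division of work described in the preceding examples can be illustrated with a minimal sketch. It assumes three processing stages (ASR, NLU, planner) and decides, per stage, whether the first electronic device or the intelligent server handles it. The names `DeviceModules` and `route_stages` and the `hard_pattern` flag are hypothetical and are not taken from the disclosure.

```python
from dataclasses import dataclass

# Processing stages named in the description, in pipeline order.
STAGES = ("asr", "nlu", "planner")


@dataclass(frozen=True)
class DeviceModules:
    """Hypothetical flags: which modules the first electronic device includes."""
    asr: bool = True
    nlu: bool = True
    planner: bool = True


def route_stages(modules: DeviceModules, hard_pattern: bool = False) -> dict:
    """Return, for each stage, 'device' or 'server' (illustrative policy only)."""
    routing = {}
    for stage in STAGES:
        on_device = getattr(modules, stage)
        # Assumption: an utterance pattern the on-device ASR/NLU handles
        # poorly is sent to the server for those two stages.
        if hard_pattern and stage in ("asr", "nlu"):
            on_device = False
        routing[stage] = "device" if on_device else "server"
    return routing


if __name__ == "__main__":
    # A device without a planner module generates the plan through the server.
    print(route_stages(DeviceModules(planner=False)))
```

The policy shown is only one possibility; the description also allows corresponding modules on both sides to cooperate on a single result.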

Embodiments of the present disclosure are not limited to the examples described above. For example, the first electronic device 201 may process a received voice input entirely within the terminal, up to and including producing a result corresponding to the voice input. As an example, the first electronic device 201 and the intelligent server 200 may not only divide the processing of a voice input on a per-module basis but may also process it through cooperation between corresponding modules. For example, the natural language understanding module 263 of the first electronic device 201 and the natural language understanding module 253 of the intelligent server 200 may operate together to produce a single result value (the user's intent and/or parameters).

According to an embodiment, the second electronic device 202 may include an automatic speech recognition (ASR) module 262 and/or a text-to-speech (TTS) module 264. The automatic speech recognition module 262 and the text-to-speech conversion module 264 may perform functions substantially the same as or similar to those of the automatic speech recognition module 151 and the text-to-speech conversion module 159 of FIG. 1, respectively.

According to an embodiment, the first electronic device 201 and the second electronic device 202 may perform at least one function (or operation) in conjunction with each other, or may each independently perform at least one function (or operation). For example, the second electronic device 202 may perform speech recognition on a voice input using the automatic speech recognition module 262. The second electronic device 202 may perform a function corresponding to the voice input based on the speech recognition. For example, the second electronic device 202 may transmit a command corresponding to a recognized voice command to the first electronic device 201. The second electronic device 202 may output data received from the first electronic device 201. For example, the second electronic device 202 may convert data received from the first electronic device 201 into speech using the text-to-speech conversion module 264 and output the converted speech.

FIG. 3 illustrates a communication environment between a wireless audio device and an electronic device according to an embodiment.

Referring to FIG. 3, according to an embodiment, the electronic device 301 may include components at least partially the same as or similar to those of the first electronic device 101 shown in FIG. 1 and the first electronic device 201 shown in FIG. 2, and may perform functions at least partially the same as or similar to theirs. In addition, the wireless audio device 302 (e.g., the first wireless audio device 302-1 and/or the second wireless audio device 302-2) may include components at least partially the same as or similar to those of the second electronic device 102 shown in FIG. 1 and the second electronic device 202 shown in FIG. 2, and may perform functions at least partially the same as or similar to theirs. Hereinafter, unless otherwise described, the term wireless audio device 302 may refer to the first wireless audio device 302-1, the second wireless audio device 302-2, or both the first and second wireless audio devices 302-1 and 302-2. The electronic device 301 may include, for example, a user terminal such as a smartphone, a tablet, a desktop computer, or a laptop computer. The wireless audio device 302 may include, but is not limited to, a wireless earphone, a headset, an earbud, or a speaker. The wireless audio device 302 may include various types of devices (e.g., a hearing aid or a portable audio device) that receive an audio signal and output the received audio signal. The term "wireless audio device" is used to distinguish the device from the electronic device 301, and the "wireless audio device" may also be referred to as an electronic device, a wireless earphone, an earbud, a true wireless stereo (TWS) device, or an earset.

For example, the electronic device 301 and the wireless audio device 302 may perform short-range wireless communication according to a Bluetooth network defined by the Bluetooth™ special interest group (SIG). The Bluetooth network may include, for example, a Bluetooth legacy network or a Bluetooth low energy (BLE) network. According to an embodiment, the electronic device 301 and the wireless audio device 302 may perform wireless communication through one of the Bluetooth legacy network and the BLE network, or through both networks.

According to an embodiment, the electronic device 301 may serve as a primary device (e.g., a master device), and the wireless audio device 302 may serve as a secondary device (e.g., a slave device). The number of devices serving as secondary devices is not limited to the example shown in FIG. 3. According to an embodiment, the role of primary device or secondary device may be determined in the operation in which a link (e.g., 305, 310, and/or 315) between the devices is created. According to another embodiment, one of the first wireless audio device 302-1 and the second wireless audio device 302-2 (e.g., the first wireless audio device 302-1) may serve as the primary device, and the other may serve as the secondary device.

According to an embodiment, the electronic device 301 may transmit a data packet including content such as text, audio, an image, or video to the wireless audio device 302. Like the electronic device 301, at least one of the wireless audio devices 302 may also transmit a data packet, in this case to the electronic device 301. For example, when music is played on the electronic device 301, the electronic device 301 may transmit a data packet including content (e.g., music data) through a link created with the wireless audio device 302 (e.g., the first link 305 and/or the second link 310). For example, at least one of the wireless audio devices 302 may transmit a data packet including content (e.g., audio data) to the electronic device 301 through the created link. When the electronic device 301 transmits a data packet, the electronic device 301 may be referred to as a source device, and the wireless audio device 302 may be referred to as a sink device.

According to an embodiment, the electronic device 301 may create (or establish) a link with at least one device (302-1 and/or 302-2) of the wireless audio devices 302 in order to transmit a data packet. For example, the electronic device 301 may create a first link 305 with the first wireless audio device 302-1 and/or a second link 310 with the second wireless audio device 302-2 based on the Bluetooth or BLE protocol. In an embodiment, the electronic device 301 may communicate with the first wireless audio device 302-1 through the first link 305. In this case, for example, the second wireless audio device 302-2 may be configured to monitor the first link 305. For example, by monitoring the first link 305, the second wireless audio device 302-2 may receive data that the electronic device 301 transmits through the first link 305.

According to an embodiment, the second wireless audio device 302-2 may monitor the first link 305 using information associated with the first link 305. The information associated with the first link 305 may include address information (e.g., the Bluetooth address of the primary device of the first link 305, the Bluetooth address of the electronic device 301, and/or the Bluetooth address of the first wireless audio device 302-1), piconet (e.g., topology) clock information (e.g., the clock native (CLKN) of the primary device of the first link 305), logical transport (LT) address information (e.g., information allocated by the primary device of the first link 305), used channel map information, link key information, service discovery protocol (SDP) information (e.g., service and/or profile information associated with the first link 305), and/or supported feature information.
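As an illustration only, the items listed above could be grouped into a single record that the second wireless audio device fills in as the information arrives. The sketch below assumes Python, and the class name `LinkInfo`, its field names, and the `missing` helper are hypothetical; the disclosure does not specify a data layout. Every field is optional because the description joins the items with "and/or".

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class LinkInfo:
    """Hypothetical container for information associated with the first link."""
    primary_address: Optional[str] = None             # Bluetooth address of the link's primary device
    electronic_device_address: Optional[str] = None   # Bluetooth address of the electronic device
    first_audio_device_address: Optional[str] = None  # Bluetooth address of the first wireless audio device
    piconet_clock: Optional[int] = None                # e.g., CLKN of the primary device
    lt_address: Optional[int] = None                   # logical transport address
    used_channel_map: Optional[bytes] = None
    link_key: Optional[bytes] = None
    sdp_info: Optional[dict] = None                    # service and/or profile information
    supported_features: Optional[bytes] = None

    def missing(self) -> list:
        """Names of the fields that have not been provided yet."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


if __name__ == "__main__":
    info = LinkInfo(primary_address="00:11:22:33:44:55", piconet_clock=1234)
    print(info.missing())
```

Which of these fields are actually required before monitoring can start is not stated in the text, so the sketch only reports what is absent.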

FIG. 4 is a block diagram of an electronic device and wireless audio devices according to an embodiment.

Referring to FIG. 4, according to an embodiment, the electronic device 301 may include a processor 420 (e.g., the processor 130 of FIG. 1), a memory 430 (e.g., the memory 140 of FIG. 1), a first communication circuit 491, a display 460, and/or a second communication circuit 492. The processor 420 may be operatively connected to the memory 430, the display 460, the first communication circuit 491, and the second communication circuit 492. The memory 430 may store one or more instructions that, when executed, cause the processor 420 to perform various operations of the electronic device 301. The second communication circuit 492 may be configured to support wireless communication based on a Bluetooth protocol (e.g., Bluetooth legacy and/or BLE). The first communication circuit 491 may be configured to support communication based on a wireless communication standard other than the Bluetooth protocol (e.g., cellular and/or Wi-Fi). The electronic device 301 may further include components not shown in FIG. 4. For example, the electronic device 301 may further include an audio input/output device and/or a housing.

According to an embodiment, the electronic device 301 may be connected to the first wireless audio device 302-1 through the first link 305. For example, the electronic device 301 and the first wireless audio device 302-1 may communicate in units of time slots set based on the clock of the primary device of the first link 305. The electronic device 301 may be connected to the second wireless audio device 302-2 through the second link 310. For example, the electronic device 301 may establish the second link 310 after connecting to the first wireless audio device 302-1. In an example, the second link 310 may be omitted.
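The slot-based communication mentioned above can be sketched for the Bluetooth BR/EDR case (referred to in this description as Bluetooth legacy), where a slot lasts 625 µs, the primary device's clock ticks every 312.5 µs, and the primary device starts transmitting in even-numbered slots while a secondary device replies in odd-numbered slots. The function names are hypothetical, and the sketch assumes single-slot packets only; the disclosure itself does not state these details.

```python
SLOT_US = 625  # BR/EDR slot length in microseconds (two clock ticks of 312.5 us)


def slot_index(clk: int) -> int:
    """Slot number for a clock value; one slot spans two clock ticks."""
    return clk >> 1


def transmitter(clk: int) -> str:
    """Which role may start transmitting in the slot containing `clk`.

    Assumes single-slot packets: even slots belong to the primary device,
    odd slots to the secondary device.
    """
    return "primary" if slot_index(clk) % 2 == 0 else "secondary"


def slot_start_us(clk: int) -> int:
    """Start time of the slot containing `clk`, in microseconds since clk = 0."""
    return slot_index(clk) * SLOT_US


if __name__ == "__main__":
    for clk in (0, 2, 4, 6):
        print(clk, transmitter(clk), slot_start_us(clk))
```

Because both ends derive slot boundaries from the primary device's clock, a device that knows that clock (such as a monitoring device holding the piconet clock information) can align its receive windows with the same slots.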

According to an embodiment, the first wireless audio device 302-1 may include a processor 521 (e.g., the processor 131 of FIG. 1), a memory 531 (e.g., the memory 141 of FIG. 1), a sensor circuit 551, an audio output circuit 571, an audio reception circuit 581, and/or a communication circuit 591.

According to an embodiment, the processor 521 may be operatively connected to the sensor circuit 551, the communication circuit 591, the audio output circuit 571, the audio reception circuit 581, and the memory 531.

According to an embodiment, the sensor circuit 551 may include at least one sensor. The sensor circuit 551 may detect information about the wearing state of the first wireless audio device 302-1, biometric information of the wearer, and/or movement. For example, the sensor circuit 551 may include a proximity sensor for detecting the wearing state, a biometric sensor (e.g., a heart rate sensor) for detecting biometric information, and/or a motion sensor (e.g., an acceleration sensor) for detecting movement. In an embodiment, the sensor circuit 551 may further include at least one of a bone conduction sensor or an acceleration sensor. In another embodiment, the acceleration sensor may be placed close to the skin in order to detect bone conduction. For example, the acceleration sensor may be configured to detect vibration information on the order of kHz by sampling at a kHz-order rate, which is relatively higher than the rate used for general motion sampling. The processor 521 may perform voice identification, voice detection, tap detection, and/or wear detection in a noisy environment using the vibration about a significant axis (among the x, y, and z axes) in the vibration information of the acceleration sensor.
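The text does not say how the "significant axis" is chosen. One plausible criterion, shown here purely as an assumption, is to pick the axis whose samples vary the most over a short window, since variance ignores the constant gravity offset and responds to vibration. The function name `significant_axis` and the sample layout are hypothetical.

```python
from statistics import pvariance


def significant_axis(samples):
    """Return 'x', 'y', or 'z': the axis with the largest variance.

    `samples` is a non-empty list of (x, y, z) accelerometer readings taken
    at a kHz-order rate. Using variance as the selection criterion is an
    assumption of this sketch, not something stated in the description.
    """
    axes = {
        "x": [s[0] for s in samples],
        "y": [s[1] for s in samples],
        "z": [s[2] for s in samples],
    }
    return max(axes, key=lambda name: pvariance(axes[name]))


if __name__ == "__main__":
    # Vibration mostly along z, small jitter on y, none on x.
    window = [(0.0, 0.1, 1.0), (0.0, -0.1, -1.0)] * 4
    print(significant_axis(window))
```

The selected axis could then feed the voice, tap, or wear detection mentioned above, each of which would need its own detector.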

According to an embodiment, the audio output circuit 571 may be configured to output sound. The audio reception circuit 581 may include one or more microphones. The audio reception circuit 581 may be configured to detect an audio signal using the one or more microphones. In an embodiment, each of the plurality of microphones may correspond to a different audio reception path. For example, when the audio reception circuit 581 includes a first microphone and a second microphone, the audio signal acquired by the first microphone and the audio signal acquired by the second microphone may be referred to as different audio channels. The processor 521 may acquire audio data using at least one of the plurality of microphones connected to the audio reception circuit 581. For example, the processor 521 may dynamically select or determine at least one microphone, among the plurality of microphones, for acquiring audio data. The processor 521 may acquire audio data by performing beamforming using the plurality of microphones. The memory 531 may store one or more instructions that, when executed, cause the processor 521 to perform various operations of the first wireless audio device 302-1.
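The description does not name a beamforming method. As a minimal sketch under that caveat, the following uses delay-and-sum with two microphones: the second channel is shifted by a known integer delay so that sound from the target direction lines up, and the two channels are averaged. The function name and the assumption of a known, non-negative sample delay are illustrative only.

```python
def delay_and_sum(mic1, mic2, delay):
    """Two-microphone delay-and-sum beamformer (illustrative assumption).

    `mic2` is assumed to receive the target sound `delay` samples later than
    `mic1` (delay >= 0). Aligning and averaging reinforces the target
    direction, while sound from other directions adds less coherently.
    """
    n = min(len(mic1), len(mic2) - delay)
    return [(mic1[i] + mic2[i + delay]) / 2 for i in range(n)]


if __name__ == "__main__":
    first = [0, 1, 0, -1, 0, 1, 0, -1]
    second = [0, 0] + first[:-2]  # same signal arriving two samples later
    print(delay_and_sum(first, second, delay=2))
```

In practice the delay depends on the microphone spacing and the direction of the talker's mouth, and it may be fractional; those details are outside what the text specifies.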

According to an embodiment, the processor 521 may acquire audio data using at least one of the audio reception circuit 581 or the sensor circuit 551. For example, the processor 521 may acquire audio data using one or more microphones connected to the audio reception circuit 581. The processor 521 may acquire audio data by detecting, with the sensor circuit 551, vibration corresponding to an audio signal. For example, the processor 521 may acquire audio data using at least one of a motion sensor, a bone conduction sensor, or an acceleration sensor. The processor 521 may be configured to process (e.g., noise suppression, noise cancellation, or echo cancellation) the audio data acquired through the various paths (e.g., at least one of the audio reception circuit 581 or the sensor circuit 551).

According to an embodiment, the first wireless audio device 302-1 may further include components not shown in FIG. 4. For example, the first wireless audio device 302-1 may further include an indicator, an input interface, and/or a housing.

According to an embodiment, the second wireless audio device 302-2 may include a processor 522 (e.g., the processor 131 of FIG. 1), a memory 532 (e.g., the memory 141 of FIG. 1), a sensor circuit 552, an audio output circuit 572, an audio reception circuit 582, and/or a communication circuit 592.

According to an embodiment, the processor 522 may be operatively connected to the communication circuit 592, the audio output circuit 572, the audio reception circuit 582, and the memory 532.

According to an embodiment, the sensor circuit 552 may detect information about the wearing state of the second wireless audio device 302-2, biometric information of the wearer, and/or movement information. For example, the sensor circuit 552 may include a proximity sensor for detecting the wearing state, a biometric sensor (e.g., a heart rate sensor) for detecting biometric information, and/or a motion sensor (e.g., an acceleration sensor) for detecting movement. In an embodiment, the sensor circuit 552 may further include at least one of a bone conduction sensor or an acceleration sensor. In another embodiment, the acceleration sensor may be placed close to the skin in order to detect bone conduction. For example, the acceleration sensor may be configured to detect vibration information on the order of kHz by sampling at a kHz-order rate, which is relatively higher than the rate used for general motion sampling. The processor 522 may perform voice identification, voice detection, tap detection, and/or wear detection in a noisy environment using the vibration about a significant axis (among the x, y, and z axes) in the vibration information of the acceleration sensor.

According to an embodiment, the audio output circuit 572 may be configured to output sound. The audio reception circuit 582 may include one or more microphones. The audio reception circuit 582 may be configured to detect an audio signal using the one or more microphones. In an embodiment, each of the plurality of microphones may correspond to a different audio reception path. For example, when the audio reception circuit 582 includes a first microphone and a second microphone, the audio signal acquired by the first microphone and the audio signal acquired by the second microphone may be referred to as different audio channels. The processor 522 may acquire audio data by performing beamforming using the plurality of microphones.

According to one embodiment, the memory 532 may store one or more instructions that, when executed, cause the processor 522 to perform various operations of the second wireless audio device 302-2.

According to one embodiment, the processor 522 may acquire audio data using at least one of the audio receiving circuit 582 or the sensor circuit 552. For example, the processor 522 may acquire audio data using one or more microphones connected to the audio receiving circuit 582. The processor 522 may also acquire audio data by using the sensor circuit 552 to detect vibration corresponding to an audio signal. For example, the processor 522 may acquire audio data using at least one of a motion sensor, a bone conduction sensor, or an acceleration sensor. The processor 522 may be set to process (e.g., noise suppression, noise cancellation, or echo cancellation) audio data acquired through various paths (e.g., at least one of the audio receiving circuit 582 or the sensor circuit 552).

In one embodiment, the second wireless audio device 302-2 may further include components not shown in FIG. 4. For example, the second wireless audio device 302-2 may further include an indicator (e.g., the I/O interface 121 of FIG. 1), an audio input device, an input interface, and/or a housing.

FIG. 5 shows a front view and a rear view of a first wireless audio device according to one embodiment.

The structure of the first wireless audio device 302-1 is described with reference to FIG. 5. Redundant descriptions are omitted for convenience of explanation, but the second wireless audio device 302-2 may have a structure that is substantially the same as or similar to that of the first wireless audio device 302-1.

In one embodiment, reference numeral 501 shows a front view of the first wireless audio device 302-1. The first wireless audio device 302-1 may include a housing 510. The housing 510 may form at least a portion of the exterior of the first wireless audio device 302-1. The first wireless audio device 302-1 may include a button 513 and a plurality of microphones 581a and 581b disposed on a first surface of the housing 510 (e.g., the surface facing away from the ear when worn). The button 513 may be set to receive a user input (e.g., a touch input or a push input). The first microphone 581a and the second microphone 581b may be included in the audio receiving circuit 581 of FIG. 4. The first microphone 581a and the second microphone 581b may be arranged to detect sound in a direction facing away from the user when the first wireless audio device 302-1 is worn. The first microphone 581a and the second microphone 581b may be referred to as external microphones. The first microphone 581a and the second microphone 581b may detect sound outside the housing 510. For example, the first microphone 581a and the second microphone 581b may detect sound generated around the first wireless audio device 302-1. The ambient sound detected by the first wireless audio device 302-1 may also be output by the speaker 570. In one embodiment, the first microphone 581a and the second microphone 581b may be sound pickup microphones for a noise cancellation function (e.g., active noise cancellation (ANC)) of the first wireless audio device 302-1. In addition, the first microphone 581a and the second microphone 581b may be sound pickup microphones for an ambient sound listening function (e.g., a transparency function or an ambient aware function) of the first wireless audio device 302-1. For example, the first microphone 581a and the second microphone 581b may include various types of microphones, including an electronic condenser microphone (ECM) and a micro electro mechanical system (MEMS) microphone. A wing tip 511 may be coupled to the circumference of the housing 510. At least a portion of the wing tip 511 may be formed of an elastic material. The wing tip 511 may be detached from or attached to the housing 510. The wing tip 511 may improve the wearability of the first wireless audio device 302-1.

According to one embodiment, reference numeral 502 shows a rear view of the first wireless audio device 302-1. The first wireless audio device 302-1 may include a first electrode 514, a second electrode 515, a proximity sensor 550, a third microphone 581c, and a speaker 570 disposed on a second surface of the housing 510 (e.g., the surface facing the user when worn). The speaker 570 may be included in the audio output circuit 571 of FIG. 4. The speaker 570 may convert an electrical signal into a sound signal. The speaker 570 may output sound to the outside of the first wireless audio device 302-1. For example, the speaker 570 may convert an electrical signal into sound that the user can perceive aurally and output the sound. At least a portion of the speaker 570 may be disposed inside the housing 510. The speaker 570 may be coupled to the eartip 512 through one end of the housing 510. The eartip 512 may be formed in a cylindrical shape with a hollow inside. For example, when the eartip 512 is coupled to the housing 510, sound (audio) output from the speaker 570 may be delivered to an external object (e.g., the user) through the hollow of the eartip 512.

According to one embodiment, the first wireless audio device 302-1 may include a sensor 551a (e.g., an acceleration sensor, a bone conduction sensor, and/or a gyro sensor) disposed on the second surface of the housing 510. The position and shape of the sensor 551a shown in FIG. 5 are illustrative, and the embodiments of this document are not limited thereto. For example, the sensor 551a may be disposed inside the housing 510 and not exposed to the outside. The sensor 551a may be located at a position that can contact the wearer's ear when worn, or at a portion of the housing 510 that contacts the wearer's ear.

According to one embodiment, the eartip 512 may be formed of an elastic material (or a flexible material). The eartip 512 may help the first wireless audio device 302-1 to be inserted in close contact with the user's ear. For example, the eartip 512 may be formed of a silicone material. At least one area of the eartip 512 may be deformed according to the shape of an external object (e.g., the shape of the ear canal). According to various embodiments of the present invention, the eartip 512 may be formed of a combination of at least two of silicone, foam, and plastic materials. For example, the area of the eartip 512 that is inserted into and contacts the user's ear may be formed of a silicone material, and the area into which the housing 510 is inserted may be formed of a plastic material. The eartip 512 may be detached from or attached to the housing 510. The first electrode 514 and the second electrode 515 may be connected to an external power source (e.g., a case) and may receive an electrical signal from the external power source. The proximity sensor 550 may be used to detect the wearing state of the user. The proximity sensor 550 may be disposed inside the housing 510. The proximity sensor 550 may be disposed so that at least a portion of it is exposed at the exterior of the first wireless audio device 302-1. The first wireless audio device 302-1 may determine whether it is being worn by the user based on data measured by the proximity sensor 550. For example, the proximity sensor 550 may include an IR sensor. The IR sensor may detect whether the housing 510 is in contact with the user's body, and the first wireless audio device 302-1 may determine whether it is being worn based on the detection by the IR sensor. The proximity sensor 550 is not limited to an IR sensor and may be implemented using various types of sensors (e.g., an acceleration sensor or a gyro sensor). The third microphone 581c may be arranged to detect sound in a direction facing the user when the first wireless audio device 302-1 is worn. The third microphone 581c may be referred to as an internal microphone.

FIG. 6 shows a block diagram of a wireless audio device according to one embodiment.

Referring to FIG. 6, according to one embodiment, the components of the wireless audio device 302 may include exemplary software modules. For example, the components may be implemented by a first wireless audio device (e.g., the first wireless audio device 302-1 of FIGS. 3 to 5) or a second wireless audio device (e.g., the second wireless audio device 302-2 of FIGS. 3 and 4). At least some of the components may be omitted. At least some of the components may be implemented as a single software module. The division of the components is logical, and any program, thread, application, or code that performs the same function may correspond to the components.

According to one embodiment, the pre-processing module 610 may perform pre-processing on sound (audio) (or an audio signal) received using a first audio receiving circuit (e.g., the audio receiving circuit 581 or 582 of FIG. 5) and a second audio receiving circuit (e.g., the second audio receiving circuit 583 of FIG. 7). The second audio receiving circuit 583 may be included in a wireless audio device (e.g., the first wireless audio device 302-1 or the second wireless audio device 302-2 of FIG. 5). The second audio receiving circuit 583 may receive an audio signal (e.g., a reference signal) from an electronic device (e.g., the electronic device 301 of FIG. 5). The reference signal may correspond to media played on the electronic device 301. For example, the pre-processing module 610 may remove echo from the acquired audio signal using an acoustic echo canceller (AEC) 611. The pre-processing module 610 may reduce noise in the acquired audio signal using noise suppression (NS) 612. The pre-processing module 610 may attenuate a designated band of the acquired audio signal using a high pass filter (HPF) 613. The pre-processing module 610 may change the sampling rate of the audio input signal using a converter 614. For example, the converter 614 may be set to perform downsampling or upsampling on the audio input signal. The pre-processing module 610 may selectively apply at least one of the AEC 611, the NS 612, the HPF 613, or the converter 614 to the audio signal.
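
The selective application of pre-processing stages can be sketched as follows. This is only an illustration under stated assumptions: the first-order high-pass filter, the naive decimation (no anti-aliasing filter), and the names high_pass, downsample, STAGES, and preprocess are all hypothetical and are not taken from the description; the AEC and NS stages are omitted.

```python
def high_pass(signal, alpha=0.9):
    """First-order high-pass filter: y[n] = alpha * (y[n-1] + x[n] - x[n-1])."""
    out, prev_x, prev_y = [], 0.0, 0.0
    for x in signal:
        y = alpha * (prev_y + x - prev_x)
        out.append(y)
        prev_x, prev_y = x, y
    return out


def downsample(signal, factor=2):
    """Keep every `factor`-th sample (naive decimation, for illustration only)."""
    return signal[::factor]


# Hypothetical stage table standing in for the HPF 613 and the converter 614.
STAGES = {"hpf": high_pass, "converter": downsample}


def preprocess(signal, enabled):
    """Apply only the stages named in `enabled`, in the given order."""
    for name in enabled:
        signal = STAGES[name](signal)
    return signal


filtered = preprocess([1.0] * 200, ["hpf", "converter"])
```

Passing an empty list leaves the signal unprocessed, which corresponds to the case where no stage is applied to the audio signal.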

According to one embodiment, the phase determination module 620 may determine, based on one or more of whether media is being played on the electronic device 301 and information associated with the electronic device 301, that the wireless audio device 302-1, 302-2 enters one of a first mode change phase and a second mode change phase. The information associated with the electronic device 301 may include one or more of environment information of the electronic device 301, location information of the electronic device 301, and information about devices in the vicinity of the electronic device 301.

According to one embodiment, the first mode change phase may be for determining a change of the operation mode to one of the singing mode and the conversation mode. The second mode change phase may be for determining a change to the conversation mode.

According to one embodiment, the conversation mode module (dialog mode module) 625 may determine activation and deactivation of the conversation mode. For example, the conversation mode module 625 may use a first voice activity detection (VAD) 621 to detect whether the wearer (e.g., the user) of the wireless audio device 302 is speaking. The conversation mode module 625 may use a second VAD 622 to detect whether the wearer or an outsider is speaking. The conversation mode module 625 may identify and/or specify the wearer's speech section through the first VAD 621. The conversation mode module 625 may identify and/or specify an outsider's speech section through the first VAD 621 and the second VAD 622. For example, the conversation mode module 625 may identify and/or specify the outsider's speech section by excluding, from the sections in which speech is identified through the second VAD 622, the sections in which the wearer's speech is identified through the first VAD 621. The conversation mode module 625 may determine whether to run or deactivate the voice agent using the first VAD 621, the second VAD 622, and the conversation mode function 623.
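
The exclusion described above can be sketched on per-frame flags. The per-frame representation and the function name outsider_frames are assumptions made for illustration; the description speaks only of speech sections.

```python
def outsider_frames(vad1, vad2):
    """Mark frames where the second VAD detects speech but the first does not.

    vad1: per-frame flags from the first VAD (wearer speech).
    vad2: per-frame flags from the second VAD (any speech).
    """
    return [bool(v2) and not bool(v1) for v1, v2 in zip(vad1, vad2)]


# Frames 1-2: wearer speaks. Frame 3: only the second VAD fires.
flags = outsider_frames([0, 1, 1, 0, 0], [0, 1, 1, 1, 0])
```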

According to one embodiment, the conversation mode module 625 may detect whether the user is speaking and whether an outsider is speaking using the first VAD 621 and the second VAD 622. In one example, the conversation mode module 625 may perform at least one of the first VAD 621 or the second VAD 622 using an audio signal pre-processed by the pre-processing module 610 or an audio signal not processed by the pre-processing module 610. Referring to FIG. 4, the wireless audio device 302 may receive an audio signal using the audio receiving circuits 581 and 582. The wireless audio device 302 may detect movement of the wireless audio device 302 using the sensor circuits 551 and 552 (e.g., a motion sensor, an acceleration sensor, and/or a gyro sensor). For example, when an audio signal (e.g., a voice signal) of a designated magnitude or greater is detected in a designated band (e.g., the human vocal range), the wireless audio device 302 may detect a voice signal from the audio signal. When a designated movement is detected simultaneously or substantially simultaneously while the voice signal is detected, the wireless audio device 302 may detect a user utterance (e.g., a wearer utterance) based on the voice signal. For example, the designated movement may be a movement detected by the wireless audio device 302 due to speech by the wearer of the wireless audio device 302. For example, movement caused by the wearer's speech may be transferred to the motion sensor, the acceleration sensor, and/or the gyro sensor in the form of movement or vibration. Movement caused by the wearer's speech may enter the motion sensor, the acceleration sensor, and/or the gyro sensor in a form similar to the input of a bone conduction microphone. The wireless audio device 302 may acquire information about the start time and end time of the wearer's speech based on the designated movement and the voice signal. When the designated movement is not detected simultaneously or substantially simultaneously while the voice signal is detected, the wireless audio device 302 may detect an outsider utterance (e.g., an utterance by a person other than the wearer, such as an outsider or the other party) based on the voice signal. The wireless audio device 302 may acquire information about the start time and end time of the outsider's speech based on the designated movement and the voice signal. The conversation mode module 625 may store information about the start and end times of the user's or the outsider's speech in a memory (e.g., the memories 531 and 532 of FIG. 4) and may determine activation or deactivation of the conversation mode based on the information stored in the memories 531 and 532.
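
The decision logic in this paragraph can be sketched as follows, assuming that voice detection and movement detection are already available as per-frame flags. The labels and the function names classify and segments are hypothetical; the segment boundaries are frame indices standing in for the start and end times of speech.

```python
def classify(voice_detected, motion_detected):
    """Label one frame as 'wearer', 'outsider', or 'none'."""
    if not voice_detected:
        return "none"
    # Voice together with the designated movement is treated as wearer speech;
    # voice without it is treated as outsider speech.
    return "wearer" if motion_detected else "outsider"


def segments(labels, target):
    """Return (start, end) frame index pairs for `target`; end is exclusive."""
    out, start = [], None
    for i, label in enumerate(labels):
        if label == target and start is None:
            start = i
        elif label != target and start is not None:
            out.append((start, i))
            start = None
    if start is not None:
        out.append((start, len(labels)))
    return out


voice = [0, 1, 1, 1, 0, 1]
motion = [0, 1, 1, 0, 0, 0]
labels = [classify(v, m) for v, m in zip(voice, motion)]
wearer_segments = segments(labels, "wearer")
outsider_segments = segments(labels, "outsider")
```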

In one example, the first VAD 621 and the second VAD 622 may be serial processes. When a voice signal is detected using the second VAD 622, the wireless audio device 302 may detect movement using a motion sensor (e.g., an acceleration sensor and/or a gyro sensor) to identify whether the voice signal corresponds to the user's speech.

In one example, the first VAD 621 and the second VAD 622 may be parallel processes. For example, the first VAD 621 may be set to detect the user's speech independently of the second VAD 622. The second VAD 622 may be set to detect a voice signal regardless of whether the user is speaking.

In one example, the wireless audio device 302 may use different microphones to detect the user's speech and an outsider's speech. The wireless audio device 302 may use external microphones (e.g., the first microphone 581a and the second microphone 581b of FIG. 5) to detect an outsider's speech. The wireless audio device 302 may use an internal microphone (e.g., the third microphone 581c of FIG. 5) to detect the user's speech. When the internal microphone is used, the wireless audio device 302 may determine whether the wearer is speaking based on a voice signal from the internal microphone and movement information. To detect the user's speech, the wireless audio device 302 may determine whether the wearer is speaking based on a voice signal received through a sensor input. The signal received through the sensor input may include at least one of an acceleration sensor input or a gyro sensor input.

According to one embodiment, the conversation mode module 625 may determine activation of the conversation mode using the first VAD 621 and/or the second VAD 622. In the conversation mode OFF state, the conversation mode module 625 may determine whether to activate the conversation mode. For example, the conversation mode module 625 may determine to activate the conversation mode when the user's speech is maintained for a designated time interval (e.g., L frames or more, where L is a natural number). As another example, the conversation mode module 625 may determine to activate the conversation mode when, after the user's speech ends, the other party's speech is maintained for a designated time interval.
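
The first activation condition (the user's speech maintained for L frames or more) can be sketched as a run-length check. Treating "maintained" as L consecutive speech frames is an assumption, and the function name should_activate is hypothetical.

```python
def should_activate(wearer_frames, L):
    """Return True once the wearer's speech spans L consecutive frames."""
    run = 0
    for is_speech in wearer_frames:
        run = run + 1 if is_speech else 0  # reset the run on a non-speech frame
        if run >= L:
            return True
    return False


activated = should_activate([1, 1, 0, 1, 1, 1], L=3)
```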

According to one embodiment, the conversation mode module 625 may determine whether to maintain or deactivate the conversation mode using the first VAD 621 and/or the second VAD 622. In the conversation mode ON state, the conversation mode module 625 may determine whether to maintain or deactivate the conversation mode. For example, during the conversation mode, the conversation mode module 625 may determine to deactivate the conversation mode when no voice signal is detected for a designated time interval. During the conversation mode, the conversation mode module 625 may determine to maintain the conversation mode when a voice signal is detected within the designated time interval from the end of the previous voice signal.
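
The maintain-or-deactivate rule behaves like a hold (hangover) timer. The sketch below assumes per-frame voice flags and a hold length counted in frames; the function name mode_trace and the parameter hold are hypothetical, and reactivation after the mode turns off is deliberately not modeled.

```python
def mode_trace(voice_frames, hold):
    """Return the conversation mode state (True = ON) after each frame.

    The mode starts ON and is turned OFF once `hold` consecutive frames
    pass without a voice signal.
    """
    on, silent, trace = True, 0, []
    for voice in voice_frames:
        if on:
            silent = 0 if voice else silent + 1  # voice restarts the timer
            if silent >= hold:
                on = False
        trace.append(on)
    return trace


trace = mode_trace([1, 0, 0, 1, 0, 0, 0, 1], hold=3)
```

In this example the voice frame at index 3 arrives before the timer expires, so the mode is maintained; the three silent frames that follow turn it off.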

According to one embodiment, the conversation mode module 625 may determine activation and/or deactivation of the conversation mode based on the conversation mode function 623. The conversation mode function 623 may detect activation and/or deactivation of the conversation mode based on a user input. For example, the user input may include a voice command, a touch input, or a button input by the user.

According to one embodiment, the conversation mode module 625 may determine the length of the designated time interval based on ambient sound. For example, the conversation mode module 625 may determine the length of the designated time interval based on at least one of the sensitivity of the background noise of sound acquired using an external microphone, an SNR value, or the type of noise. In a noisy environment, the conversation mode module 625 may increase the length of the designated time interval.
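
One way to lengthen the designated time interval in noisy conditions is to map the SNR to a frame count. The linear mapping and every constant below (base, max_extra, low, high) are assumptions chosen for illustration; the description gives no formula or values.

```python
def hold_frames(snr_db, base=50, max_extra=100, low=0.0, high=20.0):
    """Return the hold interval in frames; lower SNR gives a longer interval.

    All defaults are hypothetical: `base` frames at or above `high` dB SNR,
    growing linearly to `base + max_extra` frames at or below `low` dB.
    """
    clipped = min(max(snr_db, low), high)
    noisiness = (high - clipped) / (high - low)  # 0.0 (quiet) .. 1.0 (noisy)
    return base + round(max_extra * noisiness)


quiet = hold_frames(20.0)
noisy = hold_frames(0.0)
```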

According to one embodiment, the conversation mode module 625 may determine activation and/or deactivation of the conversation mode based on the user's voice command. In one example, the voice agent module 630 may detect a voice command by the user instructing activation of the conversation mode and, in response to detecting the voice command, may deliver information instructing activation of the conversation mode to the conversation mode function 623. The voice command instructing activation of the conversation mode may include a wake-up utterance for waking up the voice agent (e.g., "Hi Bixby") and a voice command. For example, the voice command may take a form such as "Hi Bixby, activate conversation mode." As another example, the voice command instructing activation of the conversation mode may take a form that does not include a wake-up utterance, such as "Activate conversation mode." When the conversation mode function 623 receives information instructing activation of the conversation mode from the voice agent module 630, the conversation mode module 625 may determine to activate the conversation mode. In one example, the voice agent module 630 may detect a voice command by the user instructing deactivation of the conversation mode and, in response to detecting the voice command, may deliver information instructing deactivation of the conversation mode to the conversation mode function 623. For example, the voice command instructing deactivation of the conversation mode may include a wake-up utterance for waking up the voice agent and a voice command. The voice command may take a form such as "Hi Bixby, deactivate conversation mode." As another example, the voice command instructing deactivation of the conversation mode may take a form that does not include a wake-up utterance, such as "Deactivate conversation mode." When the conversation mode function 623 receives information instructing deactivation of the conversation mode from the voice agent module 630, the conversation mode module 625 may determine to deactivate the conversation mode.

According to one embodiment, the conversation mode module 625 may determine activation and/or deactivation of the conversation mode based on the user's touch input. For example, the electronic device 301 may provide an interface for controlling the conversation mode of the wireless audio device 302. Through the interface, the electronic device 301 may receive a user input that sets activation or deactivation of the conversation mode. When a user input instructing activation of the conversation mode is received, the electronic device 301 may transmit a signal instructing activation of the conversation mode to the wireless audio device 302. When the conversation mode function 623 acquires information instructing activation of the conversation mode from the signal, the conversation mode module 625 may determine to activate the conversation mode. When a user input instructing deactivation of the conversation mode is received through the interface, the electronic device 301 may transmit a signal instructing deactivation of the conversation mode to the wireless audio device 302. When the conversation mode function 623 acquires information instructing deactivation of the conversation mode from the signal, the conversation mode module 625 may determine to deactivate the conversation mode.

According to one embodiment, when the conversation mode module 625 determines to activate or deactivate the conversation mode, the wireless audio device 302 may transmit, to the electronic device 301, a signal indicating that activation or deactivation of the conversation mode has been determined. The electronic device 301 may present the information obtained from the signal, which indicates that activation or deactivation of the conversation mode has been determined, through the interface for controlling the conversation mode of the wireless audio device 302.

According to one embodiment, the conversation mode module 625 may determine activation and/or deactivation of the conversation mode based on a user's button input. For example, the wireless audio device 302 may include at least one button (e.g., the button 513 of FIG. 5). The conversation mode function 623 may be configured to detect a designated input on the button (e.g., a double tap or a long press). When an input instructing activation of the conversation mode is received through the button, the conversation mode module 625 may determine to activate the conversation mode. When an input instructing deactivation of the conversation mode is received through the button, the conversation mode module 625 may determine to deactivate the conversation mode.

According to one embodiment, the conversation mode function 623 may be configured to interact with the voice agent module 630. For example, the conversation mode function 623 may obtain, from the voice agent module 630, information indicating whether an utterance is a call to the voice agent. For example, a wearer's utterance lasting for a specified time or longer may be detected by the first VAD 621. In this case, the conversation mode module 625 may use the conversation mode function 623 to identify whether the wearer's utterance is a call to the voice agent. When the conversation mode function 623 confirms, using the voice agent module 630, that the utterance invoked the voice agent, the conversation mode module 625 may ignore the utterance. For example, even if the utterance lasted for the specified time or longer, the conversation mode module 625 may not determine to activate the conversation mode based on the utterance alone. For example, the voice agent module 630 may identify, from the utterance, a voice command instructing activation of the conversation mode. In this case, the voice agent module 630 may transmit a signal instructing activation of the conversation mode to the conversation mode module 625, and the conversation mode module 625 may determine to activate the conversation mode. That is, in this case, the conversation mode module 625 may determine to activate the conversation mode based on the instruction from the voice agent module 630 rather than on the length of the utterance itself.

According to one embodiment, the conversation mode module 625 may determine deactivation of the conversation mode based on how long the conversation mode has been operating. For example, when a certain period of time has elapsed after the conversation mode was turned on, the conversation mode module 625 may determine to deactivate the conversation mode.

According to one embodiment, the singing mode module 627 may determine activation and deactivation of the singing mode. In a first mode change phase, the singing mode module 627 may determine activation and deactivation of the singing mode based on whether an analysis result of the audio signal received by the wireless audio devices 302-1 and 302-2 satisfies an activation condition of the singing mode. The activation condition of the singing mode may differ according to the sensitivity level of the electronic device 301, which is one of a first sensitivity level, a second sensitivity level, and a third sensitivity level.

According to one embodiment, the activation condition according to the first sensitivity level may include a condition on whether a singing voice is detected continuously in the ambient sound for a predetermined time. The activation condition according to the second sensitivity level may include a condition on the acoustic similarity between the singing voice included in the ambient sound and the media. The ambient sound and the media may be included in the audio signal. The activation condition according to the third sensitivity level may include a condition on the similarity between the lyrics contained in the singing voice included in the ambient sound and the lyrics contained in the media.

According to one embodiment, the activation condition of the singing mode may include the activation conditions of every level at or below the sensitivity level of the electronic device 301. For example, when the sensitivity level of the electronic device 301 is the second sensitivity level, the activation condition of the singing mode may include the activation conditions according to the first sensitivity level and the second sensitivity level. When the sensitivity level of the electronic device 301 is the third sensitivity level, the activation condition of the singing mode may include the activation conditions according to the first sensitivity level, the second sensitivity level, and the third sensitivity level.
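The cumulative relationship between sensitivity levels and activation conditions can be sketched as follows. This is a minimal illustration, not the implementation of the disclosure: the `AnalysisResult` fields, the threshold constants, and the function name `activation_satisfied` are all hypothetical, and the description does not specify any numeric thresholds.

```python
from dataclasses import dataclass


@dataclass
class AnalysisResult:
    """Hypothetical summary of the audio-signal analysis."""
    singing_duration_s: float   # continuous singing-voice time detected in ambient sound
    acoustic_similarity: float  # singing voice vs. media, in [0, 1]
    lyrics_similarity: float    # lyrics in singing voice vs. lyrics in media, in [0, 1]


# Assumed thresholds, chosen only for illustration.
MIN_SINGING_S = 3.0
MIN_ACOUSTIC_SIM = 0.6
MIN_LYRICS_SIM = 0.6

# One condition per sensitivity level (1 = first, 2 = second, 3 = third).
CONDITIONS = {
    1: lambda r: r.singing_duration_s >= MIN_SINGING_S,
    2: lambda r: r.acoustic_similarity >= MIN_ACOUSTIC_SIM,
    3: lambda r: r.lyrics_similarity >= MIN_LYRICS_SIM,
}


def activation_satisfied(result: AnalysisResult, sensitivity_level: int) -> bool:
    """Check the conditions of every level at or below the configured level."""
    if sensitivity_level not in CONDITIONS:
        raise ValueError("sensitivity_level must be 1, 2, or 3")
    return all(CONDITIONS[level](result)
               for level in range(1, sensitivity_level + 1))
```

Under these assumptions, a result that passes the first two conditions but has a low lyrics similarity would activate the singing mode at the first or second sensitivity level but not at the third.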

According to one embodiment, the singing mode module 627 may determine activation and/or deactivation of the conversation mode. The singing mode module 627 may detect activation and/or deactivation of the singing mode based on a user input. For example, the user input may include a user's voice command, a user's touch input, or a user's button input.

According to one embodiment, the singing mode module 627 may determine the length of the designated time interval based on the ambient sound. For example, the singing mode module 627 may determine the length of the designated time interval based on at least one of the sensitivity of the background noise of the sound acquired using an external microphone, the SNR value, or the type of noise. In a noisy environment, the singing mode module 627 may increase the length of the designated time interval.
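One way to picture a noise-dependent time interval is a simple mapping from SNR to interval length. The sketch below is only an assumption-laden illustration: the function name, the linear interpolation, and every default value are hypothetical, since the description states only that the interval may be lengthened in a noisy environment.

```python
def designated_interval_s(snr_db: float,
                          base_s: float = 2.0,
                          max_s: float = 6.0,
                          clean_snr_db: float = 20.0,
                          noisy_snr_db: float = 0.0) -> float:
    """Return a longer interval as the SNR drops (i.e., as noise grows).

    All parameters are illustrative assumptions, not values from the source.
    """
    # Clamp the SNR into the range over which we interpolate.
    snr = min(max(snr_db, noisy_snr_db), clean_snr_db)
    # 0.0 for a clean environment, 1.0 for the noisiest one considered.
    noisiness = (clean_snr_db - snr) / (clean_snr_db - noisy_snr_db)
    return base_s + noisiness * (max_s - base_s)
```

With these defaults, a clean signal keeps the base interval and the interval grows linearly up to the maximum as the SNR falls toward 0 dB.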

According to one embodiment, the singing mode module 627 may determine activation and/or deactivation of the singing mode based on a user's voice command. In one example, the voice agent module 630 may detect a user's voice command instructing activation of the singing mode and, in response to detecting the voice command, transmit information indicating activation of the singing mode to the singing mode module 627. A voice command instructing activation of the singing mode may include a wake-up utterance for waking up the voice agent (e.g., “Hi Bixby”), followed by the command itself. For example, the voice command may take a form such as “Hi Bixby, activate singing mode.” As another example, a voice command instructing activation of the singing mode may take a form such as “Activate singing mode,” which does not include a wake-up utterance. When the singing mode module 627 receives information indicating activation of the singing mode from the voice agent module 630, the singing mode module 627 may determine to activate the singing mode. In one example, the voice agent module 630 may detect a user's voice command instructing deactivation of the singing mode and, in response to detecting the voice command, transmit information indicating deactivation of the singing mode to the singing mode module 627. For example, a voice command instructing deactivation of the singing mode may include a wake-up utterance for waking up the voice agent, followed by the command itself. Such a voice command may take a form such as “Hi Bixby, deactivate singing mode.” As another example, a voice command instructing deactivation of the singing mode may take a form such as “Deactivate singing mode,” which does not include a wake-up utterance. When the singing mode module 627 receives information indicating deactivation of the singing mode from the voice agent module 630, the singing mode module 627 may determine to deactivate the singing mode.

According to one embodiment, the singing mode module 627 may determine activation and/or deactivation of the singing mode based on a user's touch input. For example, the electronic device 301 may provide an interface for controlling the singing mode of the wireless audio device 302. Through the interface, the electronic device 301 may receive a user input that sets the singing mode to be activated or deactivated. When a user input instructing activation of the singing mode is received, the electronic device 301 may transmit a signal instructing activation of the singing mode to the wireless audio device 302. When the singing mode module 627 obtains, from the signal, information indicating activation of the singing mode, the singing mode module 627 may determine to activate the singing mode. When a user input instructing deactivation of the singing mode is received through the interface, the electronic device 301 may transmit a signal instructing deactivation of the singing mode to the wireless audio device 302. When the singing mode module 627 obtains, from the signal, information indicating deactivation of the singing mode, the singing mode module 627 may determine to deactivate the singing mode.

According to one embodiment, when the singing mode module 627 determines to activate or deactivate the singing mode, the wireless audio device 302 may transmit, to the electronic device 301, a signal indicating that activation or deactivation of the singing mode has been determined. The electronic device 301 may present the information obtained from the signal, which indicates that activation or deactivation of the singing mode has been determined, through the interface for controlling the singing mode of the wireless audio device 302.

According to one embodiment, the singing mode module 627 may determine activation and/or deactivation of the singing mode based on a user's button input. For example, the wireless audio device 302 may include at least one button (e.g., the button 513 of FIG. 5). The singing mode module 627 may be configured to detect a designated input on the button (e.g., a double tap or a long press). When an input instructing activation of the singing mode is received through the button, the singing mode module 627 may determine to activate the singing mode. When an input instructing deactivation of the singing mode is received through the button, the singing mode module 627 may determine to deactivate the singing mode.

According to one embodiment, the singing mode module 627 may be configured to interact with the voice agent module 630. For example, the singing mode module 627 may obtain, from the voice agent module 630, information indicating whether an utterance is a call to the voice agent. For example, a wearer's utterance lasting for a specified time or longer may be detected by the first VAD 621. In this case, the singing mode module 627 may identify whether the wearer's utterance is a call to the voice agent. When the singing mode module 627 confirms, using the voice agent module 630, that the utterance invoked the voice agent, the singing mode module 627 may ignore the utterance. For example, even if the singing voice included in the utterance lasted for the specified time or longer, the singing mode module 627 may not determine to activate the singing mode based on the utterance alone. For example, the voice agent module 630 may identify, from the utterance, a voice command instructing activation of the singing mode. In this case, the voice agent module 630 may transmit a signal instructing activation of the singing mode to the singing mode module 627, and the singing mode module 627 may determine to activate the singing mode. That is, in this case, the singing mode module 627 may determine to activate the singing mode based on the instruction from the voice agent module 630 rather than on the activation condition of the singing mode.
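The interaction with the voice agent described above can be sketched as a small decision function. The function name, its parameters, and the command string `"activate_singing_mode"` are hypothetical names introduced for illustration; the sketch assumes that the voice agent's verdict and the condition check are already available as plain values.

```python
from typing import Optional


def decide_singing_activation(condition_met: bool,
                              is_voice_agent_call: bool,
                              agent_command: Optional[str] = None) -> bool:
    """Decide whether the singing mode should be activated for one utterance.

    condition_met:       the activation condition (e.g., sustained singing voice) holds
    is_voice_agent_call: the voice agent reported that the utterance invoked it
    agent_command:       command identified by the voice agent, if any (assumed string)
    """
    if is_voice_agent_call:
        # The utterance was addressed to the voice agent, so it is ignored for
        # condition-based detection. Only an explicit command activates the mode.
        return agent_command == "activate_singing_mode"
    # Ordinary utterance: rely on the activation condition.
    return condition_met
```

In this sketch a long utterance that merely wakes the voice agent never activates the singing mode by itself, which matches the behavior the paragraph describes.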

According to one embodiment, the singing mode module 627 may determine deactivation of the singing mode while in the singing mode. For example, the singing mode module 627 may determine to deactivate the singing mode when the analysis result of the audio signal received by the wireless audio devices 302-1 and 302-2 in the singing mode no longer satisfies the activation condition. As another example, the singing mode module 627 may determine deactivation of the singing mode based on whether media is being played and on information associated with the electronic device 301. In this case, the singing mode module 627 may determine to deactivate the singing mode by determining that media is no longer being played on the electronic device 301, or that the singing mode is not needed according to the information associated with the electronic device 301.

According to one embodiment, in the singing mode, the wireless audio devices 302-1 and 302-2 may use the singing mode module 627 to track the singing voice included in the ambient sound while providing the user with a guide regarding the singing voice and the media. For example, the wireless audio devices 302-1 and 302-2 may provide guide information about the media to the user when the user has chosen to receive a song guide or when the similarity between the singing voice and the media is low. The guide information about the media may include main melody information that allows the user to sing along with the media (e.g., a song), the beat, or the lyrics to be played in the next bar of the song. The guide information about the media may be output through the wireless audio device 302 as quiet audio produced by TTS generation, or may be displayed as visual information on the screen of the electronic device 301.

According to one embodiment, the voice agent module 630 may include a wake-up utterance recognition module 631 and a voice agent control module 632. In one example, the voice agent module 630 may further include a voice command recognition module 633. The wake-up utterance recognition module 631 may acquire an audio signal using the audio reception circuits 581 and 582 and recognize a wake-up utterance (e.g., “Hi Bixby”) from the audio signal. When a designated voice command is recognized, the wake-up utterance recognition module 631 may control the voice agent using the voice agent control module 632. For example, the voice agent control module 632 may forward the received voice signal to the electronic device 301 and receive, from the electronic device 301, a task or command corresponding to the voice signal. For example, when the voice signal instructs a volume adjustment, the electronic device 301 may transmit a signal instructing the volume adjustment to the wireless audio device 302. The voice command recognition module 633 may acquire an audio signal using the audio reception circuits 581 and 582 and recognize a designated voice command from the audio signal. In one embodiment, the designated voice utterance may include a voice command for controlling the conversation mode (e.g., activate conversation mode, deactivate conversation mode). When a designated voice command is recognized, the voice command recognition module 633 may perform the function corresponding to the designated voice command even without recognition of a wake-up utterance. For example, when the voice command recognition module 633 recognizes an utterance of a designated command such as “Deactivate conversation mode” or “Deactivate singing mode,” it may transmit, to the electronic device 301, a signal instructing deactivation of the conversation mode or the singing mode. For example, the voice command recognition module 633 may perform the function corresponding to the designated voice command without interacting with the voice agent. In response to a signal instructing deactivation of a specific mode (e.g., the conversation mode or the singing mode), the electronic device 301 may perform the sound control of the wireless audio device 302 described later.
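The two recognition paths, one through a wake-up utterance and one through designated commands that need no wake-up utterance, can be sketched as a simple router over an already transcribed utterance. This is only an illustration under stated assumptions: the description does not say that recognition operates on text, and the names `route_utterance`, `DESIGNATED_COMMANDS`, and the returned tuples are hypothetical.

```python
from typing import Optional, Tuple

WAKE_UP = "hi bixby"

# Designated commands handled without a wake-up utterance (assumed phrasing).
DESIGNATED_COMMANDS = {
    "activate conversation mode": ("conversation", True),
    "deactivate conversation mode": ("conversation", False),
    "activate singing mode": ("singing", True),
    "deactivate singing mode": ("singing", False),
}


def normalize(transcript: str) -> str:
    """Lowercase, drop commas and periods, and collapse whitespace."""
    cleaned = transcript.lower().replace(",", " ").replace(".", " ")
    return " ".join(cleaned.split())


def route_utterance(transcript: str) -> Tuple[str, Optional[object]]:
    """Route an utterance to the voice agent path or the direct command path."""
    text = normalize(transcript)
    if text.startswith(WAKE_UP):
        # Wake-up utterance found: hand the remainder to the voice agent.
        return ("voice_agent", text[len(WAKE_UP):].strip())
    if text in DESIGNATED_COMMANDS:
        # Designated command: handled directly, without the voice agent.
        return ("direct", DESIGNATED_COMMANDS[text])
    return ("ignored", None)
```

For instance, `route_utterance("Deactivate conversation mode")` takes the direct path, while the same command prefixed with the wake-up utterance is handed to the voice agent path.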

According to one embodiment, the conversation mode module 625 may pass its decision on the conversation mode (e.g., deactivation or activation of the conversation mode) to the conversation mode control module 655. The conversation mode control module 655 may control functions of the wireless audio device 302 according to activation and/or deactivation of the conversation mode. For example, the conversation mode control module 655 may control the output signal of the wireless audio device 302 using the sound control module 640 according to activation and/or deactivation of the conversation mode.

According to one embodiment, the singing mode module 627 may pass its decision on the singing mode (e.g., deactivation or activation of the singing mode) to the singing mode control module 657. The singing mode control module 657 may control functions of the wireless audio device 302 according to activation and/or deactivation of the singing mode. For example, the singing mode control module 657 may control the output signal of the wireless audio device 302 using the sound control module 640 according to activation and/or deactivation of the singing mode.

For example, the sound control module 640 may include an active noise cancellation (ANC) control module 641 and an ambient sound control module 642. The ANC control module 641 may be configured to acquire ambient sound and perform noise cancellation based on the ambient sound. For example, the ANC control module 641 may acquire ambient sound using an external microphone and perform noise cancellation using the acquired ambient sound. The ambient sound control module 642 may be configured to provide ambient sound to the wearer. For example, the ambient sound control module 642 may be configured to provide ambient sound by acquiring the ambient sound using an external microphone and outputting the acquired ambient sound through the speaker of the wireless audio device 302.

According to one embodiment, when the conversation mode is activated, the conversation mode control module 655 may control the output signal of the wireless audio device 302 using the sound control module 640. For example, the conversation mode control module 655 may deactivate ANC and activate ambient sound in response to activation of the conversation mode. As another example, when music is being output from the wireless audio device 302, the conversation mode control module 655 may, in response to activation of the conversation mode, reduce the volume level of the music being output by a certain ratio or more, or at most mute it. With the conversation mode activated, the user of the wireless audio device 302 can hear ambient sound more clearly.

According to one embodiment, when the conversation mode is deactivated, the conversation mode control module 655 may control the output signal of the wireless audio device 302 using the sound control module 640. For example, in response to deactivation of the conversation mode, the conversation mode control module 655 may restore the ANC setting and/or the ambient sound setting to the settings in effect before the conversation mode was activated, and may deactivate ambient sound. For example, before the conversation mode is activated, the conversation mode control module 655 may store the ANC setting and/or the ambient sound setting in the memories 531 and 532. When the conversation mode is deactivated, the conversation mode control module 655 may activate or deactivate ANC and/or ambient sound according to the ANC setting and/or the ambient sound setting stored in the memories 531 and 532.

As another example, the conversation mode control module 655 may restore the output signal of the wireless audio device 302 to the setting in effect before the conversation mode was activated, in response to deactivation of the conversation mode. For example, when music is being output from the wireless audio device 302 before the conversation mode is activated, the conversation mode control module 655 may store the music output signal setting in the memories 531 and 532. When the conversation mode is deactivated, the conversation mode control module 655 may restore the music output signal to the setting stored in the memories 531 and 532. In the conversation mode, the conversation mode control module 655 may, depending on the settings, reduce the media output volume to a specified value or mute it. In the conversation mode, the wireless audio device 302 may output a notification of the voice agent (e.g., a response to a user utterance) independently of the volume of the conversation mode. For example, in the conversation mode, the wireless audio device 302 may output a notification of the voice agent (e.g., a TTS-based response) at a specified volume value.

According to one embodiment, the conversation mode control module 655 may control the output signal using the sound control module 640 while the conversation mode is operating. For example, the conversation mode control module 655 may control the intensity of ANC and/or ambient sound. The conversation mode control module 655 may amplify the intensity of the ambient sound by controlling the gain value of the ambient sound. The conversation mode control module 655 may amplify only the sections of the ambient sound in which a voice is present, or only the frequency band corresponding to voice. In the conversation mode, the conversation mode control module 655 may reduce the intensity of ANC. The conversation mode control module 655 may control the output volume of the audio signal.

Tables 1 and 2 below show examples of sound control by the conversation mode control module 655 according to activation (e.g., ON) and deactivation (e.g., OFF) of the conversation mode.

[Table 1]

Referring to Table 1, the wearer of the wireless audio device 302 may be listening to music using the wireless audio device 302. For example, the wireless audio device 302 may output music while performing ANC, and may output the music at a first volume. When the conversation mode is activated, the conversation mode control module 655 may activate the ambient sound and deactivate ANC. In this case, the conversation mode control module 655 may reduce the volume of the music being output to a specified value or less, or reduce it by a specified ratio. For example, in the conversation mode, the conversation mode control module 655 may reduce the volume of the music being output to a second value. When the conversation mode is deactivated, the conversation mode control module 655 may restore the settings related to the output signal. For example, the conversation mode control module 655 may activate ANC and deactivate the ambient sound. In addition, the conversation mode control module 655 may increase the volume of the music being output back to the first value.

[Table 2]

Referring to Table 2, the wearer of the wireless audio device 302 may be listening to music using the wireless audio device 302. For example, the wireless audio device 302 may output music without applying ANC, and may output the music at a first volume. When the conversation mode is activated, the conversation mode control module 655 may activate the ambient sound and keep ANC deactivated. In this case, the conversation mode control module 655 may reduce the volume of the music being output to a specified value or less, or reduce it by a specified ratio. For example, in the conversation mode, the conversation mode control module 655 may reduce the volume of the music being output to a second value. When the conversation mode is deactivated, the conversation mode control module 655 may restore the settings related to the output signal. For example, the conversation mode control module 655 may keep ANC deactivated and deactivate the ambient sound. In addition, the conversation mode control module 655 may increase the volume of the music being output back to the first value.

In the examples of Tables 1 and 2, the wireless audio device 302 is described as having the ambient sound deactivated when the conversation mode is not set, but the embodiments of this document are not limited thereto. For example, the wireless audio device 302 may activate the ambient sound according to the user's settings even when the conversation mode is not set.
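The save-and-restore behavior described for Tables 1 and 2 can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the class names, the 0 to 100 volume scale, and the default reduced volume of 20 are assumptions introduced here.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class OutputSettings:
    anc_on: bool
    ambient_on: bool
    music_volume: int  # hypothetical 0-100 scale


class ConversationModeController:
    """Hypothetical stand-in for the conversation mode control module 655."""

    def __init__(self, current: OutputSettings, reduced_volume: int = 20):
        self.current = current
        self.reduced_volume = reduced_volume  # the "second value"; assumed
        self._saved: Optional[OutputSettings] = None

    def activate(self) -> None:
        if self._saved is not None:
            return  # already in conversation mode
        self._saved = self.current  # remember the pre-activation settings
        self.current = replace(
            self.current,
            anc_on=False,     # ANC off (or kept off, as in Table 2)
            ambient_on=True,  # ambient sound on
            music_volume=min(self.current.music_volume, self.reduced_volume),
        )

    def deactivate(self) -> None:
        if self._saved is None:
            return  # conversation mode was not active
        self.current = self._saved  # restore ANC, ambient sound, and volume
        self._saved = None
```

Because the whole settings object is stored on activation, the same code path covers both the ANC-on case of Table 1 and the ANC-off case of Table 2.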

According to one embodiment, the singing mode module 627 may transmit a decision about the singing mode (e.g., deactivation or activation of the singing mode) to the singing mode control module 657. The singing mode control module 657 may control the functions of the wireless audio device 302 according to activation and/or deactivation of the singing mode. For example, the singing mode control module 657 may control the output signal of the wireless audio device 302 using the sound control module 640 according to activation and/or deactivation of the singing mode.

According to one embodiment, the surrounding situation recognition module 660 may acquire an audio signal using an audio receiving circuit (e.g., the first audio receiving circuit 581 and the second audio receiving circuit 582 of FIG. 4), recognize the surrounding situation based on the audio signal, and classify the surrounding environment. The surrounding situation recognition module 660 may include an environment classification module 661 and a user surrounding device search module 663. The surrounding situation recognition module 660 may obtain at least one of background noise, a signal-to-noise ratio (SNR), or a type of noise from the audio signal. The surrounding situation recognition module 660 may further obtain sensor information from a sensor circuit (e.g., the sensor circuits 551 and 552 of FIG. 4). The sensor information may include Wi-Fi information, BLE information, and/or GPS information.

According to one embodiment, the environment classification module 661 may detect the environment based on the intensity of the background noise, the SNR, or the type of noise. For example, the environment classification module 661 may compare the environment information stored in the memories 531 and 532 with at least one of the intensity of the background noise, the SNR, or the type of noise to calculate the environment information of the wireless audio device 302.
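One way to picture the comparison against stored environment information is a nearest-profile lookup. The source does not specify how the comparison is performed, so the Euclidean distance, the profile names, and the numeric values below are all assumptions made for illustration.

```python
import math

# Hypothetical stored profiles: (background noise level in dB, SNR in dB).
# The values are illustrative only and do not come from the source.
STORED_ENVIRONMENTS = {
    "quiet_room": (35.0, 25.0),
    "cafe": (60.0, 10.0),
    "street": (70.0, 5.0),
}


def classify_environment(noise_db, snr_db, profiles=STORED_ENVIRONMENTS):
    """Return the name of the stored profile closest to the measurement."""
    measured = (noise_db, snr_db)
    return min(profiles, key=lambda name: math.dist(measured, profiles[name]))
```

A real classifier would likely also use the noise type, which this sketch leaves out.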

According to one embodiment, the user surrounding device search module 663 may use the sensor information to calculate information about devices located around the wireless audio device (e.g., the first wireless audio device 302-1 and the second wireless audio device 302-2). For example, the user surrounding device search module 663 may use the sensor information to calculate the types and distribution of devices in the environment in which the wireless audio devices 302-1 and 302-2 are located. As another example, the user surrounding device search module 663 may use the sensor information to obtain location information of the user of the wireless audio devices 302-1 and 302-2. The user surrounding device search module 663 may map one or more of the environment information, the location information, and the information about devices around the electronic device 301 that correspond to an utterance to the mode used for that utterance, and may analyze the pattern of the mapped modes.

According to one embodiment, while either the conversation mode or the singing mode is activated, the surrounding situation recognition module 660 may control the output signal based on the identified environment. The surrounding situation recognition module 660 may control the ambient sound based on the intensity of the background noise and/or the SNR. For example, the surrounding situation recognition module 660 may decide whether to output the ambient sound in full, to amplify the voice band of the ambient sound, or to amplify a specified sound (e.g., an alarm or a siren) within the ambient sound.

For example, the surrounding situation recognition module 660 may determine the intensity of ANC. For example, the surrounding situation recognition module 660 may adjust the parameters (e.g., coefficients) of a filter used for ANC.

According to one embodiment, the surrounding situation recognition module 660 may control either the conversation mode or the singing mode based on the identified environment. For example, the surrounding situation recognition module 660 may activate either the conversation mode or the singing mode based on the identified environment. If the surrounding situation recognition module 660 determines that the user is in an environment in which the user needs to hear the ambient sound, it may activate the conversation mode using the conversation mode control module 655 and provide the ambient sound to the user according to the conversation mode. For example, when the user is in a dangerous environment (e.g., an environment in which a siren is detected), the surrounding situation recognition module 660 may activate the conversation mode.

According to one embodiment, the electronic device 301 may display, on the display 360, an interface indicating deactivation or activation of either the conversation mode or the singing mode. The electronic device 301 may provide the interface in synchronization with the conversation mode or the singing mode of the wireless audio device 302. The electronic device 301 may display the interface when the electronic device 301 itself determines deactivation or activation of either the conversation mode or the singing mode, or when it receives from the wireless audio device 302 a signal indicating deactivation or activation of either mode. For example, when either the conversation mode or the singing mode is activated, the electronic device 301 may display a first interface that includes information indicating that the mode has been set. The first interface may include an interface for controlling the output signal settings in that mode. As another example, when either the conversation mode or the singing mode is deactivated, the electronic device 301 may display a second interface that includes information indicating that the mode has been deactivated. The electronic device 301 may display the first interface and the second interface on the execution screen of an application (e.g., a wearable application) for controlling the wireless audio device 302.

According to one embodiment, the conversation mode module 625 may determine activation and deactivation of the conversation mode further based on whether the device is being worn. For example, when the wireless audio device 302 is worn by the user, the conversation mode module 625 may activate the conversation mode based on an utterance of the user (e.g., the wearer) or on a user input. When the wireless audio device 302 is not worn by the user, the conversation mode module 625 may not activate the conversation mode even if an utterance of the user is detected.

For example, each of the first wireless audio device 302-1 and the second wireless audio device 302-2 may include the components of the wireless audio device 302 shown in FIG. 5. Each of the first wireless audio device 302-1 and the second wireless audio device 302-2 may be set to determine whether to activate either the conversation mode or the singing mode. According to one embodiment, when the first wireless audio device 302-1 or the second wireless audio device 302-2 determines activation of either the conversation mode or the singing mode, the first wireless audio device 302-1 and the second wireless audio device 302-2 may be set to operate in that mode. For example, the first wireless audio device 302-1 or the second wireless audio device 302-2 that has determined activation of either the conversation mode or the singing mode may be set to transmit a signal indicating the activation to the other wireless audio device and/or to the electronic device 301. According to one embodiment, when both the first wireless audio device 302-1 and the second wireless audio device 302-2 determine activation of the conversation mode, the first wireless audio device 302-1 and the second wireless audio device 302-2 may be set to operate in either the conversation mode or the singing mode. For example, the first wireless audio device 302-1 or the second wireless audio device 302-2 that has determined activation of either mode may check whether the other wireless audio device has also determined activation of either mode, and the first wireless audio device 302-1 and the second wireless audio device 302-2 may operate in that mode only when both wireless audio devices have determined the activation. As another example, the first wireless audio device 302-1 or the second wireless audio device 302-2 that has determined activation of either mode may transmit a signal indicating the activation to the electronic device 301. When the electronic device 301 receives, within a specified time, signals indicating activation of either the conversation mode or the singing mode from both the first wireless audio device 302-1 and the second wireless audio device 302-2, the electronic device 301 may transmit a signal that causes the first wireless audio device 302-1 and the second wireless audio device 302-2 to operate in that mode.
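The last variant, in which the electronic device 301 waits for both wireless audio devices to request the same mode within a specified time, can be sketched as below. The class name, the device identifiers used as dictionary keys, and the one-second default window are assumptions for illustration; the source does not give a value for the specified time.

```python
class ModeActivationArbiter:
    """Hypothetical arbitration logic running on the electronic device 301."""

    def __init__(self, window_s: float = 1.0):
        self.window_s = window_s  # the "specified time"; value is assumed
        self._last_request = {}   # device id -> (mode, timestamp in seconds)

    def on_request(self, device_id: str, mode: str, now: float) -> bool:
        """Record a request and report whether both devices now agree.

        Returns True when at least two devices have requested `mode`
        within the last `window_s` seconds, meaning the electronic device
        may tell both wireless audio devices to operate in that mode.
        """
        self._last_request[device_id] = (mode, now)
        agreeing = [
            dev
            for dev, (requested, stamp) in self._last_request.items()
            if requested == mode and now - stamp <= self.window_s
        ]
        return len(agreeing) >= 2
```

Keeping only the most recent request per device means a device that changes its mind replaces its earlier request rather than accumulating stale ones.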

According to one embodiment, the similarity determination module 670 may detect information about a singing voice in the ambient sound included in the audio signal, based on the characteristics of a singing voice. The similarity determination module 670 may extract a main-part signal from the ambient sound included in the audio signal and a main-part signal from the reference signal, included in the audio signal, that corresponds to the media. Based on the main-part signals and the singing voice, the similarity determination module 670 may calculate the acoustic similarity and the lyric similarity between the media and the singing voice. The similarity determination module 670 may output the similarity to the singing mode module 627 so that activation of the singing mode is determined when the similarity exceeds a predetermined threshold.
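The source does not state how the similarity is computed. As a stand-in, the sketch below uses cosine similarity between two feature vectors (for example, per-band energies of the singing voice and of the media) and compares it with a threshold. The function names, the feature representation, and the 0.8 threshold are assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def exceeds_threshold(similarity, threshold=0.8):
    """True when the similarity exceeds the (assumed) predetermined threshold."""
    return similarity > threshold
```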

Methods of determining activation, maintenance, and/or deactivation of either the conversation mode or the singing mode are described below with reference to FIGS. 7 to 12B.

FIG. 7 is a block diagram showing the configuration of a wireless audio device according to one embodiment.

Referring to FIG. 7, according to one embodiment, the wireless audio device 302 may include a sensor circuit (e.g., the sensor circuits 551 and 552 of FIG. 4), an audio output circuit (e.g., the audio output circuits 571 and 572 of FIG. 4), an audio receiving circuit (e.g., the first audio receiving circuits 581 and 582 and the second audio receiving circuit 583 of FIG. 4), a preprocessing module 610, a phase determination module 620, a conversation mode module 625, a singing mode module 627, a voice agent module 630, a sound control module 640, a conversation mode control module 655, a singing mode control module 657, a surrounding situation recognition module 660, and a similarity determination module 670.

According to one embodiment, the wireless audio device 302 may provide a plurality of operation modes to its user based on these components. The plurality of operation modes may include a normal mode, a conversation mode (dialogue mode), and a singing mode. The operation modes are activated alternatively; two or more operation modes cannot be active at the same time.

According to one embodiment, the normal mode may be the default mode of the wireless audio device 302. The conversation mode may be a mode that outputs at least part of the ambient sound included in the audio signal detected by the wireless audio device 302, so that the user can converse smoothly with a speaker other than the user while using (e.g., wearing) the wireless audio device 302. The singing mode may be a mode that outputs at least part of the ambient sound and the media included in the audio signal in order to best support the user's music listening experience.
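The rule that the operation modes are mutually exclusive, with the normal mode as the default, can be expressed by holding a single enumeration value. This is a sketch under assumed names; the source does not describe how the mode state is stored.

```python
from enum import Enum


class OperationMode(Enum):
    NORMAL = "normal"
    DIALOGUE = "dialogue"
    SINGING = "singing"


class ModeState:
    """Hypothetical holder of the current operation mode.

    Storing exactly one OperationMode value makes it impossible for two
    modes to be active at the same time.
    """

    def __init__(self):
        self.mode = OperationMode.NORMAL  # normal mode is the default

    def activate(self, mode: OperationMode) -> None:
        self.mode = mode  # implicitly deactivates whatever was active

    def deactivate(self, mode: OperationMode) -> None:
        if self.mode is mode:
            self.mode = OperationMode.NORMAL  # fall back to the default
```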

According to one embodiment, the audio receiving circuit (e.g., the first audio receiving circuits 581 and 582 and the second audio receiving circuit 583) may detect an audio signal. The audio signal may include the ambient sound of the wireless audio device 302 and a reference signal corresponding to the media played on the electronic device 301. For example, the first audio receiving circuits 581 and 582 may receive the ambient sound of the electronic device 301 (e.g., a conversation between the user and a speaker other than the user, or a singing voice), and the second audio receiving circuit 583 may receive the reference signal from the electronic device 301.

According to one embodiment, the preprocessing module 610 may preprocess the audio signal detected using the audio receiving circuit (e.g., the first audio receiving circuits 581 and 582 and the second audio receiving circuit 583) to reduce distortion of the audio signal.

According to one embodiment, the phase determination module 620 may obtain whether the electronic device 301 is playing media. For example, from media playback app information received from the electronic device 301, the phase determination module 620 may obtain whether media is being played on the electronic device 301, the type of the media, and whether the media has lyrics. As another example, the phase determination module 620 may determine whether media is being played based on the reference signal. The phase determination module 620 may determine that media is being played when the reference signal arrives at or above a predetermined magnitude for at least a predetermined time.
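The reference-signal test for media playback (magnitude at or above a predetermined level for at least a predetermined time) can be sketched as a run-length check over per-frame levels. The function name, the use of frames as the time unit, and the parameter names are assumptions.

```python
def is_media_playing(levels, level_threshold, min_frames):
    """Return True if the reference signal level stays at or above
    `level_threshold` for at least `min_frames` consecutive frames.

    `levels` is an iterable of per-frame magnitudes of the reference signal.
    """
    run = 0
    for level in levels:
        run = run + 1 if level >= level_threshold else 0
        if run >= min_frames:
            return True
    return False
```

Resetting the counter on any frame below the threshold keeps short bursts, such as notification sounds, from being treated as media playback.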

According to one embodiment, the phase determination module 620 may obtain information associated with the electronic device 301 from one or more of the surrounding situation recognition module 660 and the sensor circuit 551. The information associated with the electronic device 301 may include one or more of environment information of the electronic device 301, location information of the electronic device 301, and information about devices around the electronic device 301.

According to one embodiment, the environment information may be generated by the surrounding situation recognition module 660 (e.g., the environment classification module 661) based on the intensity of the background noise, the SNR, or the type of noise obtained from the audio signal and the preprocessed audio signal.

According to one embodiment, the location information of the electronic device 301 and the information about devices around the electronic device 301 may be obtained from sensor information collected from a sensor circuit (e.g., Wi-Fi, BLE, UWB, GPS, accelerometer (ACC), or gyro sensors). Alternatively, the location information of the electronic device 301 and the information about devices around the electronic device 301 may be calculated by the surrounding situation recognition module 660 (e.g., the user surrounding device search module 663) using the sensor information.

According to one embodiment, the phase determination module 620 may determine, based on one or more of whether media is being played on the electronic device 301 and the information associated with the electronic device 301, that the wireless audio devices 302-1 and 302-2 enter either a first mode change phase or a second mode change phase. The first mode change phase may be for determining a change of the operation mode to either the singing mode or the conversation mode. The second mode change phase may be for determining a change to the conversation mode.

For example, the first mode change phase may be entered when the number of devices around the electronic device 301 is less than a predetermined number, when a low-noise environment is detected based on the audio signal, or when a location that the user has registered in advance for the singing mode is detected.
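The phase decision can be sketched as a small function over the cues listed above. How the source combines media playback with the other cues is not spelled out in this passage; the sketch assumes that media must be playing and that any one of the three cues suffices. The thresholds `max_devices` and `quiet_db` are illustrative values, not values from the source.

```python
from enum import Enum


class Phase(Enum):
    FIRST = 1   # may change to the singing mode or the conversation mode
    SECOND = 2  # may change to the conversation mode only


def decide_phase(media_playing, nearby_devices, noise_db, at_registered_place,
                 max_devices=3, quiet_db=45.0):
    """Pick a mode change phase (assumed combination rule, see lead-in)."""
    singing_friendly = (
        nearby_devices < max_devices  # few devices nearby
        or noise_db < quiet_db        # low-noise environment
        or at_registered_place        # location pre-registered for singing
    )
    # Assumption: without media playback there is nothing to sing along to.
    return Phase.FIRST if media_playing and singing_friendly else Phase.SECOND
```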

According to one embodiment, the phase determination module 620 may learn the user's usage pattern using a user usage pattern model. The phase determination module 620 may enter the first mode change phase according to the user's usage pattern of the singing mode. For example, the phase determination module 620 may enter the first mode change phase when it determines, based on the user's usage pattern, that the user is in an environment that is substantially the same as or similar to an environment in which the user frequently sings. The user's usage pattern may be specified by one or more of whether the electronic device 301 is playing media and the information associated with the electronic device 301. The information associated with the electronic device 301 may include environment information (e.g., the type and level of ambient noise), location information, and the types and number of surrounding devices.

According to one embodiment, in the first mode change phase and the second mode change phase, the conversation mode module 625 may detect a conversation between the user of the wireless audio device 302 and a speaker other than the user and determine activation and deactivation of the conversation mode.

According to one embodiment, in the first mode change phase, the conversation mode module 625 may determine activation of the conversation mode when the singing mode has not been started by the singing mode module 627 and a voice signal corresponding to the user's utterance is maintained for a specified time interval (e.g., L frames or more, where L is a natural number). As another example, in the first mode change phase, the conversation mode module 625 may determine activation of the conversation mode when the singing mode has not been started by the singing mode module 627 and, after the user's utterance ends, a voice signal corresponding to the other party's utterance is maintained for a specified time interval.

According to one embodiment, in the second mode change phase, the conversation mode module 625 may determine activation of the conversation mode when a voice signal corresponding to the user's utterance is maintained for a specified time interval (e.g., L frames or more, where L is a natural number). As another example, in the second mode change phase, the conversation mode module 625 may determine activation of the conversation mode when, after the user's utterance ends, a voice signal corresponding to the other party's utterance is maintained for a specified time interval.
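The "voice maintained for L frames" condition can be sketched as a frame-by-frame counter. The class name and the `singing_started` flag are assumptions; the flag models the first mode change phase, where the conversation mode is considered only while the singing mode has not started, and can be left at its default in the second phase. The variant triggered by the other party's utterance is not modeled here.

```python
class ConversationTrigger:
    """Hypothetical per-frame detector for the conversation mode module 625."""

    def __init__(self, min_frames: int):
        self.min_frames = min_frames  # L in the text; a natural number
        self._run = 0                 # consecutive frames with user voice

    def update(self, user_voice: bool, singing_started: bool = False) -> bool:
        """Feed one frame; return True once the voice has lasted L frames."""
        if singing_started or not user_voice:
            self._run = 0  # the hold is broken, start counting again
            return False
        self._run += 1
        return self._run >= self.min_frames
```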

According to one embodiment, the conversation mode module 625 may be set to interact with the voice agent module 630. For example, the conversation mode module 625 may obtain, from the voice agent module 630, information indicating activation of the conversation mode. That is, in this case, the singing mode module 627 may determine activation of the singing mode based on an instruction from the voice agent module 630 rather than on the activation condition of the singing mode.

According to one embodiment, in the first mode change phase, the singing mode module 627 may detect the user's singing voice and determine activation and deactivation of the singing mode. In the first mode change phase, the singing mode module 627 may determine activation and deactivation of the singing mode with priority over the conversation mode module 625.

According to one embodiment, in the first mode change phase, the singing mode module 627 may determine activation and deactivation of the singing mode based on whether the analysis result of the audio signal and the preprocessed audio signal received through the phase determination module 620 satisfies the activation condition of the singing mode. The activation condition of the singing mode may differ according to the sensitivity level of the electronic device 301 (e.g., a level set by the user of the electronic device 301), which is one of a first sensitivity level, a second sensitivity level, and a third sensitivity level.

According to one embodiment, the activation condition according to the first sensitivity level may include a condition regarding whether a singing voice in the ambient sound is detected continuously for a predetermined time. The activation condition according to the second sensitivity level may include a condition regarding the acoustic similarity between the singing voice included in the ambient sound and the media. The ambient sound and the media may be included in the audio signal. The activation condition according to the third sensitivity level may include a condition regarding the similarity between the lyrics included in the singing voice in the ambient sound and the lyrics included in the media.

According to one embodiment, the singing mode module 627 may determine whether the activation condition according to the sensitivity level (e.g., the first, second, or third sensitivity level) is satisfied, based on whether a singing voice has been detected and on the similarity between the media and the singing voice, both of which are received from the similarity determination module 670. When the activation condition is satisfied, the singing mode module 627 may determine activation of the singing mode.

According to one embodiment, the activation condition of the singing mode may include the activation conditions of every level at or below the sensitivity level of the electronic device 301. For example, when the sensitivity level of the electronic device 301 is the second sensitivity level, the activation condition of the singing mode includes the activation conditions according to the first and second sensitivity levels. When the sensitivity level of the electronic device 301 is the third sensitivity level, the activation condition of the singing mode may include the activation conditions according to the first, second, and third sensitivity levels.

According to one embodiment, the singing mode module 627 may be configured to interact with the voice agent module 630. For example, the singing mode module 627 may obtain, from the voice agent module 630, information instructing activation of the singing mode. In this case, the singing mode module 627 may determine activation of the singing mode based on the instruction of the voice agent module 630 rather than on the activation condition of the singing mode.

According to one embodiment, the voice agent module 630 may transmit a signal instructing activation of the conversation mode or the singing mode to the conversation mode module 625 or the singing mode module 627, respectively. In response, the conversation mode module 625 or the singing mode module 627 may determine activation of the conversation mode or the singing mode.

According to one embodiment, the sound control module 640 may control the output signal of the wireless audio device 302 according to the conversation mode or the singing mode, under the control of the conversation mode control module 655 or the singing mode control module 657. The sound control module 640 may transmit the output signal to the audio output circuit 571 so that the output signal is output (e.g., played) through the audio output circuit 571.

According to one embodiment, the conversation mode control module 655 may control the output signal of the wireless audio device 302 using the sound control module 640. In the conversation mode, the conversation mode control module 655 may output at least part of the ambient sound included in the audio signal. For example, in the conversation mode, the conversation mode control module 655 may change the volume of at least part of the ambient sound to a first gain and output it.

According to one embodiment, the singing mode control module 657 may control the output signal of the wireless audio device 302 using the sound control module 640. In the singing mode, the singing mode control module 657 may output at least part of the ambient sound and the media included in the audio signal. For example, in the singing mode, the singing mode control module 657 may change the volume of at least part of the ambient sound to a second gain and output it.

FIG. 8 is a flowchart illustrating an operation in which a wireless audio device controls an output signal, according to an embodiment.

In the following embodiments, the operations may be performed sequentially, but are not necessarily performed sequentially. For example, the order of the operations may be changed, and at least two operations may be performed in parallel.

According to one embodiment, operations 810 to 830 may be understood as being performed by a processor (e.g., the processors 521 and 522 of FIG. 4) of a wireless audio device (e.g., the wireless audio device 302 of FIG. 3).

Operations 810 to 830 describe how a wireless audio device according to an embodiment controls an output signal according to one of a singing mode and a conversation mode.

In operation 810, the wireless audio device (e.g., the wireless audio device 302 of FIG. 3) may detect an audio signal. The audio signal may include ambient sound. The audio signal may also include a reference signal corresponding to media played on the electronic device 301.

In operation 820, the wireless audio device 302 may determine its operation mode to be one of the singing mode and the conversation mode based on the analysis result of the audio signal. The conversation mode may be a mode that outputs at least part of the ambient sound, and the singing mode may be a mode that outputs at least part of the ambient sound and the media.

In operation 830, the wireless audio device 302 may control its output signal according to the determined mode. The wireless audio device 302 may change the volume of at least part of the ambient sound to a first gain and output it in the conversation mode, and may change the volume of at least part of the ambient sound to a second gain and output it in the singing mode.
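The mode-dependent gain control of operation 830 can be illustrated with a minimal Python sketch. The function name `mix_output`, the default gain values, the separate `media_gain` parameter, and the choice to mute media in the conversation mode are all assumptions made for illustration; the description only states that a first gain applies to ambient sound in the conversation mode and a second gain in the singing mode.

```python
from typing import List


def mix_output(ambient: List[float], media: List[float], mode: str,
               first_gain: float = 2.0, second_gain: float = 0.5,
               media_gain: float = 0.5) -> List[float]:
    """Return output samples for the given mode (hypothetical gain values)."""
    if mode == "conversation":
        # Conversation mode: ambient sound at the first gain.
        # Muting the media here is an assumption of this sketch.
        return [first_gain * a for a in ambient]
    if mode == "singing":
        # Singing mode: ambient sound at the second gain, mixed with media
        # so that both remain audible to the user.
        return [second_gain * a + media_gain * m
                for a, m in zip(ambient, media)]
    raise ValueError("mode must be 'conversation' or 'singing'")
```

In a real device the gains would presumably be applied per frame in the audio pipeline rather than to Python lists; the sketch only shows the branching on the determined mode.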

FIG. 9 is a flowchart illustrating an operation in which a wireless audio device controls an output signal according to one of a singing mode and a conversation mode, according to an embodiment.

According to one embodiment, operations 910 to 990 may be understood as being performed by a processor (e.g., the processors 521 and 522 of FIG. 4) of a wireless audio device (e.g., the wireless audio device 302 of FIG. 3).

Operations 910 to 990 may relate to an operation in which the wireless audio device according to an embodiment controls the output signal according to one of the singing mode and the conversation mode while use of both the conversation mode and the singing mode is set to ON.

In one embodiment, when the wireless audio device 302 determines, based on media information received from the electronic device 301, that the media has no lyrics, the wireless audio device 302 may restrict the sensitivity level to one of the first sensitivity level and the second sensitivity level.

In operation 910, the wireless audio device (e.g., the wireless audio device 302 of FIG. 3) may determine to enter one of a first mode change phase and a second mode change phase. The wireless audio device 302 may make this determination based on one or more of whether media is being played on the electronic device 301 and information associated with the electronic device 301. The information associated with the electronic device 301 may include one or more of environment information of the electronic device 301, location information of the electronic device 301, and information about devices in the vicinity of the electronic device 301.

For example, the wireless audio device 302 may determine to enter the first mode change phase when any of the following holds: media is being played; the current location of the user of the wireless audio device 302 is identified as a place where the user frequently sings, because the singing mode has been activated there at least a predetermined number of times; the number of devices in the vicinity of the electronic device 301 is smaller than a predetermined number; a low-noise environment is detected based on the audio signal; or a location pre-registered by the user for the singing mode is detected.
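The phase decision of operation 910 amounts to a logical OR over the listed cues, which the following Python sketch shows. The `PhaseContext` fields, the function name `choose_phase`, and all threshold defaults are hypothetical; the description gives no numeric values and does not say how noise level is measured.

```python
from dataclasses import dataclass


@dataclass
class PhaseContext:
    """Hypothetical inputs to the phase decision of operation 910."""
    media_playing: bool
    singing_activations_here: int   # past singing-mode activations at this location
    nearby_device_count: int
    noise_level_db: float
    at_registered_location: bool    # user pre-registered this place for singing


def choose_phase(ctx: PhaseContext, min_activations: int = 3,
                 max_nearby_devices: int = 2,
                 quiet_threshold_db: float = 40.0) -> int:
    """Return 1 for the first mode change phase, 2 for the second."""
    enter_first = (
        ctx.media_playing
        or ctx.singing_activations_here >= min_activations
        or ctx.nearby_device_count < max_nearby_devices
        or ctx.noise_level_db < quiet_threshold_db
        or ctx.at_registered_location
    )
    return 1 if enter_first else 2
```

Any single cue is enough to enter the first phase in this sketch, matching the "or" structure of the example above.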

The wireless audio device 302 may perform operation 920 when it determines to enter the first mode change phase, and may perform operation 970 when it determines to enter the second mode change phase. The first mode change phase may be for determining a change of the operation mode to one of the singing mode and the conversation mode. The second mode change phase may be for determining a change to the conversation mode.

In operation 920, the wireless audio device 302 may determine, based on the audio signal it has detected and the preprocessed audio signal, whether the activation condition according to the first sensitivity level (e.g., a first singing mode activation condition) is satisfied. The first singing mode activation condition may include a condition regarding whether a singing voice in the ambient sound included in the audio signal is detected continuously for a predetermined time.

For example, the wireless audio device 302 may determine that the first singing mode activation condition is satisfied when a singing voice in the ambient sound included in the audio signal is maintained for a specified time interval (e.g., N or more frames, where N is a natural number). The singing voice may include one or more of a voice singing along to a song and a humming voice.
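The "maintained for N or more frames" check described above can be sketched as a run-length test over per-frame detection flags. The function name `sustained_for` and the boolean-per-frame input are assumptions for illustration; the description does not specify how per-frame decisions are represented.

```python
from typing import Iterable


def sustained_for(frames: Iterable[bool], min_frames: int) -> bool:
    """True if a detection flag stays set for at least min_frames consecutive frames."""
    run = 0
    for detected in frames:
        # Extend the current run on a detection, reset it otherwise.
        run = run + 1 if detected else 0
        if run >= min_frames:
            return True
    return False
```

The same helper would apply to the speech check with L frames, since both conditions are stated as a signal being maintained over a specified number of frames.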

The wireless audio device 302 may perform operation 930 when the first singing mode activation condition is satisfied, and may perform operation 970 when it is not satisfied.

In operation 930, the wireless audio device 302 may determine whether the sensitivity level of the electronic device 301 is greater than 1. The sensitivity level of the electronic device 301 may be a level preset by the user or, if the user has not preset one, a default level (e.g., the first sensitivity level). The wireless audio device 302 may perform operation 940 when the sensitivity level of the electronic device 301 is greater than 1, and may perform operation 980 when it is 1 or less.

In operation 940, the wireless audio device 302 may determine, based on the audio signal it has detected and the preprocessed audio signal, whether the activation condition according to the second sensitivity level (e.g., a second singing mode activation condition) is satisfied. The second singing mode activation condition may include a condition regarding the acoustic similarity between the singing voice included in the ambient sound and the media. The ambient sound and the media may be included in the audio signal.

For example, the wireless audio device 302 may compare the singing voice in the ambient sound included in the audio signal with a reference signal corresponding to the media played on the electronic device 301. The wireless audio device 302 may determine that the second singing mode activation condition is satisfied when, according to the comparison result, the acoustic similarity between the singing voice and the reference signal exceeds a predetermined threshold, or the pattern matching similarity between the singing voice and the reference signal exceeds a predetermined threshold.

The wireless audio device 302 may perform operation 950 when the second singing mode activation condition is satisfied, and may perform operation 970 when it is not satisfied.

In operation 950, the wireless audio device 302 may determine whether the sensitivity level of the electronic device 301 is greater than 2. The wireless audio device 302 may perform operation 960 when the sensitivity level of the electronic device 301 is greater than 2, and may perform operation 980 when it is 2 or less.

In operation 960, the wireless audio device 302 may determine, based on the audio signal it has detected and the preprocessed audio signal, whether the activation condition according to the third sensitivity level (e.g., a third singing mode activation condition) is satisfied. The third singing mode activation condition may include a condition regarding the similarity between the lyrics included in the singing voice in the ambient sound and the lyrics included in the media.

For example, the wireless audio device 302 may compare the singing voice in the ambient sound included in the audio signal with a reference signal corresponding to the media played on the electronic device 301. The wireless audio device 302 may determine that the third singing mode activation condition is satisfied when, according to the comparison result, the lyric similarity between the singing voice and the reference signal (e.g., similarity in lyric length or in lyric content) exceeds a predetermined threshold.

The wireless audio device 302 may perform operation 980 when the third singing mode activation condition is satisfied, and may perform operation 970 when it is not satisfied.
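The branching of operations 920 to 960 can be traced with the short Python sketch below. It assumes the three activation conditions have already been evaluated to booleans and that the sensitivity level is an integer from 1 to 3; the function name `next_operation` is hypothetical. The return value is the operation number reached: 980 (singing mode control) or 970 (speech check).

```python
def next_operation(level: int, cond1: bool, cond2: bool, cond3: bool) -> int:
    """Follow operations 920-960 and return 980 or 970."""
    if not cond1:            # operation 920: first activation condition
        return 970
    if level <= 1:           # operation 930: is the level greater than 1?
        return 980
    if not cond2:            # operation 940: second activation condition
        return 970
    if level <= 2:           # operation 950: is the level greater than 2?
        return 980
    return 980 if cond3 else 970   # operation 960: third activation condition
```

Read this way, a higher sensitivity level requires every lower-level condition as well, so conditions above the configured level are never consulted.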

In operation 970, the wireless audio device 302 may determine whether a voice signal corresponding to an utterance of the user (or of a speaker other than the user) included in the audio signal is detected for a specified time interval (e.g., L or more frames, where L is a natural number). The wireless audio device 302 may perform operation 990 when the voice signal is detected for at least the specified time interval, and may perform operation 910 when it is not.

In operation 980, the wireless audio device 302 may control its output signal according to the singing mode. In the singing mode, the wireless audio device 302 may change the volume of at least part of the ambient sound to a second gain and output it. For example, in the singing mode, the wireless audio device 302 may change the volume of the singing voice in the ambient sound to the second gain, and may change the volume of the reference signal corresponding to the media in accordance with the second gain. The volume of the reference signal corresponding to the media may be changed to a gain at which the user can monitor both signals when the wireless audio device 302 outputs (e.g., plays) the reference signal together with the singing voice at the second gain.

In the singing mode, the wireless audio device 302 may deactivate the singing mode when the activation condition of the singing mode according to the sensitivity level of the wireless audio device 302 (e.g., the first, second, or third singing mode activation condition) is not satisfied. Alternatively, the wireless audio device 302 may deactivate the singing mode when it determines, based on one or more of whether media is being played on the electronic device 301 and information associated with the electronic device 301, to enter the second mode change phase. When the singing mode is deactivated, the wireless audio device 302 may restore the gain settings for the ambient sound and the reference signal that were in effect before the singing mode was activated.

In operation 990, the wireless audio device 302 may control its output signal according to the conversation mode. In the conversation mode, the wireless audio device 302 may change the volume of at least part of the ambient sound to a first gain and output it. For example, in the conversation mode, the wireless audio device 302 may deactivate ANC and change the volume of the ambient sound to the first gain. As another example, when media is being played on the wireless audio device 302 in the conversation mode, the wireless audio device 302 may reduce the volume of the reference signal corresponding to the media by at least a certain ratio, or may mute it entirely. In the conversation mode, the user of the wireless audio device 302 can hear the conversation included in the ambient sound more clearly.

FIG. 10 is a schematic diagram of a similarity determination module according to an embodiment.

Referring to FIG. 10, according to one embodiment, the similarity determination module 670 may include a main part extraction module 1010, a singing voice detection module 1020, a calculation module 1030, a lyrics recognition module 1040, a melody/vocal model 1050, a lyrics model 1060, and a weight model 1070.

According to one embodiment, the singing voice detection module 1020 may receive an audio signal from an audio receiving circuit (e.g., the audio receiving circuits 581, 582, and 583 of FIG. 7) and may receive a preprocessed audio signal from a preprocessing module (e.g., the preprocessing module 610 of FIG. 7). The singing voice detection module 1020 may detect information about a singing voice in the ambient sound included in the audio signal, based on characteristics of singing voices. For example, unlike ordinary speech, a singing voice tends to hold a fixed pitch for a long duration and to have short pause intervals. Here, pitch refers to how high or low a tone is, and a pause refers to an interval in which no voice is produced. Based on these characteristics, the singing voice detection module 1020 may detect the information about the singing voice through signal-processing-based pitch/melody estimation or through various learning-based deep learning classifiers. The information about the singing voice may include one or more of: information on whether a specific interval (e.g., a frame) of the ambient sound or the reference signal is a singing voice; information on the detected signal (e.g., acoustic information); and probability information on how close a specific interval of the ambient sound or the reference signal is to a singing voice.
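As a rough illustration of the two cues named above (long held pitch, short pauses), the following Python sketch scores a per-frame pitch track. It is a simplified stand-in, not the module's actual detector: the pitch track format (Hz per frame, `None` for a pause), the half-semitone tolerance, and both decision thresholds are assumptions chosen for the example.

```python
import math
from typing import List, Optional


def semitone_distance(f1: float, f2: float) -> float:
    """Absolute pitch distance between two frequencies, in semitones."""
    return abs(12.0 * math.log2(f1 / f2))


def stable_run_lengths(pitch_hz: List[Optional[float]],
                       tolerance: float = 0.5) -> List[int]:
    """Lengths of runs of voiced frames that stay near the run's first pitch."""
    runs, anchor, length = [], None, 0
    for f in pitch_hz:
        if f is not None and anchor is not None \
                and semitone_distance(f, anchor) <= tolerance:
            length += 1
            continue
        if length:
            runs.append(length)          # close the previous run
        anchor, length = (f, 1) if f is not None else (None, 0)
    if length:
        runs.append(length)
    return runs


def looks_like_singing(pitch_hz: List[Optional[float]],
                       min_mean_run: float = 5.0,
                       max_pause_ratio: float = 0.2) -> bool:
    """Heuristic: long stable-pitch runs and few pause frames."""
    runs = stable_run_lengths(pitch_hz)
    if not pitch_hz or not runs:
        return False
    pause_ratio = sum(f is None for f in pitch_hz) / len(pitch_hz)
    return sum(runs) / len(runs) >= min_mean_run and pause_ratio <= max_pause_ratio
```

A learning-based classifier, which the description also mentions, would replace these hand-set thresholds with a trained decision.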

According to one embodiment, the singing voice detection module 1020 may additionally use main part information of the ambient sound to detect the singing voice. The main part information of the ambient sound may relate to the main melody or the vocal received from the main part extraction module 1010.

According to one embodiment, the singing voice detection module 1020 may be activated when an activation condition according to a sensitivity level at or above the first sensitivity level is to be determined. A wireless audio device (e.g., the wireless audio device 302 of FIG. 3) may use the singing voice detection module 1020 to determine whether the activation condition according to the first sensitivity level is satisfied.

According to one embodiment, the main part extraction module 1010 may receive an audio signal from an audio receiving circuit (e.g., the audio receiving circuits 581, 582, and 583 of FIG. 7) and may receive a preprocessed audio signal from a preprocessing module (e.g., the preprocessing module 610 of FIG. 7). The main part extraction module 1010 may extract a main part signal for the ambient sound included in the audio signal and a main part signal for the reference signal corresponding to the media included in the audio signal. The main part extraction module 1010 may extract either the main melody or the vocal as the main part signal, based on media information. The media information may indicate whether the media includes lyrics. The media information may be obtained from an electronic device (e.g., the electronic device 301 of FIG. 3).

According to one embodiment, the main part extraction module 1010 may use the melody/vocal model 1050 to extract the main part signal for the ambient sound and the main part signal for the reference signal. When the media information indicates that the media does not include lyrics, the main part extraction module 1010 may extract the main part signal using the melody model of the melody/vocal model 1050. When the media information indicates that the media includes lyrics, the main part extraction module 1010 may extract the main part signal using the vocal model of the melody/vocal model 1050.

According to one embodiment, the melody model of the melody/vocal model 1050 may be trained with media without lyrics (e.g., an instrumental piece), or features of that media, as input and the main melody of that media as the target output. The vocal model of the melody/vocal model 1050 may be trained with media with lyrics, or features of that media, as input and the main vocal of that media as the target output.

According to one embodiment, the calculation module 1030 may calculate the acoustic similarity between the media and the singing voice based on the main part signals and the singing voice. The main part signals may include the main part signal of the reference signal and the main part signal of the singing voice. For a singing voice detected in ambient sound acquired from a voice pickup unit (VPU), the calculation module 1030 may compensate for the low frequency resolution of the VPU signal by applying bandwidth extension to the singing voice before calculating the acoustic similarity, or may calculate the acoustic similarity only for the part of the singing voice that falls within the VPU signal band.

According to one embodiment, the calculation module 1030 may calculate the acoustic similarity based on melody features (e.g., octave, pitch, duration) or vocal features (e.g., pitch, prosody). Because the user may not sing accurately, the calculation module 1030 may account for variation in the melody features and the vocal features when calculating the acoustic similarity. For example, the calculation module 1030 may calculate the acoustic similarity by applying a dynamic margin to the melody features and the vocal features. The dynamic margin may refer to the range within which the melody features and the vocal features vary.
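As one possible reading of the dynamic margin, the sketch below compares two pitch sequences frame by frame and counts a frame as matching when the pitches differ by no more than a margin. The pitch representation (semitone numbers), the fixed margin value, and the function name are assumptions made for illustration; the disclosure does not specify how the margin is applied.

```python
def margin_similarity(reference, sung, margin=1.0):
    """Fraction of frames whose pitch difference stays within `margin`.

    `reference` and `sung` are sequences of pitches in semitones
    (an assumed representation). The result lies between 0 and 1.
    """
    if not reference or not sung:
        return 0.0
    # Frames are compared pairwise; unmatched trailing frames of the
    # longer sequence count as misses via the denominator.
    hits = sum(1 for r, s in zip(reference, sung) if abs(r - s) <= margin)
    return hits / max(len(reference), len(sung))
```

With a margin of one semitone, a slightly off-key note still counts as a match, while a note three semitones away does not.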

According to one embodiment, the calculation module 1030 may also calculate the similarity between the main part signals by performing pattern matching between the extracted main part signals using a hidden Markov model (HMM), deep learning, templates, or the like. The calculation module 1030 may also obtain text patterns by first converting the melody or vocal in each main part signal into octave notation (e.g., CDCCDEF) and then converting the result into a text pattern. The calculation module 1030 may calculate the similarity by comparing the text patterns.
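The text-pattern comparison can be illustrated with a small sketch. It assumes the melody is available as MIDI note numbers, maps each note to its note name, and compares the two name sequences with Python's standard `difflib.SequenceMatcher`. The MIDI input, the mapping, and the choice of matcher are assumptions for illustration, not the method of the disclosure.

```python
from difflib import SequenceMatcher

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def to_note_pattern(midi_pitches):
    """Convert MIDI note numbers to a list of note names (octave ignored)."""
    return [NOTE_NAMES[p % 12] for p in midi_pitches]


def pattern_similarity(ref_pitches, sung_pitches):
    """Similarity between 0 and 1 of the two note-name patterns."""
    a = to_note_pattern(ref_pitches)
    b = to_note_pattern(sung_pitches)
    return SequenceMatcher(None, a, b).ratio()
```

Because the octave is discarded in this sketch, a user singing the same melody one octave lower still produces an identical pattern.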

According to one embodiment, the similarity determination module 670 may output the similarity to the singing mode module (e.g., the singing mode module 627 of FIGS. 6 and 7) so that the singing mode module determines to activate the singing mode when the similarity exceeds a predetermined threshold. This condition may correspond to the activation condition according to the second sensitivity level. The similarity may be calculated as a score between 0 and 1, where 1 denotes a perfect match and 0 denotes no match.

According to one embodiment, the calculation module 1030 and the weight module 1070 may be activated when determining an activation condition according to a sensitivity level equal to or higher than the second sensitivity level. The wireless audio device (e.g., the wireless audio device 302 of FIG. 3) may use the calculation module 1030 to determine whether the activation condition according to the second sensitivity level is satisfied.

According to one embodiment, the lyrics recognition module 1040 may recognize the lyrics contained in the main part signals using a lyrics model (e.g., an ASR-for-lyrics model). For example, the lyrics recognition module 1040 may calculate the similarity in lyric length and the similarity in lyric content between the main part signals using a measure such as word error rate (WER).

By calculating the similarity based on the similarity in lyric length and the similarity in lyric content, the lyrics recognition module 1040 can recognize that the user is singing even when the user replaces part of the lyrics with different words or omits part of them. The lyrics recognition module 1040 may output the WER value, or the lyric-length similarity, normalized to a value between 0 and 1.
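A WER-based lyric similarity can be sketched as follows. WER is computed here as the word-level edit distance divided by the number of reference words, and the similarity as 1 minus WER clipped at 0. The clipping and the function names are assumptions; the disclosure only states that a WER-like measure may be used and that the output may be normalized to between 0 and 1.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        return 0.0 if not hyp else 1.0
    # Standard dynamic-programming edit distance, one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def lyric_similarity(reference: str, hypothesis: str) -> float:
    """Map WER to a similarity score between 0 and 1."""
    return max(0.0, 1.0 - word_error_rate(reference, hypothesis))
```

A single substituted or omitted word in a four-word line lowers the score only to 0.75, so a user who misremembers part of the lyrics can still be recognized as singing along.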

According to one embodiment, when each syllable of a particular word in the lyrics is drawn out (e.g., "그대 기억이이이이"), syllable insertions become frequent. The main part signal may therefore be transformed into a form with the repeated syllable insertions removed (e.g., "그대 기억이") before the similarity in lyric length and the similarity in lyric content between the main part signals are calculated.
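The removal of repeated syllable insertions can be illustrated on recognized text. The sketch below collapses any run of the same non-space character into one occurrence; since a precomposed Hangul syllable is a single code point, this removes drawn-out syllables. Operating on text rather than on the signal, and collapsing every repeated run (which would also shorten legitimately doubled syllables), are simplifying assumptions.

```python
import re


def collapse_repeated_syllables(text: str) -> str:
    """Collapse runs of an identical non-space character to one."""
    return re.sub(r"(\S)\1+", r"\1", text)
```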

According to one embodiment, the weight module 1070 may receive the acoustic similarity between the media and the singing voice from the calculation module 1030. The acoustic similarity may include the similarity between the reference signal and the singing voice detected in ambient sound obtained from the VPU, and the similarity between the reference signal and the singing voice detected in ambient sound obtained from the microphone. The weight module 1070 may adjust the final similarity value by weighting these similarity values. For example, when the environment is noisy and the ambient sound obtained from the microphone is judged to contain a lot of noise, a relatively larger weight may be applied to the similarity between the reference signal and the singing voice detected in the ambient sound obtained from the VPU.

According to one embodiment, the weight module 1070 may receive the lyric similarity between the main part signals from the lyrics recognition module 1040. The weight module 1070 may calculate a final similarity by weighting factors such as the length of the interval over which the singing voice was detected, the similarity between the main part signals, and the lyric recognition rate and recognized length for the main part signals. The weight module 1070 may transmit the final similarity to the singing mode module 627. The singing mode module 627 may use the final similarity to determine whether the activation condition according to the second sensitivity level and the activation condition according to the third sensitivity level are satisfied.
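One simple way to combine several similarity values into a final similarity is a weighted average, sketched below. The dictionary keys, the weight values, and the use of a normalized weighted mean are assumptions for illustration; the disclosure does not state the combination formula.

```python
def fuse_similarity(scores: dict, weights: dict) -> float:
    """Weighted average of similarity scores, each between 0 and 1."""
    total = sum(weights[name] for name in scores)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(scores[name] * weights[name] for name in scores) / total


# Hypothetical noisy environment: trust the VPU-based similarity
# three times as much as the microphone-based one.
final = fuse_similarity({"vpu": 0.8, "mic": 0.4}, {"vpu": 3.0, "mic": 1.0})
```

Normalizing by the sum of the weights keeps the final similarity in the same 0 to 1 range as the inputs, so it can be compared against the same kind of threshold.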

According to one embodiment, the lyrics recognition module 1040 may be activated when determining the activation condition according to the third sensitivity level. The wireless audio device (e.g., the wireless audio device 302 of FIG. 3) may use the lyrics recognition module 1040 to determine whether the activation condition according to the third sensitivity level is satisfied.

FIG. 11 is a schematic diagram of the singing mode module 627 according to one embodiment.

Referring to FIG. 11, according to one embodiment, the singing mode module 627 may include a singing mode activation module 1110, a gain calculation module 1130, and a guide generation module 1140. Using these components, the singing mode module 627 may determine whether to activate the singing mode and calculate gains for controlling the output signal in the singing mode. The singing mode module 627 may also generate a guide for optimizing the user's music listening experience in the singing mode.

According to one embodiment, the singing mode activation module 1110 may determine whether the singing mode activation condition corresponding to the sensitivity level of the electronic device 301 is satisfied. When the singing mode activation module 1110 determines that the activation condition is satisfied, the gain calculation module 1130 may compare the external noise contained in the ambient sound, among the audio signals detected by the wireless audio device (e.g., the wireless audio device 302 of FIG. 3), with the level of the singing voice. Based on the comparison result, the gain calculation module 1130 may calculate appropriate volumes for the singing voice and the media included in the audio signal. For example, the appropriate volume for the media may be the minimum volume at which the user can still hear the media, and the appropriate volume for the singing voice may be a volume at which the user can monitor the singing voice together with the media. The gain calculation module 1130 may take into account a singing mode volume set in advance by the user. The gain calculation module 1130 may transmit the appropriate volumes for the media and the singing voice to the singing mode control module (e.g., the singing mode control module 657 of FIGS. 6 and 7).

According to one embodiment, the guide generation module 1140 may generate a guide that can optimize the user's music listening experience in the singing mode and provide it to the user. For example, the guide generation module 1140 may provide guide information about the media to the user when the user has chosen to receive a song guide or when the similarity between the singing voice and the media is low. The guide information may include main melody information for singing along with the media (e.g., a song), the beat, or the lyrics of the next bar of the song. The guide information may be output through the wireless audio device 302 as low-volume audio generated by TTS, or displayed as visual information on the screen of the electronic device 301.

According to one embodiment, the operations of the singing mode module 627 (e.g., activating or deactivating the singing mode and providing a guide) may be performed through the voice agent module 630.

According to one embodiment, the singing mode may be activated even when a plurality of wireless audio devices 302 are connected to one electronic device 301, or when a plurality of wireless audio devices 302 share the music each is playing through music sharing or the like. In this case, the users of the plurality of wireless audio devices 302 may listen to the song while monitoring each other's singing voices.

FIGS. 12A and 12B are examples of screens output on a display of an electronic device according to one embodiment.

Referring to FIGS. 12A and 12B, according to one embodiment, the electronic device 301 may display, on its execution screen, a user interface for configuring the singing mode of a wireless audio device (e.g., the wireless audio device 302 of FIG. 3). For example, by turning the singing mode setting 1200 in the interface ON, the user may enter the mode determination phase described above with reference to FIG. 9. The user interface may also include an accuracy level setting 1210 that becomes available when the singing mode is ON. As sub-items of the accuracy level setting 1210, the interface may include settings for a plurality of sensitivity levels, for example a first sensitivity level 1220, a second sensitivity level 1230, and a third sensitivity level 1240.

According to one embodiment, if the user has not changed the sensitivity level setting, the sensitivity level may be set to the first sensitivity level by default.

The determining may include entering, based on one or more of whether media is being played on an electronic device (101;201;301) connected to the wireless audio device (102;202;302) and information associated with the electronic device (101;201;301), one of a first mode change phase for determining a change of the operation mode to one of the singing mode and the conversation mode, and a second mode change phase for determining a change to the conversation mode.

The information associated with the electronic device (101;201;301) may include one or more of environmental information of the electronic device (101;201;301), location information of the electronic device (101;201;301), and information about devices in the vicinity of the electronic device (101;201;301).

In the first mode change phase, the determining may include determining the operation mode as one of the singing mode and the conversation mode based on whether the analysis result satisfies the activation condition of the singing mode.

The activation condition of the singing mode may differ according to which of a first sensitivity level, a second sensitivity level, and a third sensitivity level is the sensitivity level of the electronic device (101;201;301).

The activation condition according to the first sensitivity level may include a condition on whether a singing voice is detected continuously in the ambient sound for a predetermined time.

The activation condition according to the second sensitivity level may include a condition on the acoustic similarity between the singing voice included in the ambient sound and the media.

The activation condition according to the third sensitivity level may include a condition on the similarity between the lyrics included in the singing voice in the ambient sound and the lyrics included in the media.

The controlling may include outputting at least a portion of the ambient sound with its volume adjusted by a first gain in the conversation mode, and outputting at least a portion of the ambient sound with its volume adjusted by a second gain in the singing mode.

The activation condition of the singing mode may include the activation conditions according to all levels at or below the sensitivity level of the electronic device (101;201;301).
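The cumulative nature of the activation conditions can be sketched as follows: level 1 checks continuous detection of a singing voice, level 2 additionally checks acoustic similarity, and level 3 additionally checks lyric similarity. The `Analysis` type, the field names, and the numeric thresholds are hypothetical values chosen for illustration; the disclosure only refers to a predetermined time and a predetermined threshold.

```python
from dataclasses import dataclass


@dataclass
class Analysis:
    singing_seconds: float      # how long singing was continuously detected
    acoustic_similarity: float  # score between 0 and 1
    lyric_similarity: float     # score between 0 and 1


def singing_mode_condition_met(level: int, a: Analysis,
                               min_seconds: float = 3.0,
                               acoustic_threshold: float = 0.6,
                               lyric_threshold: float = 0.6) -> bool:
    """Check the conditions of every level at or below `level`."""
    if level not in (1, 2, 3):
        raise ValueError("level must be 1, 2, or 3")
    checks = [
        a.singing_seconds >= min_seconds,            # first level
        a.acoustic_similarity > acoustic_threshold,  # second level
        a.lyric_similarity > lyric_threshold,        # third level
    ]
    return all(checks[:level])
```

Under this reading, a higher sensitivity level is stricter: the same analysis result can activate the singing mode at level 1 yet fail at level 3.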

The plurality of operations may further include deactivating the singing mode when, in the singing mode, the activation condition of the singing mode is not satisfied.

The plurality of operations may further include tracking, in the singing mode, the singing voice included in the ambient sound and providing information about the singing voice.

A wireless audio device (102;202;302) according to one embodiment may include a memory (141;531;532) containing instructions, and a processor (131;521;522) electrically connected to the memory (141;531;532) and configured to execute the instructions. When the instructions are executed by the processor (131;521;522), the processor (131;521;522) may perform a plurality of operations. The plurality of operations may include detecting an audio signal. The plurality of operations may include determining, based on an analysis result of the audio signal, an operation mode of the wireless audio device (102;202;302) for the audio signal as one of a singing mode and a conversation mode. The plurality of operations may include, when the determined mode is the conversation mode, outputting at least a portion of the ambient sound included in the audio signal. The plurality of operations may include, when the determined mode is the singing mode, outputting at least a portion of the ambient sound and media included in the audio signal. The plurality of operations may include deactivating the singing mode when, in the singing mode, no singing voice is detected in the ambient sound for at least a predetermined time.

The plurality of operations may further include tracking, in the singing mode, the singing voice included in the ambient sound and providing information about the singing voice.
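The timeout-based deactivation in this embodiment can be sketched as a small state holder that remembers when a singing voice was last detected. The class name, the method names, and the 5-second default are hypothetical; the disclosure only specifies a predetermined time.

```python
class SingingModeTimer:
    """Deactivates the singing mode after a silence of `timeout_s` seconds."""

    def __init__(self, timeout_s: float = 5.0):
        self.timeout_s = timeout_s
        self.active = False
        self._last_voice_t = None

    def activate(self, t: float) -> None:
        # Entering the singing mode counts as the latest detection time.
        self.active = True
        self._last_voice_t = t

    def update(self, t: float, singing_detected: bool) -> bool:
        """Feed one observation at time `t`; return whether the mode is active."""
        if singing_detected:
            self._last_voice_t = t
        if (self.active and self._last_voice_t is not None
                and t - self._last_voice_t >= self.timeout_s):
            self.active = False
        return self.active
```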

Electronic devices according to the embodiments disclosed in this document may be of various types. An electronic device may include, for example, a portable communication device (e.g., a smartphone), a computer device, a portable multimedia device, a portable medical device, a camera, a wearable device, or a home appliance. Electronic devices according to the embodiments of this document are not limited to the devices described above.

The embodiments of this document and the terms used in them are not intended to limit the technical features described in this document to specific embodiments, and should be understood to include various modifications, equivalents, or alternatives of those embodiments. In the description of the drawings, similar reference numerals may be used for similar or related components. The singular form of a noun corresponding to an item may include one or more of the item, unless the relevant context clearly indicates otherwise. In this document, each of the phrases "A or B", "at least one of A and B", "at least one of A or B", "A, B, or C", "at least one of A, B, and C", and "at least one of A, B, or C" may include any one of the items listed together in that phrase, or all possible combinations of them. Terms such as "first" and "second" may be used simply to distinguish one component from another and do not limit the components in any other respect (e.g., importance or order). When one (e.g., a first) component is referred to as "coupled" or "connected" to another (e.g., a second) component, with or without the term "functionally" or "communicatively", it means that the one component may be connected to the other component directly (e.g., by wire), wirelessly, or through a third component.

The term "module" used in the embodiments of this document may include a unit implemented in hardware, software, or firmware, and may be used interchangeably with terms such as logic, logic block, component, or circuit. A module may be an integrally formed component, or a minimum unit of such a component, or a part of it, that performs one or more functions. For example, according to one embodiment, a module may be implemented in the form of an application-specific integrated circuit (ASIC).

The embodiments of this document may be implemented as software (e.g., the program 140) including one or more instructions stored in a storage medium (e.g., the internal memory 136 or the external memory 138) readable by a machine (e.g., the electronic device 101). For example, a processor (e.g., the processor 120) of the machine (e.g., the electronic device 101) may invoke at least one of the one or more instructions stored in the storage medium and execute it. This enables the machine to be operated to perform at least one function according to the at least one invoked instruction. The one or more instructions may include code generated by a compiler or code executable by an interpreter. The machine-readable storage medium may be provided in the form of a non-transitory storage medium. Here, 'non-transitory' only means that the storage medium is a tangible device and does not include a signal (e.g., an electromagnetic wave); the term does not distinguish between data being stored semi-permanently in the storage medium and data being stored temporarily.

According to one embodiment, a method according to the embodiments disclosed in this document may be provided as part of a computer program product. The computer program product may be traded as a commodity between a seller and a buyer. The computer program product may be distributed in the form of a machine-readable storage medium (e.g., compact disc read only memory (CD-ROM)), or distributed online (e.g., downloaded or uploaded) through an application store (e.g., Play Store™) or directly between two user devices (e.g., smartphones). In the case of online distribution, at least part of the computer program product may be at least temporarily stored, or temporarily generated, in a machine-readable storage medium such as the memory of a manufacturer's server, an application store's server, or a relay server.

According to the embodiments, each of the components described above (e.g., a module or a program) may include a single entity or a plurality of entities, and some of the plurality of entities may be separately arranged in other components. According to the embodiments, one or more of the components or operations described above may be omitted, or one or more other components or operations may be added. Alternatively or additionally, a plurality of components (e.g., modules or programs) may be integrated into a single component. In this case, the integrated component may perform one or more functions of each of the plurality of components in the same or a similar manner as they were performed by the corresponding component before the integration. According to the embodiments, operations performed by a module, a program, or another component may be executed sequentially, in parallel, repeatedly, or heuristically; one or more of the operations may be executed in a different order or omitted; or one or more other operations may be added.

301: electronic device
302: wireless audio device

Claims

A wireless audio device (102;202;302) comprising:
a memory (141;531;532) containing instructions; and
a processor (131;521;522) electrically connected to the memory (141;531;532) and configured to execute the instructions,
wherein, when the instructions are executed by the processor (131;521;522), the processor (131;521;522) performs a plurality of operations,
the plurality of operations comprising:
detecting an audio signal;
determining, based on an analysis result of the audio signal, an operation mode of the wireless audio device (102;202;302) as one of a singing mode and a conversation mode; and
controlling an output signal of the wireless audio device (102;202;302) according to the determined mode,
wherein the conversation mode is a mode that outputs at least a portion of ambient sound included in the audio signal, and
the singing mode is a mode that outputs at least a portion of the ambient sound and media included in the audio signal.

The wireless audio device (102;202;302) of claim 1,
wherein the determining comprises:
entering, based on one or more of whether media is being played on an electronic device (101;201;301) connected to the wireless audio device (102;202;302) and information associated with the electronic device (101;201;301), one of a first mode change phase for determining a change of the operation mode to one of the singing mode and the conversation mode, and a second mode change phase for determining a change to the conversation mode.

The wireless audio device (102;202;302) of any one of claims 1 and 2,
wherein the information associated with the electronic device (101;201;301) comprises:
one or more of environmental information of the electronic device (101;201;301), location information of the electronic device (101;201;301), and information about devices in the vicinity of the electronic device (101;201;301).

The wireless audio device (102;202;302) of any one of claims 1 to 3,
wherein the determining comprises:
determining, in the first mode change phase, the operation mode as one of the singing mode and the conversation mode based on whether the analysis result satisfies an activation condition of the singing mode.

The wireless audio device (102;202;302) of any one of claims 1 to 4,
wherein the activation condition of the singing mode
differs according to which of a first sensitivity level, a second sensitivity level, and a third sensitivity level is the sensitivity level of the electronic device (101;201;301).

The wireless audio device (102;202;302) of any one of claims 1 to 5,
wherein the activation condition according to the first sensitivity level comprises:
a condition on whether a singing voice is detected continuously in the ambient sound for a predetermined time.

The wireless audio device (102;202;302) of any one of claims 1 to 6,
wherein the activation condition according to the second sensitivity level comprises:
a condition on the acoustic similarity between the singing voice included in the ambient sound and the media.

According to any one of claims 1 to 7,
Activation conditions according to the third sensitivity level are:
A condition regarding the similarity between the lyrics included in the singing voice in the ambient sound and the lyrics included in the media
A wireless audio device (102;202;302), including.

According to any one of claims 1 to 8,
The controlling operation is,
An operation of changing the volume of at least a portion of the ambient sound by a first gain and outputting the ambient sound in the conversation mode, and changing the volume of at least a portion of the ambient sound by a second gain and outputting the ambient sound in the singing mode
A wireless audio device (102;202;302), including.
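
A minimal sketch of the mode-dependent gain control recited above, assuming ambient sound arrives as a list of float samples. The gain values and the names `Mode`, `GAINS`, and `apply_ambient_gain` are hypothetical; the claims only require that a first gain apply in the conversation mode and a second gain in the singing mode.

```python
from enum import Enum


class Mode(Enum):
    CONVERSATION = "conversation"
    SINGING = "singing"


# Hypothetical linear gains; the claims do not give numeric values.
GAINS = {
    Mode.CONVERSATION: 1.0,  # first gain
    Mode.SINGING: 0.5,       # second gain
}


def apply_ambient_gain(samples: list[float], mode: Mode) -> list[float]:
    """Scale the ambient-sound samples by the gain assigned to the mode."""
    gain = GAINS[mode]
    return [sample * gain for sample in samples]
```

With these assumed values, the same ambient samples pass through unchanged in the conversation mode and are halved in the singing mode.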

According to any one of claims 1 to 9,
The activation conditions for the singing mode are,
A wireless audio device (102;202;302), comprising activation conditions according to all levels below the sensitivity level of the electronic device (101;201;301).
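
One possible reading of the level-dependent activation conditions is sketched below. It assumes that the conditions of the configured sensitivity level apply together with those of every lower level; that cumulative interpretation, the threshold values, and the names `Analysis`, `CONDITIONS`, and `singing_activated` are assumptions made for illustration, not taken from the claims.

```python
from dataclasses import dataclass


@dataclass
class Analysis:
    singing_duration_s: float   # time a singing voice has been continuously detected
    acoustic_similarity: float  # singing voice vs. media, in the range 0..1
    lyric_similarity: float     # sung lyrics vs. media lyrics, in the range 0..1


# Hypothetical thresholds; the claims do not give numeric values.
MIN_DURATION_S = 2.0
MIN_ACOUSTIC_SIMILARITY = 0.6
MIN_LYRIC_SIMILARITY = 0.6

# One condition per sensitivity level (first, second, third).
CONDITIONS = {
    1: lambda a: a.singing_duration_s >= MIN_DURATION_S,
    2: lambda a: a.acoustic_similarity >= MIN_ACOUSTIC_SIMILARITY,
    3: lambda a: a.lyric_similarity >= MIN_LYRIC_SIMILARITY,
}


def singing_activated(analysis: Analysis, sensitivity_level: int) -> bool:
    """Check the conditions of the given level and of every lower level."""
    return all(
        CONDITIONS[level](analysis)
        for level in range(1, sensitivity_level + 1)
    )
```

Under this reading, a higher sensitivity level is stricter: a long but acoustically dissimilar singing voice activates the singing mode at the first level only.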

According to any one of claims 1 to 10,
The plurality of operations are,
An operation of deactivating the singing mode when, in the singing mode, the activation conditions of the singing mode are not satisfied
A wireless audio device (102;202;302), further including.

According to any one of claims 1 to 11,
The plurality of operations are,
An operation of providing information about the singing voice by tracking the singing voice included in the ambient sound in the singing mode
A wireless audio device (102;202;302), further including.

In a wireless audio device (102;202;302),
A memory (141;531;532) containing instructions; and
A processor (131;521;522) electrically connected to the memory (141;531;532), for executing the instructions,
When the instructions are executed by the processor (131;521;522), the processor (131;521;522) performs a plurality of operations,
The plurality of operations are,
An operation of detecting an audio signal;
An operation of determining an operation mode of the wireless audio device (102;202;302) for the audio signal to be a singing mode; and
An operation of controlling the output signal of the wireless audio device (102;202;302) according to the singing mode
Including,
The singing mode is,
A wireless audio device (102; 202; 302) in a mode that outputs at least a portion of ambient sounds and media included in the audio signal.

In a wireless audio device (102;202;302),
A memory (141;531;532) containing instructions; and
A processor (131;521;522) electrically connected to the memory (141;531;532), for executing the instructions,
When the instructions are executed by the processor (131;521;522), the processor (131;521;522) performs a plurality of operations,
The plurality of operations are,
An operation of detecting an audio signal;
An operation of determining an operation mode of the wireless audio device (102;202;302) for the audio signal as either a singing mode or a conversation mode based on an analysis result of the audio signal;
An operation of outputting, when the determined mode is the conversation mode, at least a portion of an ambient sound included in the audio signal;
An operation of outputting, when the determined mode is the singing mode, at least a portion of ambient sounds and media included in the audio signal; and
An operation of deactivating the singing mode when, in the singing mode, a singing voice is not detected in the ambient sound for more than a predetermined time
A wireless audio device (102;202;302), including.
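
The operations of this claim might be sketched as a small state machine. Several points are assumptions made for illustration: detection of a singing voice alone activates the singing mode, deactivation falls back to the conversation mode, time is supplied by the caller in seconds, and the names `SingingModeController`, `update`, and `output_sources` are hypothetical.

```python
class SingingModeController:
    """Tracks the operation mode and deactivates singing mode on timeout."""

    def __init__(self, timeout_s: float = 5.0):
        self.timeout_s = timeout_s      # hypothetical "predetermined time"
        self.mode = "conversation"
        self._last_singing_s = None     # time a singing voice was last detected

    def update(self, now_s: float, singing_detected: bool) -> str:
        """Feed one analysis result and return the resulting mode."""
        if singing_detected:
            # Assumption: detection alone satisfies the activation condition.
            self.mode = "singing"
            self._last_singing_s = now_s
        elif (self.mode == "singing"
              and now_s - self._last_singing_s > self.timeout_s):
            # No singing voice for more than the predetermined time.
            # Assumption: the device falls back to the conversation mode.
            self.mode = "conversation"
        return self.mode

    def output_sources(self) -> list[str]:
        """Sources routed to the output in the current mode."""
        if self.mode == "singing":
            return ["ambient", "media"]
        return ["ambient"]
```

The comparison is strict, so the singing mode is kept when the silence lasts exactly the timeout and is dropped only once the timeout is exceeded.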

According to claim 14,
The determining operation is,
An operation of entering, based on one or more of whether media is played on an electronic device (101;201;301) connected to the wireless audio device (102;202;302) and information associated with the electronic device (101;201;301), one of a first mode change phase for determining a change of the operation mode to one of the singing mode and the conversation mode, and a second mode change phase for determining a change to the conversation mode
A wireless audio device (102;202;302), including.

According to any one of claims 14 and 15,
The information associated with the electronic device (101;201;301) is,
One or more of environmental information of the electronic device (101;201;301), location information of the electronic device (101;201;301), and information about devices in the vicinity of the electronic device (101;201;301)
A wireless audio device (102;202;302), including.

According to any one of claims 14 to 16,
The determining operation is,
An operation of determining, in the first mode change phase, the operation mode to be one of the singing mode and the conversation mode based on whether the analysis result satisfies the activation condition of the singing mode
A wireless audio device (102;202;302), including.

According to any one of claims 14 to 17,
The activation conditions for the singing mode are,
A wireless audio device (102;202;302), in which the activation conditions are distinguished according to the sensitivity level of the electronic device (101;201;301), the sensitivity level being one of a first sensitivity level, a second sensitivity level, and a third sensitivity level.

According to any one of claims 14 to 18,
The controlling operation is,
An operation of changing the volume of at least a portion of the ambient sound by a first gain and outputting the ambient sound in the conversation mode, and changing the volume of at least a portion of the ambient sound by a second gain and outputting the ambient sound in the singing mode
A wireless audio device (102;202;302), including.

According to any one of claims 14 to 19,
The plurality of operations are,
An operation of providing information about the singing voice by tracking the singing voice included in the ambient sound in the singing mode
A wireless audio device (102;202;302), further including.