WO2021132852A1

WO2021132852A1 - Audio data output method and electronic device supporting same

Info

Publication number: WO2021132852A1
Application number: PCT/KR2020/012910
Authority: WO
Inventors: 고성환; 김기훈; 박영현; 박의순; 박진우; 방경호; 송은정; 정문식; 조준영
Original assignee: 삼성전자 주식회사
Priority date: 2019-12-26
Filing date: 2020-09-24
Publication date: 2021-07-01
Also published as: KR20210083059A

Abstract

Disclosed is an electronic device comprising: multiple microphones; multiple speakers; a sensor; a memory; and a processor operatively connected to the multiple microphones, the multiple speakers, the sensor, and the memory, wherein the processor is configured to: receive a voice of a user through each of the multiple microphones; determine a position relation between the electronic device and the user on the basis of a difference in a reception time at which the voice of the user is received through each of the multiple microphones; determine the posture of the electronic device on the basis of sensor information measured through the sensor; and determine audio data output through the multiple speakers included in the electronic device on the basis of the determined position relation and the determined posture of the electronic device. Various other embodiments inferred from the present document are also possible.

Description

Audio data output method and electronic device supporting the same

Various embodiments of the present disclosure relate to a method of outputting audio data and an electronic device supporting the same.

An electronic device such as a smartphone may provide various functions. For example, the electronic device may receive a user's voice through a microphone and may output voice data through a speaker. During a call, the electronic device may transmit the user's voice received through the microphone to an external electronic device, and may output the other party's voice through the speaker.

Existing electronic devices support only dual mono sound transmission and reception during a call. For example, even if the electronic device is equipped with a stereo speaker, the electronic device does not output stereo sound audio data during communication, but outputs dual mono audio data. Since most of the recently released electronic devices are equipped with stereo speakers, a function of outputting stereo audio data during a call is required.

Various embodiments of the present disclosure may provide an audio data output method for selecting and outputting audio data based on a positional relationship between an electronic device and a user, and an electronic device supporting the same.

An electronic device according to various embodiments of the present disclosure includes a plurality of microphones, a plurality of speakers, a sensor, a memory, and a processor operatively connected to the plurality of microphones, the plurality of speakers, the sensor, and the memory, wherein the processor may be configured to: receive a user's voice through each of the plurality of microphones; determine a positional relationship between the electronic device and the user based on a difference in reception time of the user's voice received through each of the plurality of microphones; determine a posture of the electronic device based on sensor information measured through the sensor; and determine audio data to be output through the plurality of speakers included in the electronic device based on the determined positional relationship and the determined posture of the electronic device.

In addition, a method for outputting audio data of an electronic device according to various embodiments of the present disclosure includes: an operation of receiving a user's voice through each of a plurality of microphones included in the electronic device; an operation of determining a positional relationship between the electronic device and the user based on a difference in reception time of the user's voice received through each of the plurality of microphones; an operation of determining a posture of the electronic device based on sensor information measured through a sensor included in the electronic device; and an operation of determining audio data to be output through a plurality of speakers included in the electronic device based on the determined positional relationship and the determined posture of the electronic device.

In addition, an electronic device according to various embodiments of the present disclosure includes a plurality of microphones, a plurality of speakers, a camera, a memory, and a processor operatively connected to the plurality of microphones, the plurality of speakers, the camera, and the memory, wherein the processor may be configured to: receive a user's voice through each of the plurality of microphones; obtain an image captured by the camera; obtain a position value of an object corresponding to the user from the image; determine a positional relationship between the electronic device and the user based on a difference in reception time of the user's voice received through each of the plurality of microphones and on the position value of the object; and determine audio data to be output through the plurality of speakers based on the determined positional relationship.

According to various embodiments of the present disclosure, high-quality audio sound may be provided to the user by selectively outputting audio data based on a positional relationship between the electronic device and the user.

In addition, various effects directly or indirectly identified through this document may be provided.

FIG. 1 is a block diagram of an electronic device in a network environment according to various embodiments of the present disclosure.

FIG. 2 is a block diagram of an electronic device related to output of audio data according to an embodiment of the present invention.

FIG. 3 is a diagram illustrating a method of outputting audio data according to an embodiment of the present invention.

FIG. 4 is a diagram illustrating a method of selectively outputting audio data based on a positional relationship between an electronic device and a user and a posture of the electronic device, according to an embodiment of the present invention.

FIG. 5 is a diagram illustrating another method of selectively outputting audio data based on a positional relationship between an electronic device and a user and a posture of the electronic device, according to an embodiment of the present invention.

FIG. 6 is a diagram illustrating another method of selectively outputting audio data based on a positional relationship between an electronic device and a user and a posture of the electronic device, according to an embodiment of the present invention.

FIG. 7 is a diagram illustrating a method of selectively outputting audio data based on a positional relationship between an electronic device and a user, according to an embodiment of the present invention.

FIG. 8 is a diagram illustrating another method of selectively outputting audio data based on a positional relationship between an electronic device and a user, according to an embodiment of the present invention.

FIG. 9 is a view for explaining a preset area according to an arrangement position of a plurality of speakers, according to an embodiment of the present invention.

Hereinafter, various embodiments of the present invention will be described with reference to the accompanying drawings. For convenience of description, the sizes of the components shown in the drawings may be exaggerated or reduced, and the present invention is not necessarily limited to the illustrated ones.

FIG. 1 is a block diagram of an electronic device 101 in a network environment 100 according to various embodiments.

Referring to FIG. 1, in a network environment 100, the electronic device 101 may communicate with the electronic device 102 through a first network 198 (e.g., a short-range wireless communication network), or may communicate with the electronic device 104 or the server 108 through a second network 199 (e.g., a long-distance wireless communication network). According to an embodiment, the electronic device 101 may communicate with the electronic device 104 through the server 108. According to an embodiment, the electronic device 101 may include a processor 120, a memory 130, an input device 150, a sound output device 155, a display device 160, an audio module 170, a sensor module 176, an interface 177, a haptic module 179, a camera module 180, a power management module 188, a battery 189, a communication module 190, a subscriber identification module 196, or an antenna module 197. In some embodiments, at least one of these components (e.g., the display device 160 or the camera module 180) may be omitted, or one or more other components may be added to the electronic device 101. In some embodiments, some of these components may be implemented as one integrated circuit. For example, the sensor module 176 (e.g., a fingerprint sensor, an iris sensor, or an illuminance sensor) may be implemented while being embedded in the display device 160 (e.g., a display).

The processor 120 may, for example, execute software (e.g., the program 140) to control at least one other component (e.g., a hardware or software component) of the electronic device 101 connected to the processor 120, and may perform various data processing or operations. According to an embodiment, as at least part of the data processing or operations, the processor 120 may load commands or data received from other components (e.g., the sensor module 176 or the communication module 190) into the volatile memory 132, process the commands or data stored in the volatile memory 132, and store the resulting data in the non-volatile memory 134. According to an embodiment, the processor 120 may include a main processor 121 (e.g., a central processing unit or an application processor) and an auxiliary processor 123 (e.g., a graphics processing unit, an image signal processor, a sensor hub processor, or a communication processor) that can operate independently of or together with the main processor 121. Additionally or alternatively, the auxiliary processor 123 may be configured to use less power than the main processor 121 or to be specialized for a designated function. The auxiliary processor 123 may be implemented separately from or as a part of the main processor 121.

The auxiliary processor 123 may control at least some of the functions or states related to at least one of the components of the electronic device 101 (e.g., the display device 160, the sensor module 176, or the communication module 190), for example, on behalf of the main processor 121 while the main processor 121 is in an inactive (e.g., sleep) state, or together with the main processor 121 while the main processor 121 is in an active (e.g., application execution) state. According to an embodiment, the auxiliary processor 123 (e.g., an image signal processor or a communication processor) may be implemented as part of another functionally related component (e.g., the camera module 180 or the communication module 190).

The memory 130 may store various data used by at least one component (e.g., the processor 120 or the sensor module 176) of the electronic device 101. The data may include, for example, input data or output data for software (e.g., the program 140) and instructions related thereto. The memory 130 may include a volatile memory 132 or a non-volatile memory 134.

The program 140 may be stored as software in the memory 130, and may include, for example, an operating system 142, middleware 144, or an application 146.

The input device 150 may receive a command or data to be used by a component (e.g., the processor 120) of the electronic device 101 from the outside (e.g., a user) of the electronic device 101. The input device 150 may include, for example, a microphone, a mouse, a keyboard, or a digital pen (e.g., a stylus pen).

The sound output device 155 may output a sound signal to the outside of the electronic device 101. The sound output device 155 may include, for example, a speaker or a receiver. The speaker can be used for general purposes such as multimedia playback or recording playback, and the receiver can be used to receive incoming calls. According to an embodiment, the receiver may be implemented separately from or as part of the speaker.

The display device 160 may visually provide information to the outside (e.g., a user) of the electronic device 101. The display device 160 may include, for example, a display, a hologram device, or a projector, and a control circuit for controlling the corresponding device. According to an embodiment, the display device 160 may include touch circuitry configured to sense a touch, or a sensor circuit (e.g., a pressure sensor) configured to measure the intensity of a force generated by the touch.

The audio module 170 may convert a sound into an electric signal or, conversely, convert an electric signal into a sound. According to an embodiment, the audio module 170 may acquire a sound through the input device 150, or may output a sound through the sound output device 155 or an external electronic device (e.g., the electronic device 102, such as a speaker or headphones) connected directly or wirelessly to the electronic device 101.

The sensor module 176 may detect an operating state (e.g., power or temperature) of the electronic device 101 or an external environmental state (e.g., a user state), and may generate an electrical signal or data value corresponding to the sensed state. According to an embodiment, the sensor module 176 may include, for example, a gesture sensor, a gyro sensor, a barometric sensor, a magnetic sensor, an acceleration sensor, a grip sensor, a proximity sensor, a color sensor, an infrared (IR) sensor, a biometric sensor, a temperature sensor, a humidity sensor, or an illuminance sensor.

The interface 177 may support one or more specified protocols that may be used by the electronic device 101 to directly or wirelessly connect with an external electronic device (e.g., the electronic device 102). According to an embodiment, the interface 177 may include, for example, a high definition multimedia interface (HDMI), a universal serial bus (USB) interface, an SD card interface, or an audio interface.

The connection terminal 178 may include a connector through which the electronic device 101 can be physically connected to an external electronic device (e.g., the electronic device 102). According to an embodiment, the connection terminal 178 may include, for example, an HDMI connector, a USB connector, an SD card connector, or an audio connector (e.g., a headphone connector).

The haptic module 179 may convert an electrical signal into a mechanical stimulus (e.g., vibration or movement) or an electrical stimulus that the user can perceive through tactile or kinesthetic sense. According to an embodiment, the haptic module 179 may include, for example, a motor, a piezoelectric element, or an electrical stimulation device.

The camera module 180 may capture still images and moving images. According to an embodiment, the camera module 180 may include one or more lenses, image sensors, image signal processors, or flashes.

The power management module 188 may manage power supplied to the electronic device 101. According to an embodiment, the power management module 188 may be implemented as, for example, at least a part of a power management integrated circuit (PMIC).

The battery 189 may supply power to at least one component of the electronic device 101. According to an embodiment, the battery 189 may include, for example, a non-rechargeable primary cell, a rechargeable secondary cell, or a fuel cell.

The communication module 190 may support establishment of a direct (e.g., wired) communication channel or a wireless communication channel between the electronic device 101 and an external electronic device (e.g., the electronic device 102, the electronic device 104, or the server 108), and communication through the established communication channel. The communication module 190 may include one or more communication processors that operate independently of the processor 120 (e.g., an application processor) and support direct (e.g., wired) communication or wireless communication. According to an embodiment, the communication module 190 may include a wireless communication module 192 (e.g., a cellular communication module, a short-range wireless communication module, or a global navigation satellite system (GNSS) communication module) or a wired communication module 194 (e.g., a local area network (LAN) communication module or a power line communication module). Among these communication modules, a corresponding communication module may communicate with an external electronic device via the first network 198 (e.g., a short-range communication network such as Bluetooth, WiFi direct, or infrared data association (IrDA)) or the second network 199 (e.g., a telecommunication network such as a cellular network, the Internet, or a computer network (e.g., a LAN or WAN)). These various types of communication modules may be integrated into one component (e.g., a single chip) or may be implemented as a plurality of components (e.g., multiple chips) separate from each other. The wireless communication module 192 may identify and authenticate the electronic device 101 within a communication network such as the first network 198 or the second network 199 using subscriber information (e.g., International Mobile Subscriber Identifier (IMSI)) stored in the subscriber identification module 196.

The antenna module 197 may transmit a signal or power to the outside (e.g., an external electronic device) or receive a signal or power from the outside. According to an embodiment, the antenna module 197 may include one antenna including a radiator formed of a conductor or a conductive pattern formed on a substrate (e.g., a PCB). According to an embodiment, the antenna module 197 may include a plurality of antennas. In this case, at least one antenna may be selected from the plurality of antennas. A signal or power may be transmitted or received between the communication module 190 and an external electronic device through the selected at least one antenna. According to some embodiments, components other than the radiator (e.g., an RFIC) may be additionally formed as a part of the antenna module 197.

At least some of the components may be connected to each other through a communication method between peripheral devices (e.g., a bus, general purpose input and output (GPIO), serial peripheral interface (SPI), or mobile industry processor interface (MIPI)) and may exchange signals (e.g., commands or data) with each other.

According to an embodiment, commands or data may be transmitted or received between the electronic device 101 and the external electronic device 104 through the server 108 connected to the second network 199. Each of the electronic devices 102 and 104 may be a device of the same type as, or a different type from, the electronic device 101. According to an embodiment, all or a part of the operations executed in the electronic device 101 may be executed in one or more of the external electronic devices 102, 104, or 108.

For example, when the electronic device 101 needs to perform a function or service automatically or in response to a request from a user or another device, the electronic device 101 may, instead of or in addition to executing the function or service itself, request one or more external electronic devices to perform at least a part of the function or service. The one or more external electronic devices that have received the request may execute at least a part of the requested function or service, or an additional function or service related to the request, and transmit a result of the execution to the electronic device 101. The electronic device 101 may provide the result, as is or after additional processing, as at least a part of a response to the request. For this purpose, for example, cloud computing, distributed computing, or client-server computing technology may be used.

The electronic device 200 according to various embodiments of the present disclosure may provide the audio data output through the plurality of speakers 202 as stereo sound or mono sound based on the positional relationship between the electronic device 200 and the user. For example, the electronic device 200 may selectively provide stereo sound or mono sound in order to give the user a call environment with better sound quality during a hands-free call.

According to an embodiment, the electronic device 200 may determine the positional relationship between the electronic device 200 and the user, provide stereo sound when the user is located within a preset area, and provide mono sound when the user is located outside the preset area. Here, the preset area is an area set according to the positions of the plurality of speakers 202 disposed in the electronic device 200, and may include a sweet spot in which stereo sound reaches the user with the best sound quality.

Referring to FIG. 2, the electronic device 200 for providing the above-described function may include a plurality of microphones 201, a plurality of speakers 202, a sensor 203, a camera 204, a memory 205, and a processor 206. However, the configuration of the electronic device 200 is not limited thereto. According to various embodiments, the electronic device 200 may omit at least one of the above-described components, and may further include at least one other component.

The plurality of microphones 201 may receive a user's voice. Also, the plurality of microphones 201 may provide the received voice to the processor 206. In FIG. 2, the plurality of microphones 201 is illustrated as including a first microphone 201a and a second microphone 201b, but the number of microphones included in the electronic device 200 is not limited thereto. According to an embodiment, the electronic device 200 may further include at least one other microphone.

The plurality of speakers 202 may output audio data received from the processor 206. For example, the plurality of speakers 202 may output audio data selected by the processor 206 to provide sound to the user. In FIG. 2, the plurality of speakers 202 is illustrated as including a first speaker 202a and a second speaker 202b, but the number of speakers included in the electronic device 200 is not limited thereto.

The sensor 203 may be disposed inside the electronic device 200 to detect an operating state of the electronic device 200 or an external environmental state, and may generate an electrical signal or data value corresponding to the sensed state. According to an embodiment, the sensor 203 may acquire sensor information related to the posture of the electronic device 200. For example, whenever the posture of the electronic device 200 changes, the sensor 203 may measure the change angle of the electronic device 200 and provide the measured change angle to the processor 206 as sensor information.

According to an embodiment, the sensor 203 may measure the change angle of the electronic device 200 with respect to an imaginary line passing through the plurality of speakers 202 disposed in the electronic device 200, and may provide the measured change angle to the processor 206 as sensor information. The sensor 203 may include, for example, at least one of a gyro sensor and an acceleration sensor. However, the type of the sensor 203 is not limited thereto.

The camera 204 may acquire image data by photographing an object (eg, a user). The image data may include at least one of still image data and moving image data.

The memory 205 may store various data used by at least one component of the electronic device 200. For example, the memory 205 may store data such as voice acquired from the plurality of microphones 201, audio data output through the plurality of speakers 202, and a captured image acquired from the camera 204.

The processor 206 may be operatively connected to other components of the electronic device 200 to control operations of the other components. For example, the processor 206 may be operatively connected to the plurality of microphones 201, the plurality of speakers 202, the sensor 203, the camera 204, and the memory 205 to control the plurality of microphones 201, the plurality of speakers 202, the sensor 203, the camera 204, and the memory 205.

The processor 206 may receive a user's voice through the plurality of microphones 201. Also, the processor 206 may determine the positional relationship between the electronic device 200 and the user based on a difference in reception time of the user's voice received through each of the plurality of microphones 201. For example, when receiving the user's voice through the first microphone 201a and the second microphone 201b included in the plurality of microphones 201, the processor 206 may determine the positional relationship between the electronic device 200 and the user based on a difference between a first time at which the user's voice is received through the first microphone 201a and a second time at which the user's voice is received through the second microphone 201b.
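The description above does not say how the reception-time difference is measured. As a rough sketch only, assuming both microphones are sampled on a shared clock, the delay can be estimated with cross-correlation, a common delay-estimation technique that this document does not name. The function name `estimate_delay_samples` and the test signals are hypothetical.

```python
def estimate_delay_samples(mic_a, mic_b, max_lag):
    """Return the lag (in samples) at which mic_b best matches mic_a.

    A positive result means the sound reached mic_a before mic_b.
    Cross-correlation is an assumed technique, not taken from the source.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, a in enumerate(mic_a):
            j = i + lag
            if 0 <= j < len(mic_b):
                score += a * mic_b[j]
        # Keep the lag with the highest correlation score.
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Hypothetical signals: the same short pulse, arriving 3 samples later
# at the second microphone than at the first.
pulse = [0.0, 1.0, 0.5, 0.0]
first_mic = pulse + [0.0] * 6
second_mic = [0.0] * 3 + pulse + [0.0] * 3
delay = estimate_delay_samples(first_mic, second_mic, max_lag=5)
```

Dividing `delay` by the sampling rate would give the time difference in seconds; the sampling rate itself is not specified in the source.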

According to an embodiment, the processor 206 may compare the first time with the second time, and may determine the positional relationship between the electronic device 200 and the user based on the comparison value and a first threshold value. The electronic device 200 may determine whether the user is within the preset area based on the determined positional relationship.

For example, when the value of the first time is 5 and the value of the second time is 8, the processor 206 may compare the first time with the second time and obtain a comparison value of 3. When the first threshold value is set to 5, the processor 206 may compare the comparison value 3 with the first threshold value 5. In this case, the processor 206 may determine that the user is not located in the preset area when the comparison value is equal to or greater than the first threshold value, and that the user is located in the preset area when the comparison value is less than the first threshold value.

Accordingly, because the comparison value 3 is less than the first threshold value 5, the processor 206 may determine that the user is currently within the preset area. That is, the first threshold value may be regarded as reference information for determining the positional relationship between the electronic device 200 and the user based on the reception times of the voice. The first threshold value may vary according to the size of the electronic device 200 and the location of each of the plurality of speakers 202 disposed in the electronic device 200.
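The worked example with reception times 5 and 8 and a first threshold of 5 can be sketched as follows. This is an illustration only: the function name is hypothetical, the time values are unitless as in the text, and taking the absolute difference is an assumption, since the text only says the two times are compared.

```python
def in_preset_area_by_time(first_time, second_time, first_threshold):
    """Apply the first-threshold rule to two microphone reception times."""
    # Assumption: the comparison value is the absolute difference.
    comparison_value = abs(second_time - first_time)
    # Equal to or greater than the threshold: user is outside the preset area.
    # Less than the threshold: user is inside the preset area.
    return comparison_value < first_threshold


# Values from the example in the text: times 5 and 8, threshold 5.
in_area = in_preset_area_by_time(first_time=5, second_time=8, first_threshold=5)
```

Here the comparison value is 3, which is below the threshold, so `in_area` is true, matching the conclusion in the text.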

According to an embodiment, the processor 206 may obtain sensor information related to the posture of the electronic device 200 through the sensor 203. According to an embodiment, the processor 206 may determine the posture of the electronic device 200 based on angle information of the electronic device 200 measured by the sensor 203. For example, the processor 206 may determine whether the electronic device 200 is disposed along the vertical axis (or vertically) or along the horizontal axis (or horizontally) based on the angle information of the electronic device 200.

According to an embodiment, whenever the posture of the electronic device 200 changes, the processor 206 may measure a change angle of the electronic device 200 through the sensor 203 to obtain sensor information, and may compare the obtained information with a second threshold value. For example, the processor 206 may determine that the change angle of the electronic device 200 is 49 degrees based on the sensor information acquired through the sensor 203. In this case, when the second threshold value is 45 degrees, the processor 206 may determine the posture of the electronic device 200 by comparing the change angle of 49 degrees with the second threshold value of 45 degrees.

According to an embodiment, when the change angle is smaller than the second threshold value, the processor 206 may determine that the positional relationship between the electronic device 200 and the user is one in which the user is not located in the preset area. On the other hand, when the change angle is greater than the second threshold value, the processor 206 may determine that the positional relationship is one in which the user is located in the preset area. That is, the second threshold value is reference information with which the processor 206 determines whether the current posture of the electronic device 200 is a posture capable of providing stereo audio data to the user, and may be regarded as reference information that varies for each electronic device 200.
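The posture check with the example values of 49 degrees and 45 degrees can be sketched in the same way. The function name is hypothetical, and the handling of a change angle exactly equal to the threshold is an assumption, because the text only covers the smaller and greater cases.

```python
def posture_supports_stereo(change_angle_deg, second_threshold_deg):
    """Apply the second-threshold rule to the measured change angle."""
    # Greater than the threshold: posture can provide stereo to the user.
    # Smaller than the threshold: it cannot.
    # Assumption: an angle equal to the threshold is treated as "cannot".
    return change_angle_deg > second_threshold_deg


# Values from the example in the text: change angle 49, threshold 45.
posture_ok = posture_supports_stereo(change_angle_deg=49.0, second_threshold_deg=45.0)
```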

According to an embodiment, the processor 206 may determine the audio data to be output through the plurality of speakers 202 based on the positional relationship between the electronic device 200 and the user and the posture of the electronic device 200. The processor 206 may determine the audio data to be output through each of the plurality of speakers 202 according to whether the user is located in the preset area.

For example, when the user is located in the preset area, the processor 206 may output at least partially different audio data to each of the plurality of speakers 202 to provide stereo sound. As another example, when the user is located outside the preset area, the processor 206 may output the same audio data to each of the plurality of speakers 202 to provide mono sound. That is, the processor 206 may determine, based on the first threshold value and the second threshold value, whether the user is located in the preset area in which the electronic device 200 can provide stereo sound, and may decide whether to provide stereo sound or mono sound to the user based on the result of the determination.
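The stereo-or-mono decision can be sketched for a two-speaker device as below. Several details are assumptions not stated in the text: the two threshold checks are combined with a logical AND, mono is produced by averaging the two channels, and all names and sample values are hypothetical.

```python
def select_speaker_output(stereo_frames, time_check_passed, posture_check_passed):
    """Return (first_speaker, second_speaker) sample pairs to play."""
    if time_check_passed and posture_check_passed:
        # User is in the preset area: keep the channels distinct (stereo).
        return list(stereo_frames)
    # User is outside the preset area: send identical samples to both
    # speakers (mono). Averaging the channels is an assumed downmix.
    return [((left + right) / 2,) * 2 for left, right in stereo_frames]


# Hypothetical decoded call audio as (left, right) sample pairs.
frames = [(0.25, 0.75), (1.0, 0.0)]
stereo_out = select_speaker_output(frames, True, True)
mono_out = select_speaker_output(frames, True, False)
```

With these inputs, `stereo_out` keeps the original pairs, while `mono_out` carries the same value on both speakers for every frame.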

Also, the preset area may vary according to the location of each of the plurality of speakers 202 disposed in the electronic device 200 and/or the size of the electronic device 200. As will be described below with reference to FIG. 9, the first area and the third area may be set as the preset area according to the positions of a first speaker located at an upper portion of the electronic device 200 and a second speaker located at a lower portion of the electronic device 200. When the positional relationship between the electronic device 200 and the user determined by the processor 206 indicates that the user is located in the first area and/or the third area, the processor 206 may determine that the user is located in the preset area and, in order to provide an optimal sound to the user, may output at least partially different audio data to each of the plurality of speakers 202 to provide stereo sound to the user.

On the other hand, when the processor 206 determines that the user is not located in the preset region in the positional relationship between the electronic device 200 and the user, the processor 206 may provide a mono sound instead of a stereo sound.

According to an embodiment, the processor 206 may receive a user's voice through each of the plurality of microphones 201, and determine whether to provide a mono sound to the user based on a difference in reception time of the received voice. For example, the processor 206 may determine a positional relationship between the electronic device 200 and the user based on a difference in reception time of a voice received through each of the plurality of microphones 201. At this time, if it is determined that the user is located outside the preset area, which is based on the arrangement positions of the plurality of speakers 202 included in the electronic device 200, the processor 206 may provide a mono sound to the user.

According to an embodiment, the processor 206 may acquire an image obtained by photographing the object from the camera 204 . Also, the processor 206 may more accurately determine the positional relationship between the electronic device 200 and the user based on the captured image, the time difference between the identified voice signals, and the determined posture of the electronic device. The processor 206 may determine audio data to be output through the plurality of speakers 202 based on the determined positional relationship between the electronic device 200 and the user.

According to an embodiment, when the electronic device 200 is mounted on a cradle, the processor 206 may determine the positional relationship between the electronic device 200 and the user using the first threshold value, the second threshold value, and the camera 204, and may set audio data to be output through each of the plurality of speakers 202 based on the determined positional relationship. However, when the posture of the electronic device 200 mounted on the cradle changes to a posture lying on the ground, the processor 206 may determine through the sensor 203 that the electronic device 200 is lying down, determine the positional relationship between the electronic device 200 and the user again based on the user's voice received through each of the plurality of microphones 201 and the image captured by the camera 204, and reset the audio data to be output through each of the plurality of speakers 202.

According to an embodiment, the processor 206 may determine audio data to be output through the plurality of speakers 202 according to whether the user is located in a preset area. For example, when the user is located in the preset area, the processor 206 may output at least partially different audio data to each of the plurality of speakers 202 to provide stereo sound. As another example, when the user is located outside the preset area, the processor 206 may output the same audio data to each of the plurality of speakers 202 to provide mono sound.
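The routing described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the function name `route_to_speakers` is hypothetical, and averaging the two channels to obtain the mono signal is an assumption (the description only states that the same audio data is output to each speaker).

```python
from typing import List, Sequence, Tuple


def route_to_speakers(
    left: Sequence[float],
    right: Sequence[float],
    user_in_preset_area: bool,
) -> Tuple[List[float], List[float]]:
    """Return the sample lists for the first and second speaker.

    Hypothetical helper: stereo keeps the two channels distinct,
    mono sends one identical signal to both speakers.
    """
    if len(left) != len(right):
        raise ValueError("left and right channels must have the same length")
    if user_in_preset_area:
        # Stereo: at least partially different audio data per speaker.
        return list(left), list(right)
    # Mono: the same audio data on every speaker.
    # Assumption: an equal-weight downmix of the two channels.
    mono = [(l + r) / 2.0 for l, r in zip(left, right)]
    return mono, list(mono)
```

For instance, `route_to_speakers([1.0, 0.0], [0.0, 1.0], False)` yields the same list `[0.5, 0.5]` for both speakers, while passing `True` returns the two channels unchanged.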

According to an embodiment, the processor 206 provides mono sound when it determines that the user is not located inside the preset area because, when stereo sound is provided to a user located outside the preset area, the at least partially different audio data output through each of the plurality of speakers 202 may interfere with each other severely. Since stereo sound degraded by such interference is inferior in quality to mono sound, the processor 206 may provide mono sound by outputting the same audio data through each of the plurality of speakers 202 if the user is not located within the preset area.

According to an embodiment, the preset area may be determined according to a location of a plurality of speakers 202 disposed in the electronic device 200 or a size of the electronic device 200 . For example, the preset area may include a sweet spot that provides the best sound quality when stereo sound is provided to the user based on the arrangement positions of the plurality of speakers 202 .

According to an embodiment, when the processor 206 outputs stereo sound through each of the plurality of speakers 202, the processor 206 may apply a filter (eg, an XTC filter) to prevent crosstalk, which is a phenomenon in which audio data output from different speakers interferes with each other. The filter may be applied to the stereo sound output by the processor 206 through each of the plurality of speakers 202 to cancel the crosstalk.

As described above, according to various embodiments, the electronic device (eg, the electronic device 200) includes a plurality of microphones (eg, the first microphone 201a and the second microphone 201b), a plurality of speakers (eg, the first speaker 202a and the second speaker 202b), a sensor (eg, the sensor 203), a memory (eg, the memory 205), and a processor (eg, the processor 206) operatively coupled to the plurality of microphones, the plurality of speakers, the sensor, and the memory, wherein the processor may be configured to receive a user's voice through each of the plurality of microphones, determine a positional relationship between the electronic device and the user based on a difference in reception time of the user's voice received through each of the plurality of microphones, determine the posture of the electronic device based on sensor information measured through the sensor, and determine audio data output through the plurality of speakers included in the electronic device based on the determined positional relationship and the determined posture of the electronic device.

According to various embodiments, the processor may be configured to determine whether the user is located in a preset area based on the determined positional relationship and the determined posture of the electronic device, output the same audio data through each of the plurality of speakers when it is determined that the user is not located in the preset area, and output at least partially different audio data through each of the plurality of speakers when it is determined that the user is located in the preset area.

According to various embodiments, the processor may set the preset area based on a location where the plurality of speakers are disposed in the electronic device.

According to various embodiments, the electronic device further includes a camera (eg, the camera 204), and the processor may be configured to determine a positional relationship between the electronic device and the user based on an image captured by the camera and a difference in reception time of the user's voice received through each of the plurality of microphones.

According to various embodiments of the present disclosure, the electronic device further includes at least one other microphone, and the processor may be configured to determine a positional relationship between the electronic device and the user based on a reception time of the user's voice received through each of the plurality of microphones and a reception time of the user's voice received through the at least one other microphone.

According to various embodiments, the electronic device may further include a filter for preventing a crosstalk phenomenon occurring between the audio data output through each of the plurality of speakers.

According to various embodiments, the processor may be configured to output the same audio data through each of the plurality of speakers when a value indicating the difference in reception time of the user's voice is greater than or equal to a preset first threshold value, and to output at least partially different audio data through each of the plurality of speakers when the value indicating the difference in reception time of the user's voice is smaller than the first threshold value.

According to various embodiments, the processor may be configured to calculate the angle of the electronic device based on the sensor information, output the same audio data through each of the plurality of speakers when the calculated angle of the electronic device is greater than or equal to a preset second threshold value, and output at least partially different audio data through each of the plurality of speakers when the calculated angle of the electronic device is smaller than the second threshold value.

According to various embodiments, the electronic device further includes a camera (eg, the camera 204), and the processor may be configured to acquire an image captured by the camera, obtain a position value of an object corresponding to the user in the image, and reconstruct the determined audio data based on the determined positional relationship, the determined posture of the electronic device, and the position value of the object.

As described above, according to various embodiments, the electronic device (eg, the electronic device 200) includes a plurality of microphones (eg, the first microphone 201a and the second microphone 201b), a plurality of speakers (eg, the first speaker 202a and the second speaker 202b), a camera (eg, the camera 204), a memory (eg, the memory 205), and a processor (eg, the processor 206) operatively connected to the plurality of microphones, the plurality of speakers, the camera, and the memory, wherein the processor may be configured to receive a user's voice through each of the plurality of microphones, acquire an image captured by the camera, obtain a position value of an object corresponding to the user in the image, determine a positional relationship between the electronic device and the user based on the difference in reception time of the user's voice received through each of the plurality of microphones and the position value of the object, and determine the audio data to be output through the plurality of speakers based on the determined positional relationship.

According to various embodiments, the processor may be configured to select at least two of the plurality of speakers based on the determined positional relationship and to output at least partially different audio data through each of the selected at least two speakers.

Referring to FIG. 3, in operation 301, the processor (eg, the processor 206) may receive the user's voice signal through each of a plurality of microphones (eg, the plurality of microphones 201) disposed in the electronic device (eg, the electronic device 200).

In operation 303, the processor may determine a positional relationship between the electronic device and the user based on a difference in reception time of the voice signal received through each of the plurality of microphones. For example, when a first microphone (eg, the first microphone 201a) among the plurality of microphones is disposed on one side (eg, the left side) of the electronic device, a second microphone (eg, the second microphone 201b) among the plurality of microphones is disposed on the other side (eg, the right side) of the electronic device, and a user located toward the one side (eg, the left side) of the electronic device speaks, the first microphone, which is closer to the user, may receive the voice earlier than the second microphone. That is, the processor may compare, based on cross-correlation, T1, the time at which the voice signal is received by the first microphone 201a, with T2, the time at which the voice signal is received by the second microphone 201b, to identify the difference in reception time of the voice. The processor may determine a positional relationship between the electronic device and the user based on the difference in reception time of the received voice.
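The comparison of T1 and T2 by cross-correlation can be sketched as below. This is an illustrative sketch only: the function name `estimate_delay_seconds`, the use of NumPy, and the sign convention (the return value approximates T1 − T2) are assumptions, not details taken from the disclosure.

```python
import numpy as np


def estimate_delay_seconds(sig1, sig2, sample_rate_hz):
    """Estimate T1 - T2 from two microphone recordings of equal sample rate.

    A negative result means microphone 1 received the voice first;
    a positive result means microphone 2 received it first.
    """
    sig1 = np.asarray(sig1, dtype=float)
    sig2 = np.asarray(sig2, dtype=float)
    # Full cross-correlation; output index 0 corresponds to lag -(len(sig2) - 1).
    corr = np.correlate(sig1, sig2, mode="full")
    lag_samples = int(np.argmax(corr)) - (len(sig2) - 1)
    return lag_samples / float(sample_rate_hz)
```

With a user on the left of the device, the first (left) microphone would hear the voice earlier, so the estimate would be negative under this convention. The resolution is limited to one sample period, so a higher sample rate gives a finer time difference.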

In operation 305, the processor may determine the posture of the electronic device based on sensor information received from a sensor (eg, the sensor 203). The sensor may include a gyro sensor and an acceleration sensor, and is not limited thereto as long as sensor information for determining the posture of the electronic device can be obtained. The sensor may measure a change angle of the electronic device based on a virtual line passing through the plurality of speakers disposed in the electronic device, and provide the measured change angle of the electronic device as sensor information to the processor. The processor may determine the posture of the electronic device based on the sensor information.
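One way to picture the angle measurement relative to the virtual line through the speakers is sketched below. It is only an assumption-laden illustration: it takes a static accelerometer reading as the gravity direction, assumes the speaker line coincides with the device's y-axis, and defines the angle as the tilt of that line away from vertical. None of these choices, nor the name `speaker_axis_angle_deg`, is stated in the disclosure.

```python
import math


def speaker_axis_angle_deg(accel, speaker_axis=(0.0, 1.0, 0.0)):
    """Angle in degrees between the speaker line and the vertical (gravity).

    accel: (ax, ay, az) accelerometer reading in device coordinates while
    the device is at rest. speaker_axis: assumed direction of the virtual
    line passing through the speakers, in the same coordinates.
    Returns 0 when the speaker line is vertical and 90 when it is horizontal.
    """
    dot = sum(a * s for a, s in zip(accel, speaker_axis))
    norm_a = math.sqrt(sum(a * a for a in accel))
    norm_s = math.sqrt(sum(s * s for s in speaker_axis))
    if norm_a == 0.0 or norm_s == 0.0:
        raise ValueError("vectors must be non-zero")
    # abs() makes the result independent of which end of the line points up.
    cos_angle = min(1.0, abs(dot) / (norm_a * norm_s))
    return math.degrees(math.acos(cos_angle))
```

Under these assumptions, a device held upright with one speaker above the other gives an angle near 0 degrees, and a device rotated to landscape or laid flat gives an angle near 90 degrees.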

In operation 307, the processor may determine the audio data to be output through each of a plurality of speakers (eg, the plurality of speakers 202) based on the positional relationship and the determined posture of the electronic device, and output the determined audio data. When it is determined that the user is located in a preset area based on the positional relationship and the determined posture of the electronic device, the processor may output stereo sound through the plurality of speakers. When the processor determines that the user is located outside the preset area, the processor may output a mono sound through the plurality of speakers.

Referring to FIG. 4, in operation 401, the processor (eg, the processor 206) may compare the voices received through each of the plurality of microphones (eg, the plurality of microphones 201) based on cross-correlation, and identify the difference in reception time of the received voice based on the comparison result. The processor may compare the difference in reception time of the received voice with a first threshold value, which is preset reference information for determining the positional relationship between the electronic device and the user. When the difference in the reception time of the received voice is greater than or equal to the first threshold value, the processor may determine that the user is not located in the preset area.

According to an embodiment, when the processor determines that the user is not located within the preset area, in operation 403, the processor may output the same audio data through each of the plurality of speakers (eg, the plurality of speakers 202) to provide a mono sound to the user. If the processor provides stereo sound when the user's location is outside the preset area, interference may occur between the at least partially different audio data output through each of the plurality of speakers. Accordingly, when the user's location is outside the preset area, the processor may provide a mono sound by outputting the same audio data through the plurality of speakers.

In operation 402, the processor may compare the angle of the electronic device determined based on sensor information received from a sensor (eg, the sensor 203) with a preset second threshold value. The processor may determine the posture of the electronic device based on the comparison result. For example, when the angle of the electronic device is smaller than the second threshold value, the processor may determine that the electronic device is disposed along the vertical axis. Also, when the angle of the electronic device is greater than or equal to the second threshold value, the processor may determine that the electronic device is arranged in a horizontal axis.

According to an embodiment, when the angle of the electronic device is smaller than the preset second threshold value (when the electronic device is disposed along the vertical axis), the processor may determine that the user is not located inside the preset area defined according to the positions of the plurality of speakers disposed in the electronic device, and in operation 403, may output the same audio data through each of the plurality of speakers to provide a mono sound to the user.

According to an embodiment, when the angle of the electronic device is greater than or equal to the preset second threshold value (when the electronic device is disposed along the horizontal axis), the processor may determine that the user is located in the preset area defined according to the positions of the plurality of speakers disposed in the electronic device, and in operation 404, may output at least partially different audio data through each of the plurality of speakers to provide stereo sound to the user.
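The flow of operations 401 to 404 can be summarized in a short sketch. The function name `select_output_mode` and the string return values are hypothetical, and taking the absolute value of the time difference is an assumption made so that a user on either side of the device is handled the same way.

```python
def select_output_mode(
    arrival_time_diff_s: float,
    device_angle_deg: float,
    first_threshold_s: float,
    second_threshold_deg: float,
) -> str:
    """Return "mono" or "stereo" following the two threshold comparisons."""
    # Operation 401: a large reception time difference means the user is
    # outside the preset area (assumption: magnitude only, sign ignored).
    if abs(arrival_time_diff_s) >= first_threshold_s:
        return "mono"  # operation 403
    # Operation 402: an angle below the second threshold means the device
    # is disposed along the vertical axis.
    if device_angle_deg < second_threshold_deg:
        return "mono"  # operation 403
    # Device disposed along the horizontal axis and user in the preset area.
    return "stereo"  # operation 404
```

As a usage example with purely hypothetical threshold values, `select_output_mode(5e-5, 80.0, 2e-4, 45.0)` returns `"stereo"`, whereas raising the time difference to `3e-4` or lowering the angle to `10.0` returns `"mono"`.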

In operation 501, when the user's voice is received through each of the plurality of microphones (eg, the plurality of microphones 201), the processor (eg, the processor 206) may compare the difference in reception time of the received voice with the preset first threshold value. When the difference in the reception time of the received voice is greater than or equal to the first threshold value, the processor may determine that the user is not located in the preset area.

According to an embodiment, when it is determined that the user is not located inside the preset area, in operation 503, the processor may output the same audio data through each of a plurality of speakers (eg, the plurality of speakers 202) to provide a mono sound to the user.

In operation 502, the processor may compare an angle of the electronic device determined based on sensor information received from a sensor (eg, the sensor 203) with a preset second threshold value. The processor may determine the posture of the electronic device based on the comparison result. For example, when the angle of the electronic device is smaller than the second threshold value, the processor may determine that the electronic device is disposed along the vertical axis. Also, when the angle of the electronic device is greater than or equal to the second threshold value, the processor may determine that the electronic device is arranged in a horizontal axis.

According to an embodiment, when the angle of the electronic device is smaller than the preset second threshold value (when the electronic device is disposed along the vertical axis), the processor may determine that the user is not located inside the preset area defined according to the positions of the plurality of speakers disposed in the electronic device, and in operation 503, may output the same audio data through each of the plurality of speakers to provide a mono sound to the user.

According to an embodiment, when the angle of the electronic device is greater than or equal to the preset second threshold value (when the electronic device is disposed along the horizontal axis), the processor may determine that the user is located in the preset area defined according to the positions of the plurality of speakers disposed in the electronic device, and in operation 504, may configure stereo audio data to be output through each of the plurality of speakers. For example, the processor may configure at least partially different audio data to be output through each of the plurality of speakers.

In operation 505, the processor may specify a positional relationship between the electronic device and the user based on an image captured by a camera (eg, the camera 204). For example, the processor may obtain an image obtained by photographing an object (eg, a user) through the camera, and determine the position of the object in the image. Also, the processor may specify a positional relationship between the electronic device and the user based on a difference in reception time of a voice received through each of the plurality of microphones and a position of an object in the image. In more detail, the processor may roughly determine a positional relationship between the electronic device and the user based on a difference in reception time of a voice received through each of the plurality of microphones in operation 501 .

For example, the positional relationship between the electronic device and the user determined based on the difference in the reception time of the voice may include information on the distance and the direction between the electronic device and the user. Here, since the direction between the electronic device and the user may not be narrowed down to a single direction from the reception time difference alone, the processor may check the position value of the object corresponding to the user in the image captured through the camera, and thereby specify the direction between the electronic device and the user as a single direction.
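The idea of narrowing the direction with the camera can be illustrated with a small sketch. Everything here is an assumption for illustration: a far-field model with two microphones spaced `mic_spacing_m` apart (which leaves a front/back mirror ambiguity), a forward-facing, unmirrored camera with horizontal field of view `hfov_deg`, a delay defined as T1 − T2 with the first microphone on the left, and the hypothetical function names.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed speed of sound in air


def _angle_diff_deg(a, b):
    """Smallest absolute difference between two bearings in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def candidate_bearings_deg(delay_s, mic_spacing_m):
    """Front and back bearings consistent with one time difference.

    Bearings are measured from the device's forward direction,
    positive toward the second (right) microphone.
    """
    ratio = SPEED_OF_SOUND_M_S * delay_s / mic_spacing_m
    ratio = max(-1.0, min(1.0, ratio))  # guard against measurement noise
    front = math.degrees(math.asin(ratio))
    back = 180.0 - front
    if back > 180.0:
        back -= 360.0
    return front, back


def resolve_bearing_deg(delay_s, mic_spacing_m, object_x_px, image_width_px, hfov_deg):
    """Pick the candidate bearing closest to where the camera sees the user.

    Returns None when no user object was detected (object_x_px is None),
    leaving the direction unspecified.
    """
    if object_x_px is None:
        return None
    camera_bearing = (object_x_px / image_width_px - 0.5) * hfov_deg
    candidates = candidate_bearings_deg(delay_s, mic_spacing_m)
    return min(candidates, key=lambda c: _angle_diff_deg(c, camera_bearing))
```

In this sketch the microphones alone yield two mirror-image candidates, and the object position in the image is used only to choose between them.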

In operation 506, the processor may apply a filter to the stereo audio data based on the specified positional relationship between the electronic device and the user. When at least partially different audio data is output through each of the plurality of speakers, the processor may apply a filter (eg, an XTC filter) to the at least partially different audio data to prevent a crosstalk phenomenon in which the audio data interferes with each other.
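The disclosure does not specify how the XTC filter is built. As a hedged illustration of what a crosstalk-cancellation filter can look like, the sketch below uses a generic textbook formulation: a symmetric two-speaker, two-ear model in which the opposite-ear path is attenuated by `gain` and delayed by `delay_s`, inverted per frequency with Tikhonov regularization `beta`. The function name and all parameter values are assumptions; in practice the gain and delay would depend on the specified positional relationship.

```python
import numpy as np


def xtc_filter_matrices(freqs_hz, gain=0.85, delay_s=7e-5, beta=1e-3):
    """Return (H, C): acoustic model and crosstalk-cancellation filter.

    H[k] is the assumed 2x2 speaker-to-ear transfer matrix at freqs_hz[k];
    C[k] is its regularized inverse, C = (H^H H + beta I)^-1 H^H.
    """
    freqs = np.asarray(freqs_hz, dtype=float)
    # Crosstalk path: attenuated and delayed copy reaching the opposite ear.
    cross = gain * np.exp(-2j * np.pi * freqs * delay_s)
    H = np.empty((len(freqs), 2, 2), dtype=complex)
    H[:, 0, 0] = 1.0
    H[:, 1, 1] = 1.0
    H[:, 0, 1] = cross
    H[:, 1, 0] = cross
    H_herm = np.conj(np.transpose(H, (0, 2, 1)))
    # Solve (H^H H + beta I) C = H^H for every frequency at once.
    C = np.linalg.solve(H_herm @ H + beta * np.eye(2), H_herm)
    return H, C
```

Applying `C` to the stereo spectrum before playback makes the combined response `H @ C` close to the identity, so each ear mainly receives its intended channel; a larger `beta` trades cancellation depth for less boosting at frequencies where `H` is nearly singular.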

In operation 507, the processor may provide stereo sound to a user located in the preset area by outputting the filtered, at least partially different audio data through the plurality of speakers.

According to an embodiment, when the camera does not recognize a user or recognizes two or more users, the processor may provide stereo sound to the user by outputting at least partially different audio data through each of the plurality of speakers based on the determined positional relationship and the posture of the electronic device.

According to an embodiment, the processor may reconstruct the stereo sound based on the specified positional relationship between the electronic device and the user. For example, the processor may reconstruct the at least partially different audio data output through the plurality of speakers based on the specified positional relationship between the electronic device and the user.

In operation 601, the processor (eg, the processor 206) may compare the difference in reception time of the voice received through each of the plurality of microphones (eg, the plurality of microphones 201) with a preset first threshold value to determine a positional relationship between the electronic device and the user. When the difference in the reception time of the received voice is smaller than the preset first threshold value, the processor may, in operation 602, compare the sensor information received from the sensor (eg, the sensor 203) with the second threshold value to determine the posture of the electronic device.

According to an embodiment, when the angle of the electronic device is greater than or equal to a preset second threshold value, in operation 604, the processor may determine that the user is located in a preset area. When the processor determines that the user is located in the preset area, the processor may configure at least partially different audio data based on the determined positional relationship and the determined posture of the electronic device in order to provide stereo sound to the user.

According to an embodiment, when at least one other microphone is additionally provided in the electronic device in addition to the plurality of microphones, in operation 605, the processor may compare, based on cross-correlation, the reception times of the voice received through the plurality of microphones and the reception time of the voice received through the at least one other microphone to identify the differences in reception time of the received voice.

According to an embodiment, the processor may perform trilateration based on a difference in reception time of the received voice to more accurately identify a positional relationship between the electronic device and the user. For example, when the reception times of the voice signal received by each of the plurality of microphones (eg, the first microphone 201a and the second microphone 201b) are T1 and T2, and the reception time of the voice signal received through the at least one other microphone (eg, a third microphone (not shown)) is T3, the processor may perform trilateration based on the time differences among T1, T2, and T3 to specify the positional relationship between the electronic device and the user.
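The disclosure does not detail the trilateration computation. As one possible illustration, the sketch below substitutes a simple brute-force grid search over candidate 2D positions that best match the measured time differences (a form of time-difference-of-arrival localization). The function name `locate_by_tdoa`, the 2D geometry, the search extent, and the grid step are all assumptions.

```python
import numpy as np

SPEED_OF_SOUND_M_S = 343.0  # assumed speed of sound in air


def locate_by_tdoa(mic_xy, arrival_times_s, half_extent_m=2.0, step_m=0.02):
    """Estimate the (x, y) source position from arrival times T1, T2, T3, ...

    mic_xy: microphone coordinates in metres, shape (M, 2), M >= 3.
    arrival_times_s: reception time at each microphone; only the
    differences relative to the first microphone are used.
    """
    mics = np.asarray(mic_xy, dtype=float)
    times = np.asarray(arrival_times_s, dtype=float)
    measured = times[1:] - times[0]
    # Candidate positions on a square grid centred on the device.
    axis = np.arange(-half_extent_m, half_extent_m + step_m / 2.0, step_m)
    gx, gy = np.meshgrid(axis, axis)
    points = np.stack([gx.ravel(), gy.ravel()], axis=1)
    # Distance from every candidate point to every microphone.
    dist = np.linalg.norm(points[:, None, :] - mics[None, :, :], axis=2)
    predicted = (dist[:, 1:] - dist[:, [0]]) / SPEED_OF_SOUND_M_S
    error = np.sum((predicted - measured) ** 2, axis=1)
    best = points[int(np.argmin(error))]
    return float(best[0]), float(best[1])
```

Because the microphones on a handheld device are close together compared with the distance to the user, the direction is typically recovered far more reliably than the range, which is one reason the description also draws on the camera image.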

According to an embodiment, if the positional relationship between the electronic device and the user is not specified, in operation 607, the processor may output the stereo audio data configured in operation 604 through the plurality of speakers.

According to an embodiment, if the positional relationship between the electronic device and the user is specified, in operation 606, the processor may apply a filter to the stereo audio data based on the specified positional relationship between the electronic device and the user. When at least partially different audio data is output through each of the plurality of speakers, the processor may apply a filter (eg, an XTC filter) to the at least partially different audio data to prevent a crosstalk phenomenon in which the audio data interferes with each other.

In operation 607, the processor may provide stereo sound to a user located in the preset area by outputting the filtered, at least partially different audio data through the plurality of speakers.

In operation 701, the processor (eg, the processor 206) may receive the user's voice through each of the plurality of microphones (eg, the plurality of microphones 201).

In operation 702, the processor may obtain an image of an object (eg, a user) from a camera (eg, the camera 204).

In operation 703, the processor may determine a positional relationship between the electronic device and the user based on a difference in reception time of a voice received through each of the plurality of microphones and an image acquired through the camera. For example, the processor may determine the difference in reception time of the received voice by comparing the reception times of the voice signals received through each of the plurality of microphones based on cross-correlation. Also, the processor may obtain a position value of an object corresponding to the user in the captured image. The processor may determine the positional relationship between the electronic device and the user based on the position value of the object corresponding to the user and the difference in reception time of the received voice.

In more detail, in operation 703, the processor may roughly determine a positional relationship between the electronic device and the user based on a difference in reception time of the voice received through each of the plurality of microphones in operation 701. For example, the positional relationship between the electronic device and the user determined based on the difference in the reception time of the voice may include information on the distance and the direction between the electronic device and the user. Here, since the direction between the electronic device and the user may not be narrowed down to a single direction from the reception time difference alone, the processor may check the position value of the object corresponding to the user in the image captured through the camera, and thereby specify the direction between the electronic device and the user as a single direction.

In operation 704, the processor may determine audio data to be output through each of the plurality of speakers based on the determined positional relationship. According to an embodiment, when the determined positional relationship indicates that the user is located inside a preset area defined according to the arrangement positions of the plurality of speakers, the processor may output at least partially different audio data through each of the plurality of speakers to provide stereo sound to the user. According to an embodiment, when the user is not located inside the preset area, the processor may output the same audio data through each of the plurality of speakers to provide a mono sound to the user.

The plurality of microphones (eg, the plurality of microphones 201) in FIG. 8 may include three or more microphones. In operation 801, the processor (eg, the processor 206) may receive the user's voice through each of the plurality of microphones (eg, the plurality of microphones 201).

In operation 803, the processor may determine a positional relationship between the electronic device and the user based on a difference in reception time of a voice received through each of the plurality of microphones. For example, the processor may determine a positional relationship between the electronic device and the user by performing trilateration based on the differences in reception time of the voice received through each of the plurality of microphones. According to an embodiment, the processor may compare the reception times of the voice signal received through each of the plurality of microphones based on cross-correlation to determine the difference in reception time of the received voice.

In operation 805, the processor may determine the audio data to be output through a plurality of speakers (eg, the plurality of speakers 202) based on the determined positional relationship, and output the determined audio data. For example, when the user is located in a preset area defined according to the arrangement positions of the plurality of speakers, the processor may output at least partially different audio data through each of the plurality of speakers to provide stereo sound to the user. As another example, when the user is not located in the preset area, the processor may output the same audio data through each of the plurality of speakers to provide a mono sound to the user.

As described above, according to various embodiments, a method of outputting audio data of an electronic device (eg, the electronic device 200) may include an operation of receiving a user's voice through each of a plurality of microphones (eg, the first microphone 201a and the second microphone 201b) included in the electronic device, an operation of determining a positional relationship between the electronic device and the user based on a difference in reception time of the user's voice received through each of the plurality of microphones, an operation of determining a posture of the electronic device based on sensor information measured through a sensor (eg, the sensor 203) included in the electronic device, and an operation of determining the audio data to be output through a plurality of speakers (eg, the first speaker 202a and the second speaker 202b) included in the electronic device based on the determined positional relationship and the determined posture of the electronic device.

According to various embodiments, the determining of the audio data may include an operation of determining whether the user is located within a preset area based on the determined positional relationship and the determined posture of the electronic device; an operation of outputting the same audio data through each of the plurality of speakers when it is determined that the user is not located within the preset area; and an operation of outputting at least partially different audio data through each of the plurality of speakers when it is determined that the user is located within the preset area.

According to various embodiments of the present disclosure, the method of outputting the audio data may further include setting the preset region based on a position where the plurality of speakers are arranged in the electronic device.

According to various embodiments, the determining of the positional relationship between the electronic device and the user may include an operation of determining the positional relationship between the electronic device and the user based on an image captured by a camera (eg, the camera 204) included in the electronic device and a difference in reception time of the user's voice received through each of the plurality of microphones.

According to various embodiments, the determining of the positional relationship between the electronic device and the user may include an operation of determining the positional relationship between the electronic device and the user based on a reception time of the user's voice received through each of the plurality of microphones and a reception time of the user's voice received through at least one other microphone included in the electronic device.

According to various embodiments, the method of outputting the audio data may further include an operation of preventing, through a filter included in the electronic device, a crosstalk phenomenon occurring between the audio data output through each of the plurality of speakers.

According to various embodiments, the determining of the audio data may include an operation of determining the audio data output through each of the plurality of speakers as the same audio data when a value representing a difference in reception time of the user's voice is greater than or equal to a preset first threshold value, and an operation of determining the audio data output through each of the plurality of speakers as at least partially different audio data when the value representing the difference in reception time of the user's voice is less than the first threshold value.

According to various embodiments, the determining of the posture of the electronic device may include calculating an angle of the electronic device based on the sensor information, and the determining of the audio data may include: determining the audio data output through each of the plurality of speakers to be the same audio data when the calculated angle of the electronic device is greater than or equal to a preset second threshold value; and determining the audio data output through each of the plurality of speakers to be at least partially different audio data when the calculated angle of the electronic device is less than the second threshold value.
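The following Python sketch shows one way the angle check could look. It assumes that the sensor is an accelerometer reporting gravity along the device axes and that the angle of interest is the one between the device's long (y) axis and the horizontal plane; both are assumptions for illustration, as are the function names. The disclosure only states that an angle is calculated from the sensor information and compared with the second threshold value.

```python
import math


def device_angle_deg(ax, ay, az):
    """Angle between the device's long (y) axis and the horizontal plane.

    ax, ay, az are accelerometer readings along the device axes
    (hypothetical sensor layout).
    """
    return math.degrees(math.atan2(abs(ay), math.hypot(ax, az)))


def use_same_audio(angle_deg, second_threshold_deg):
    """True: same audio data on every speaker. False: partially different data."""
    return angle_deg >= second_threshold_deg
```

With this choice of angle, a device held upright yields about 90 degrees and a device lying on its long edge yields about 0 degrees.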

According to various embodiments of the present disclosure, the method of outputting the audio data may further include: acquiring an image captured through a camera of the electronic device; acquiring a position value of an object corresponding to the user from the image; and reconstructing the determined audio data based on the determined positional relationship, the determined posture of the electronic device, and the position value of the object.

FIG. 9 is a view for explaining a preset area according to an arrangement position of a plurality of speakers (eg, a plurality of speakers 202) according to an embodiment of the present invention.

According to an embodiment, the processor (eg, the processor 206 ) may set a preset area based on a location where the plurality of speakers are disposed in the electronic device (eg, the electronic device 200 ). The preset area is an area set according to the positions of the plurality of speakers disposed on the electronic device, and may include a sweet spot that provides stereo sound to the user and provides the best sound quality.

Referring to FIG. 9, the first area and the third area are preset areas according to the positions of the first speaker 901 located above the electronic device and the second speaker 902 located below the electronic device. When stereo sound is provided through the first speaker 901 and the second speaker 902, the stereo sound can be effectively heard in the first area or the third area. Accordingly, when the user is located in the first area or the third area, the processor may provide stereo sound to the user by outputting at least partially different audio data through the plurality of speakers.

According to an embodiment, when the user is located in the second area or the fourth area, the processor may provide mono sound to the user by outputting the same audio data through the first speaker 901 located above and the second speaker 902 located below.
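The area-based decision for the two-speaker case can be sketched in Python as follows. The stereo and mono area sets come from the description of FIG. 9; the geometry of the areas is not given in the text, so the mapping from a user bearing to an area (four 90-degree sectors, with the first area centred on 0 degrees) is a hypothetical placeholder, as are the function names.

```python
STEREO_AREAS = {1, 3}  # first and third areas: stereo is heard effectively
MONO_AREAS = {2, 4}    # second and fourth areas: same data on both speakers


def area_for_bearing(bearing_deg):
    """Map a user bearing to an area number from 1 to 4.

    Hypothetical geometry: four 90-degree sectors centred on
    0, 90, 180 and 270 degrees for areas 1, 2, 3 and 4.
    """
    return int(((bearing_deg % 360) + 45) // 90) % 4 + 1


def output_mode(area):
    """Return "stereo" or "mono" for an area number from 1 to 4."""
    if area in STEREO_AREAS:
        return "stereo"
    if area in MONO_AREAS:
        return "mono"
    raise ValueError("area must be 1, 2, 3 or 4")
```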

According to various embodiments of the present disclosure, when a plurality of speakers are respectively disposed on the upper side, the lower side, and both lateral sides of an electronic device (eg, the electronic device 200), the processor may select at least two of the plurality of speakers to provide stereo sound to the user based on the positional relationship between the electronic device and the user. For example, when the user is located in the first area (or the third area), the processor may select, from among the plurality of speakers, at least two speakers capable of providing stereo sound to a user located in the first area (or the third area). For example, the processor may provide stereo sound to a user located in the first area (or the third area) by outputting at least partially different audio data through the first and second speakers 901 and 902 disposed above and below the electronic device. As another example, when the user is located in the second area (or the fourth area), the processor may select, from among the plurality of speakers, at least two speakers capable of providing stereo sound to a user located in the second area (or the fourth area). For example, the processor may provide stereo sound to a user located in the second area (or the fourth area) by outputting at least partially different audio data through a third speaker (not shown) and a fourth speaker (not shown) disposed on both sides of the electronic device.
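For the four-speaker arrangement, the selection described above amounts to a lookup from the user's area to a speaker pair. The Python sketch below is a minimal illustration; the string identifiers and the function name are hypothetical, while the pairing of areas with speakers follows the examples in the text.

```python
# Speaker pair that can provide stereo sound to a user in each area.
SPEAKER_PAIR_BY_AREA = {
    1: ("first_speaker_901", "second_speaker_902"),  # above and below
    3: ("first_speaker_901", "second_speaker_902"),
    2: ("third_speaker", "fourth_speaker"),          # both lateral sides
    4: ("third_speaker", "fourth_speaker"),
}


def select_speakers(area):
    """Return the two speakers that should carry the stereo channels."""
    try:
        return SPEAKER_PAIR_BY_AREA[area]
    except KeyError:
        raise ValueError("area must be 1, 2, 3 or 4") from None
```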

The electronic device according to various embodiments disclosed in this document may have various types of devices. The electronic device may include, for example, a portable communication device (eg, a smart phone), a computer device, a portable multimedia device, a portable medical device, a camera, a wearable device, or a home appliance device. The electronic device according to the embodiment of the present document is not limited to the above-described devices.

It should be understood that the various embodiments of this document and the terms used therein are not intended to limit the technical features described in this document to specific embodiments, and include various modifications, equivalents, or substitutions of the embodiments. In connection with the description of the drawings, like reference numerals may be used for similar or related components. The singular form of a noun corresponding to an item may include one or more of the item, unless the relevant context clearly dictates otherwise. As used herein, each of the phrases "A or B", "at least one of A and B", "at least one of A or B", "A, B, or C", "at least one of A, B, and C", and "at least one of A, B, or C" may include any one of, or all possible combinations of, the items listed together in the corresponding phrase. Terms such as "first" and "second" may be used simply to distinguish one element from another, and do not limit the elements in other aspects (eg, importance or order). When one (eg, first) component is referred to as being "coupled" or "connected" to another (eg, second) component, with or without the terms "functionally" or "communicatively", it means that the one component can be connected to the other component directly (eg, by wire), wirelessly, or through a third component.

As used herein, the term “module” may include a unit implemented in hardware, software, or firmware, and may be used interchangeably with terms such as, for example, logic, logic block, component, or circuit. A module may be an integrally formed part or a minimum unit or a part of the part that performs one or more functions. For example, according to an embodiment, the module may be implemented in the form of an application-specific integrated circuit (ASIC).

Various embodiments of the present document may be implemented as software (eg, the program 140) including one or more instructions stored in a storage medium (eg, the internal memory 136 or the external memory 138) readable by a machine (eg, the electronic device 101). For example, the processor (eg, the processor 120) of the device (eg, the electronic device 101) may call at least one of the one or more instructions stored in the storage medium and execute it. This enables the device to be operated to perform at least one function according to the at least one instruction called. The one or more instructions may include code generated by a compiler or code executable by an interpreter. The device-readable storage medium may be provided in the form of a non-transitory storage medium. Here, 'non-transitory' only means that the storage medium is a tangible device and does not contain a signal (eg, an electromagnetic wave); the term does not distinguish between a case where data is semi-permanently stored in the storage medium and a case where data is temporarily stored.

According to one embodiment, the method according to various embodiments disclosed in this document may be provided as included in a computer program product. Computer program products may be traded between sellers and buyers as commodities. The computer program product may be distributed in the form of a device-readable storage medium (eg, compact disc read only memory (CD-ROM)), or may be distributed (eg, downloaded or uploaded) online via an application store (eg, Play Store™) or directly between two user devices (eg, smartphones). In the case of online distribution, at least a part of the computer program product may be temporarily stored or temporarily created in a machine-readable storage medium such as a memory of a server of a manufacturer, a server of an application store, or a relay server.

According to various embodiments, each component (eg, a module or a program) of the above-described components may include a single entity or a plurality of entities. According to various embodiments, one or more components or operations among the above-described components may be omitted, or one or more other components or operations may be added. Alternatively or additionally, a plurality of components (eg, modules or programs) may be integrated into one component. In this case, the integrated component may perform one or more functions of each of the plurality of components identically or similarly to the way they were performed by the corresponding component among the plurality of components prior to the integration. According to various embodiments, operations performed by a module, program, or other component may be executed sequentially, in parallel, repeatedly, or heuristically; one or more of the operations may be executed in a different order or omitted; or one or more other operations may be added.

Claims

1. An electronic device comprising:

a plurality of microphones;

a plurality of speakers;

a sensor;

a memory; and

a processor operatively coupled to the plurality of microphones, the plurality of speakers, the sensor, and the memory,

wherein the processor is configured to:

receive a user's voice through each of the plurality of microphones;

determine a positional relationship between the electronic device and the user based on a difference in reception time of the user's voice received through each of the plurality of microphones;

determine a posture of the electronic device based on sensor information measured through the sensor; and

determine audio data to be output through the plurality of speakers included in the electronic device based on the determined positional relationship and the determined posture of the electronic device.
2. The electronic device of claim 1,

wherein the processor is configured to:

determine whether the user is located within a preset area based on the determined positional relationship and the determined posture of the electronic device;

output the same audio data through each of the plurality of speakers when it is determined that the user is not located within the preset area; and

output at least partially different audio data through each of the plurality of speakers when it is determined that the user is located within the preset area.
3. The electronic device of claim 2,

wherein the processor is configured to

set the preset area based on positions where the plurality of speakers are disposed in the electronic device.
4. The electronic device of claim 1,

further comprising a camera,

wherein the processor is configured to

determine the positional relationship between the electronic device and the user based on an image captured by the camera and the difference in reception time of the user's voice received through each of the plurality of microphones.
5. The electronic device of claim 1,

further comprising at least one other microphone,

wherein the processor is configured to

determine the positional relationship between the electronic device and the user based on the reception time of the user's voice received through each of the plurality of microphones and the reception time of the user's voice received through the at least one other microphone.
6. The electronic device of claim 1,

further comprising a filter for preventing a crosstalk phenomenon occurring between the audio data output through each of the plurality of speakers.
7. The electronic device of claim 1,

wherein the processor is configured to:

output the same audio data through each of the plurality of speakers when a value representing the difference in reception time of the user's voice is greater than or equal to a preset first threshold value; and

output at least partially different audio data through each of the plurality of speakers when the value representing the difference in reception time of the user's voice is less than the first threshold value.
8. The electronic device of claim 1,

wherein the processor is configured to:

calculate an angle of the electronic device based on the sensor information;

output the same audio data through each of the plurality of speakers when the calculated angle of the electronic device is greater than or equal to a preset second threshold value; and

output at least partially different audio data through each of the plurality of speakers when the calculated angle of the electronic device is less than the second threshold value.
9. The electronic device of claim 1,

further comprising a camera,

wherein the processor is configured to:

obtain an image captured through the camera;

obtain a position value of an object corresponding to the user in the image; and

reconstruct the determined audio data based on the determined positional relationship, the determined posture of the electronic device, and the position value of the object.
10. A method of outputting audio data from an electronic device, the method comprising:

receiving a user's voice through each of a plurality of microphones included in the electronic device;

determining a positional relationship between the electronic device and the user based on a difference in reception time of the user's voice received through each of the plurality of microphones;

determining a posture of the electronic device based on sensor information measured through a sensor included in the electronic device; and

determining audio data to be output through a plurality of speakers included in the electronic device based on the determined positional relationship and the determined posture of the electronic device.
11. The method of claim 10,

wherein the determining of the audio data includes:

determining whether the user is located within a preset area based on the determined positional relationship and the determined posture of the electronic device;

outputting the same audio data through each of the plurality of speakers when it is determined that the user is not located within the preset area; and

outputting at least partially different audio data through each of the plurality of speakers when it is determined that the user is located within the preset area.
12. The method of claim 11,

further comprising setting the preset area based on positions where the plurality of speakers are arranged in the electronic device.
13. The method of claim 10,

wherein the determining of the positional relationship between the electronic device and the user includes

determining the positional relationship between the electronic device and the user based on an image captured by a camera included in the electronic device and the difference in reception time of the user's voice received through each of the plurality of microphones.
14. An electronic device comprising:

a plurality of microphones;

a plurality of speakers;

a camera;

a memory; and

a processor operatively coupled to the plurality of microphones, the plurality of speakers, the camera, and the memory,

wherein the processor is configured to:

receive a user's voice through each of the plurality of microphones;

obtain an image captured through the camera;

obtain a position value of an object corresponding to the user in the image;

determine a positional relationship between the electronic device and the user based on a difference in reception time of the user's voice received through each of the plurality of microphones and the position value of the object; and

determine audio data to be output through the plurality of speakers based on the determined positional relationship.
15. The electronic device of claim 14,

wherein the processor is configured to:

select at least two of the plurality of speakers based on the determined positional relationship; and

output at least partially different audio data through each of the selected at least two speakers.