WO2016190460A1

WO2016190460A1 - Method and device for 3D sound playback

Info

Publication number: WO2016190460A1
Application number: PCT/KR2015/005253
Authority: WO
Inventors: 조현; 김선민; 박영진; 정지현
Original assignee: 삼성전자 주식회사; 한국과학기술원
Priority date: 2015-05-26
Filing date: 2015-05-26
Publication date: 2016-12-01
Also published as: KR20180012744A; KR102357293B1

Abstract

A method for 3D sound playback according to an embodiment may comprise the steps of: grouping a plurality of speakers into one group; receiving an input of an audio signal; using the grouped plurality of speakers to locate one or more virtual sound sources of the audio signal at a predetermined position; and playing back the virtual sound source(s) through the plurality of speakers.

Description

Stereo playback method and apparatus

The present invention relates to a method and an apparatus for reproducing stereo sound, and more particularly, to a method and an apparatus for positioning a virtual sound source at a predetermined position using a plurality of speakers.

Thanks to the development of image and sound processing technology, a large amount of content with high image quality and high sound quality is being produced. Listeners who demand such content want realistic images and sound. Accordingly, studies on stereoscopic images and stereoscopic sound have been actively conducted.

Stereo sound is a technique that arranges a plurality of speakers at different positions on a horizontal plane and outputs the same or different sound signals from each speaker so that the listener feels a sense of space.

However, in the case of the virtual elevation generating technology using the home theater speaker, the sweet spot may be limited to the center of the home theater configuration, and the reflection sound generating technology using the sound bar may be affected by the characteristics of the room. Accordingly, there is a need for a three-dimensional audio rendering method that is not affected by the characteristics of a room by using a plurality of speakers and is not restricted by the position of a sweet spot.

An apparatus and method for reproducing stereo sound for providing a stereoscopic sense and a spatial sense to a listener may be provided.

The present invention also provides a computer-readable recording medium having recorded thereon a program for executing the method on a computer. The technical problem to be achieved by the present embodiment is not limited to the technical problem as described above, and other technical problems may be inferred from the following embodiments.

FIG. 1 illustrates a stereoscopic sound reproduction environment of a listener according to an embodiment.

FIG. 2 illustrates a stereoscopic sound reproducing apparatus according to an embodiment.

FIG. 3 illustrates a stereoscopic sound reproducing apparatus using a wave field synthesis rendering method.

FIG. 4A shows a stereoscopic sound reproducing apparatus that renders using a minimum error summing method.

FIG. 4B is a view showing a virtual sound source and arbitrary points within the sweet spot in the stereoscopic sound reproduction environment of FIG. 1.

FIG. 5 shows a stereoscopic sound reproducing apparatus that performs rendering to reproduce a sense of altitude.

FIG. 6A illustrates a stereoscopic sound reproducing apparatus for tracking a listener's head position according to an embodiment.

FIG. 6B illustrates a change in a sweet spot of a stereoscopic sound reproduction environment according to an embodiment.

FIG. 7 is a flowchart of a method of reproducing stereoscopic sound, according to an exemplary embodiment.

FIG. 8 shows a flowchart of a further embodiment of the method by which the stereoscopic sound reproducing apparatus reproduces stereoscopic sound.

According to an embodiment, a stereoscopic sound reproducing method includes grouping a plurality of speakers into a group, receiving a sound signal, positioning one or more virtual sound sources of the sound signal at a predetermined position by using the grouped speakers, and reproducing the virtual sound source through the plurality of speakers.

The grouping of the plurality of speakers into a group may include placing, in the one group, a speaker constituting one home theater system and a separate loudspeaker not constituting the home theater system.

The home theater system may be a loudspeaker array in which a plurality of loudspeakers are linearly connected.

Grouping the plurality of speakers may include connecting the plurality of physically separated speakers through a wireless or wired network.

The positioning of the virtual sound source at a predetermined position may include positioning the virtual sound image at a predetermined position by a sound field synthesis rendering method.

The positioning of the virtual sound source at a predetermined position may include: determining a sound pressure signal for each speaker included in the group that minimizes the difference between a first sound pressure generated in a sweet spot by the speakers included in the group and a second sound pressure generated in the sweet spot by the virtual sound source at the predetermined position; and modulating the received sound signal based on the determined sound pressure signal for each speaker.

The determining of the sound pressure signal for each speaker included in the group may include determining an impulse response to be applied to each speaker included in the group, and the modulating of the received sound signal may include convolving the determined impulse response with the sound signal input for each speaker included in the group.

The positioning of the virtual sound image at a predetermined position may include passing the received sound signal through a filter corresponding to a predetermined altitude, replicating the filtered sound signal to generate a plurality of sound signals, and performing at least one of amplification, attenuation, and delay on each of the replicated sound signals based on at least one of a gain value and a delay value corresponding to each of the speakers to which the replicated sound signals are to be output.

The method may further include tracking the position of the head of the listener in real time, and the positioning of the virtual sound source at a predetermined position may include changing the gain and phase delay values of at least one of the speakers included in the group based on the tracked position of the head of the listener.

According to an embodiment, an apparatus for reproducing stereo sound includes a grouping unit for grouping a plurality of speakers into a group, a receiving unit for receiving a sound signal, a rendering unit for positioning one or more virtual sound sources of the sound signal at a predetermined position by using the grouped speakers, and a reproducing unit for reproducing the virtual sound source through the plurality of speakers.

The grouping unit may include a speaker constituting one home theater system and a separate loudspeaker not constituting the home theater system in the one group.

The grouping unit may connect the plurality of physically separated speakers through a wireless or wired network.

The rendering unit may orient the virtual sound image to a predetermined position by using the received sound signal by a sound field synthesis rendering method.

The rendering unit may determine a sound pressure signal for each speaker included in the group that minimizes the difference between the first sound pressure generated in the sweet spot by the speakers included in the group and the second sound pressure generated in the sweet spot by the virtual sound source at the predetermined position, and may modulate the received sound signal based on the determined sound pressure signal for each speaker.

The rendering unit may determine an impulse response to be applied to each speaker included in the group, and convolve the determined impulse response to an acoustic signal input for each speaker included in the group.

The rendering unit may include a filtering unit that passes the input sound signal through a filter corresponding to a predetermined altitude, a copy unit that generates a plurality of sound signals by copying the filtered sound signal, and an amplifier configured to perform at least one of amplification, attenuation, and delay on each of the copied sound signals based on at least one of a gain value and a delay value corresponding to each of the speakers to which the copied sound signals are to be output.

The apparatus may further include a listener tracker configured to track the position of the head of the listener in real time, and the rendering unit may change the gain and phase delay values of at least one of the speakers included in the group based on the tracked position of the head of the listener.

A computer readable recording medium having recorded thereon a program for executing the stereo sound reproduction method on a computer may be provided.

Hereinafter, preferred embodiments of the present invention will be described with reference to the accompanying drawings. Advantages and features of the invention, and how to achieve them, will become apparent from the embodiments described below in conjunction with the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in various forms; the disclosed embodiments are provided only to fully inform those skilled in the art of the scope of the invention, which is defined only by the scope of the claims. The terms used in the specification have been selected, as far as possible, from widely used general terms in consideration of their function, but they may vary according to the intention of those skilled in the art, precedent, the emergence of new technologies, and the like. In certain cases a term has been arbitrarily selected by the applicant, in which case its meaning is described in detail in the description of the invention. Therefore, the terms used in the specification should be defined based on their meanings and the contents of the entire specification, rather than simply on their names. Hereinafter, exemplary embodiments will be described in detail with reference to the accompanying drawings. The configurations shown in the embodiments and drawings described herein are only one embodiment and do not represent all of the technical idea of the present invention; it should be understood that various equivalents and modifications that could replace them may exist at the time of the present application.

Also, as used herein, the term "part" or "module" refers to a hardware component or circuit, such as an FPGA or an ASIC.

The stereoscopic sound reproduction environment 100 is an example of an environment in which the listener 110 listens to stereoscopic sound through the stereoscopic sound reproducing apparatus 200, which will be described later. The stereoscopic sound reproduction environment 100 is an environment for playing back audio content alone or together with other content such as video, and may be any open, partially enclosed, or fully enclosed area, such as a room, that can be embodied in a home, cinema, theater, auditorium, studio, game console, or the like.

The listener 110 may enjoy multimedia content through a multimedia player 140 such as a television or an audio device. For convenience of description, it is assumed that the listener 110 in the stereoscopic sound reproduction environment 100 listens to the sound of content played on the television through the plurality of speakers 145, 160, and 165.

Typically, the television 140 may have a built-in speaker, but the stereoscopic sound reproduction environment 100 may include a separate home theater system. For example, a separate sound bar 145 may be present directly below the television 140. The sound bar 145 may be a speaker array module including a plurality of loudspeakers.

The sound bar 145 according to an exemplary embodiment may virtually reproduce a multi-channel audio signal in the stereoscopic sound reproduction environment 100 by using a three-dimensional sound field processing technique such as panning, wave field synthesis, beamforming, focused source, or head related transfer function.

Although FIG. 1 shows the sound bar 145 as a single horizontal linear array positioned at the bottom of the television, the sound bar 145 may instead be configured as a dual horizontal linear array installed above and below the television 140 to provide a sense of elevation, a dual vertical linear array positioned to the left and right of the television 140, or a window-type array surrounding the television 140. In addition, the sound bar 145 may be installed so as to surround the listener 110 or may be positioned in front of and behind the listener 110.

In addition, the stereoscopic sound reproduction environment 100 may include a speaker (not shown) of a home theater system other than the sound bar 145, and need not include a home theater speaker such as the sound bar 145.

The listener 110 may place, in one group, a speaker that constitutes a home theater system and a separate loudspeaker that does not constitute the home theater system, and enjoy stereoscopic sound through the plurality of speakers included in the group. For example, the listener 110 may combine the sound bar 145 with the separate loudspeakers 160 and 165, which are physically separated from the sound bar 145, to enjoy the content played on the television 140. Alternatively, the listener 110 may combine the loudspeakers (not shown) built into the television 140 with the physically separated loudspeakers 160 and 165 to enjoy the content played on the television 140.

That is, the listener 110 may add the separate loudspeakers 160 and 165 to the existing TV built-in speaker or sound bar 145, group them into one group, and enjoy stereoscopic sound.

The stereoscopic sound reproduction environment 100 according to an exemplary embodiment may reproduce stereoscopic sound by grouping the speaker built into the television 140, the sound bar 145, the left loudspeaker 160, and the right loudspeaker 165 into one group 180. Although only the left loudspeaker 160 and the right loudspeaker 165 of the listener 110 are illustrated in FIG. 1, the number and locations of the loudspeakers can be configured adaptively by the listener 110 according to the size of the space in which stereoscopic sound is listened to or the style of the content. For example, the stereoscopic sound reproduction environment 100 may further include a left rear loudspeaker (not shown) and a right rear loudspeaker (not shown).

The television 140 or a separate display device (not shown) may display to the listener 110 a list of the speakers that make up the group 180, and the listener 110 may add a speaker to the group or remove one from it.

The stereoscopic sound reproduction environment 100 may include a sweet spot 120, which is the spatial range in which optimal stereoscopic sound can be enjoyed. The stereoscopic sound reproduction environment 100 may set a virtual ear position for the listener 110 so that optimal stereoscopic sound is delivered at that ear position and in the adjacent sweet spot 120.

The stereoscopic sound reproduction environment 100 may perform rendering in which the virtual sound source is positioned at a desired position using the speakers 145, 160, and 165 in the group 180, so that the listener 110 perceives the sound as coming from the position of the virtual sound source rather than from the actual speaker positions.

The stereoscopic sound reproducing apparatus 200 performs 3D audio rendering that places a virtual sound source at a predetermined position for the input audio signals in the stereoscopic sound reproduction environment 100 described above with reference to FIG. 1, so that the listener 110 can perceive a sense of space and depth.

The stereoscopic sound reproducing apparatus 200 according to an exemplary embodiment may include a receiver 210, a controller (not shown), and a reproducer 240. The controller (not shown) may include a renderer 220 and a grouper 230. The controller (not shown) may include at least one processor such as a central processing unit (CPU), an application processor (AP), an application specific integrated circuit (ASIC), an embedded processor, a microprocessor, hardware control logic, a hardware finite state machine (FSM), a digital signal processor (DSP), or a combination thereof.

The receiver 210 may receive an input audio signal (i.e., an acoustic signal) from a device such as a digital versatile disc (DVD), a Blu-ray disc (BD), an MP3 player, or the like. The input audio signal may be a multi-channel audio signal such as a stereo signal (2 channels), or a 5.1 channel, 7.1 channel, 10.2 channel, or 22.2 channel signal. In addition, the input audio signal may be an object-based audio signal in which a plurality of mono input signals and the real-time positions of objects are transmitted in the form of metadata. An object-based audio signal is a form in which the position of each audio object arranged in three-dimensional space is compressed into metadata along with the sound. In addition, the input audio signal may be a hybrid input audio signal in which a channel audio signal and an object-based audio signal are mixed.

The grouping unit 230 may group at least two speakers existing in the stereoscopic sound reproduction environment 100 into one group. For example, the grouping unit 230 may group the television built-in speaker and a separate loudspeaker into one group. In addition, the grouping unit 230 may group the television built-in speaker, one or more sound bars, and one or more loudspeakers into one group. In addition, the grouping unit 230 may group an existing home theater speaker and one or more loudspeakers purchased separately by the listener 110 into one group. Speakers in a group may be physically separated from each other. The listener 110 may select the speakers to be grouped, and may determine the speakers to be added based on the size and characteristics of the space where the listener 110 is located or the nature of the content to be enjoyed.

The grouping unit 230 may group a plurality of physically separated speakers into a group through various communication paths. The communication path may represent various networks and network topologies. For example, the communication path may include wireless communication, wired communication, optical communication, ultrasound, or a combination thereof. Satellite communication, mobile communication, Bluetooth, the Infrared Data Association standard (IrDA), wireless fidelity (WiFi), and worldwide interoperability for microwave access (WiMAX) are examples of wireless communication that can be included in the communication path. Ethernet, digital subscriber line (DSL), fiber to the home (FTTH), and plain old telephone service (POTS) are examples of wired communication that can be included in the communication path. In addition, the communication path may include a personal area network (PAN), a local area network (LAN), a metropolitan area network (MAN), a wide area network (WAN), or a combination thereof.

The grouper 230 may store positions and gains of the speakers existing in the group, and transmit the positions of the speakers to the renderer 220.
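The bookkeeping described for the grouper 230 can be pictured with a small data structure. The following is only a minimal sketch: the class names `Speaker` and `SpeakerGroup`, their fields, and the example coordinates are hypothetical and do not come from the specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Position = Tuple[float, float, float]  # metres in an assumed room coordinate frame


@dataclass
class Speaker:
    """One loudspeaker; all field names are illustrative assumptions."""
    name: str
    position: Position
    gain: float = 1.0


@dataclass
class SpeakerGroup:
    """A group such as group 180: speakers that are rendered together."""
    speakers: Dict[str, Speaker] = field(default_factory=dict)

    def add(self, speaker: Speaker) -> None:
        self.speakers[speaker.name] = speaker

    def remove(self, name: str) -> None:
        self.speakers.pop(name, None)

    def positions(self) -> List[Position]:
        """Positions passed on to the renderer, in insertion order."""
        return [s.position for s in self.speakers.values()]


group = SpeakerGroup()
group.add(Speaker("sound_bar_145", (0.0, 2.0, 0.5)))
group.add(Speaker("left_160", (-1.5, 0.0, 1.0)))
group.add(Speaker("right_165", (1.5, 0.0, 1.0)))
group.remove("left_160")  # the listener removes a speaker from the group
```

Keeping the speakers in a dictionary keyed by name makes the add and remove operations that the listener performs on the displayed speaker list straightforward.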

The renderer 220 may perform 3D audio rendering for positioning the virtual sound source at a predetermined position with the input audio signals.

For example, the renderer 220 may generate at least one speaker signal corresponding to the audio signal by processing the input audio signal using a wave field synthesis rendering algorithm.

In addition, the rendering unit 220 may generate at least one speaker signal corresponding to the audio signal by processing the input audio signal using a head related transfer function rendering, beam-forming rendering, or focused source rendering algorithm.

In addition, the rendering unit 220 may calculate an impulse response for each speaker based on the minimum error summation, or perform rendering to reproduce the sense of altitude. A detailed process of performing the 3D audio rendering by the rendering unit 220 will be described later.

The reproduction unit 240 may reproduce the virtual sound source rendered by the rendering unit 220 through the multichannel speaker. The playback unit 240 may include speakers existing in the group 180.

FIG. 3 illustrates an embodiment of the rendering unit 220 of the stereoscopic sound reproducing apparatus 200. Even where omitted below, the description given above for the stereoscopic sound reproducing apparatus 200 of FIG. 2 also applies to the stereoscopic sound reproducing apparatus 200 according to the embodiment of FIG. 3.

The renderer 220 may include an audio signal analyzer 310 and a sound field synthesis renderer 320. The rendering unit 220 may determine a gain and phase delay value for each speaker suitable for the position of the sound image according to the propagation characteristics of the sound image to reproduce the near field focused sound source.

That is, the rendering unit 220 uses the fact that the magnitude of sound pressure decreases as 1/r with the distance r between the listener 110 and the sound source, and may change the gains of the speakers in the group so that the outputs of the speakers produce the same sound pressure at the near-field sound image position to be localized. In addition, the rendering unit 220 may take into account the propagation delay of the sound field between the virtual sound source and each actual speaker so that the outputs of all the speakers in the group converge at the desired near-field position at the same time.
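The gain and delay reasoning above can be sketched numerically. This is a minimal illustration, not the specification's algorithm: it assumes free-field point sources, a speed of sound of 343 m/s, and a hypothetical function name `focus_gains_and_delays`; the speaker and focus coordinates are made up.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def focus_gains_and_delays(speaker_positions, focus_position):
    """Per-speaker gain and delay so that all outputs meet at the focus point.

    Gains grow with distance to offset the 1/r pressure decay; delays hold
    back the nearer speakers so every wavefront reaches the focus together.
    """
    speakers = np.asarray(speaker_positions, dtype=float)
    focus = np.asarray(focus_position, dtype=float)
    r = np.linalg.norm(speakers - focus, axis=1)   # distance of each speaker
    gains = r / r.max()                            # farthest speaker gets gain 1
    delays = (r.max() - r) / SPEED_OF_SOUND        # seconds; farthest gets 0
    return gains, delays


speakers = [(-1.0, 0.0), (0.0, 0.0), (1.0, 0.0)]
gains, delays = focus_gains_and_delays(speakers, focus_position=(0.0, 1.0))
```

With these values, the product of each gain and its 1/r decay is the same for every speaker, and each delay plus its propagation time is the same as well, which is the convergence condition described in the paragraph.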

The audio signal analyzer 310 may receive as input speaker information for the group, sound source position information (for example, information about a position such as the angle of a virtual sound source with respect to a listening position), and a multichannel audio signal (the sound source signal to be localized). The speaker information for the group may include information about the sound bar (for example, information about the arrangement, such as the position and spacing of the loudspeaker array), position information of the speakers in the group, and the spacing between the speakers.

The audio signal analyzer 310 may determine the number of channels of the audio signal by analyzing the sound source format of the received multichannel audio signal, and may extract the sound source signal of each identified channel from the received multichannel audio signal.

The sound field synthesis rendering unit 320 renders the multi-channel audio signal by the sound field synthesis method according to the number of audio channels identified by the audio signal analyzer 310. That is, the sound field synthesis rendering unit 320 orients the virtual sound source to a desired position in accordance with the identified number of audio channels. The number of virtual sound images may vary depending on the number of audio channels checked by the audio signal analyzer 310. For example, when the sound source of the multi-channel audio signal is two channels, the sound field synthesis rendering unit 320 may render a virtual sound source in a front left direction and a front right direction, that is, in both directions, in a sound field synthesis method.

For example, the sound field synthesis rendering unit 320 may render virtual sound sources by the sound field synthesis method in a total of five directions, namely front left, front right, center, rear left, and rear right, based on the positions of the speakers in the group. The sound field synthesis rendering unit 320 may change the phase delay value and the gain value of each speaker in the group according to the number and positions of the speakers in the group.

If the position of the listener 110 is tracked in real time as described below, the stereoscopic sound reproducing apparatus 200 may change the phase delay value and the gain value of the speakers in the group in real time. For example, if the listener 110 moves to the left, the gain value or phase delay value of the speakers in the group may be changed to a value optimized for the new position of the listener 110.

FIG. 4A illustrates an embodiment of the rendering unit 220 of the stereoscopic sound reproducing apparatus 200. Even where omitted below, the description given above for the stereoscopic sound reproducing apparatus 200 of FIG. 2 also applies to the stereoscopic sound reproducing apparatus 200 according to the embodiment of FIG. 4A.

The renderer 220 may include an audio signal analyzer 310 and a minimum error adder 420. Since the audio signal analyzer 310 is as described above with reference to FIG. 3, description thereof will be omitted.

Hereinafter, the operation of the minimum error adding unit 420 will be described with reference to FIG. 4B. FIG. 4B is a view showing the virtual sound source 460 and arbitrary points 470 and 480 within the sweet spot 120, added to the stereoscopic sound reproduction environment 100 of FIG. 1.

The minimum error summing unit 420 applies a method that allows the listener 110 to enjoy optimal stereo sound within the set sweet spot 120.

The minimum error adder 420 may set a sound pressure pTarget within the sweet spot 120 due to the virtual sound source 460. The virtual sound source 460 is a sound source assumed to actually exist at the position where the sound signal is to be localized.

The minimum error summing unit 420 may set the actual sound pressure pReproduce generated in the sweet spot 120 by the speakers 145, 160, and 165 in the group, and then determine the sound pressure signal of each speaker 145, 160, and 165 that minimizes the difference J between the two sound pressures pReproduce and pTarget. The minimum error adder 420 may modulate the sound signal received by the receiver 210 based on the determined sound pressure signal.

The arrows shown in solid lines in FIG. 4B represent pReproduce for an arbitrary point in the sweet spot 120, and the arrows shown in dashed lines represent pTarget.

For example, the minimum error adder 420 may calculate pTarget and pReproduce at the arbitrary points 470 and 480 within the sweet spot 120. The minimum error summing unit 420 may calculate J by integrating over the entire extent of the sweet spot 120, so that pTarget and pReproduce are taken into account at all points in the sweet spot 120.

J may be calculated as shown in [Equation 1].

[Equation 1]

v represents the size of the sweet spot 120; r is the distance between the position of an actual speaker (145, 160, 165) and a specific point (470, 480) within the sweet spot 120, or the distance between the position of the virtual sound source 460 and the specific point (470, 480) in the sweet spot; and t represents time. w is a weighting function that can be set arbitrarily according to r.
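The body of Equation 1 is not reproduced in this text. One form that would be consistent with the symbols defined here, offered only as an assumption, is a weighted squared difference between the two sound pressures integrated over the sweet spot:

```latex
J = \int_{V} w(r)\,\bigl| p_{\mathrm{Target}}(r, t) - p_{\mathrm{Reproduce}}(r, t) \bigr|^{2}\, dV
```

Here V is the sweet spot 120 of size v. The squared magnitude and the absence of an integral over t are choices made for this sketch; the published equation may differ.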

The problem of minimizing the difference between the two sound pressures in [Equation 1] can be replaced, as shown in [Equation 2], by the problem of minimizing the difference J' between the acoustic transfer function of the virtual sound source and the acoustic transfer function (hreproduced) generated by the speakers in the group.

[Equation 2]

In Equation 2, i denotes an index for each speaker in a group, and N denotes the total number of speakers in a space. In addition, the ki value may mean a filter coefficient (ie, an impulse response) to be applied to each speaker, and thus the ki value may be determined by minimizing J '. That is, the minimum error adder 420 may determine a filter for each speaker in the group (that is, an impulse response).
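Determining the ki values by minimizing J' can be illustrated as a weighted least-squares problem. The sketch below rests on several assumptions that are not in the text: J' is taken to be the weighted squared difference between the virtual source's transfer function and the sum over speakers of ki times each speaker's transfer function; the sweet spot is sampled at discrete points; a free-field monopole model is used; and a single frequency is solved, so each ki is one complex number (a gain and a phase) rather than a full impulse response. All names and coordinates are hypothetical.

```python
import numpy as np


def monopole(src, pts, wavenumber):
    """Free-field transfer function from one source to many points (assumed model)."""
    r = np.linalg.norm(pts - src, axis=1)
    return np.exp(-1j * wavenumber * r) / (4.0 * np.pi * r)


def min_error_weights(speakers, virtual_source, points, weights, wavenumber):
    """Complex weight k_i per speaker minimising the weighted squared error."""
    # Column i holds the transfer function from speaker i to every sample point.
    H = np.stack([monopole(s, points, wavenumber) for s in speakers], axis=1)
    h_target = monopole(virtual_source, points, wavenumber)
    sw = np.sqrt(weights)  # fold the weighting w(r) into the least-squares system
    k, *_ = np.linalg.lstsq(sw[:, None] * H, sw * h_target, rcond=None)
    return k, H, h_target


rng = np.random.default_rng(0)
speakers = np.array([[-1.0, 2.0], [0.0, 2.0], [1.0, 2.0], [-1.5, 0.0], [1.5, 0.0]])
virtual_source = np.array([0.5, 3.0])
points = rng.uniform(-0.2, 0.2, size=(50, 2))   # samples inside the sweet spot
weights = np.ones(len(points))                  # uniform w(r)
wavenumber = 2.0 * np.pi * 500.0 / 343.0        # 500 Hz with c = 343 m/s

k, H, h_target = min_error_weights(speakers, virtual_source, points, weights, wavenumber)
residual = np.linalg.norm(H @ k - h_target)
```

Repeating the solve over a grid of frequencies and transforming the resulting weights back to the time domain would be one way to obtain an impulse response per speaker.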

The minimum error summing unit 420 may determine, from Equation 2, the sound pressure signal that each speaker 145, 160, 165 should radiate in the form of an impulse response for each speaker 145, 160, 165, and may convolve the determined impulse response with the acoustic signal received for each speaker 145, 160, 165. Alternatively, the minimum error summing unit 420 may modulate the input sound signal for each speaker 145, 160, 165 by estimating gain and phase values from the filter values determined for each speaker 145, 160, 165. Considering that speakers located to the side of the listener 110 contribute little to localizing the virtual sound source, the minimum error summing unit 420 according to an embodiment of the present invention may determine the impulse response only for the speaker array 145 located in front of the listener 110.
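The convolution step can be shown in a few lines. This is a minimal sketch with made-up impulse responses; the function name `render_speaker_feeds` is hypothetical.

```python
import numpy as np


def render_speaker_feeds(signal, impulse_responses):
    """Convolve one input signal with each speaker's impulse response."""
    return [np.convolve(signal, h) for h in impulse_responses]


signal = np.array([1.0, 0.5, 0.25])
impulse_responses = [
    np.array([1.0]),            # pass-through
    np.array([0.0, 0.0, 0.5]),  # two-sample delay with gain 0.5
]
feeds = render_speaker_feeds(signal, impulse_responses)
# feeds[0] is the unchanged signal; feeds[1] is [0, 0, 0.5, 0.25, 0.125]
```

The second impulse response is a pure gain and delay, which corresponds to the alternative described above in which only a gain and a phase value are estimated from each filter.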

The minimum error summing unit 420 may set the sweet spot 120 to be sufficiently large, sum the minimum error of the sound field (or sound pressure) delivered to the sweet spot 120, and calculate a filter for each speaker 145, 160, 165. Since the sweet spot 120 is large enough, the listener 110 can enjoy stereoscopic sound of at least a certain level regardless of movement. If there are two or more listeners, the minimum error adder 420 may set the sweet spot 120 large enough to include the two or more listeners.

The minimum error summing unit 420 according to an embodiment may set the sweet spot 120 to be small, calculate the minimum error of the sound field (or sound pressure) delivered to the sweet spot 120, and calculate a filter for each speaker 145, 160, and 165. In this case, since the sweet spot 120 is very small, the listener 110 may leave the sweet spot 120 by moving even a little, making it difficult to enjoy stereoscopic sound. However, since the sweet spot 120 is small, an optimized impulse response can be calculated for each speaker 145, 160, and 165, so that the listener can enjoy high quality stereo sound within the determined sweet spot 120. If there are two or more listeners, the minimum error adder 420 may set a plurality of sweet spots 120 to provide optimal stereo sound to each listener.

As will be described later, if the position of the listener 110 can be tracked, the sweet spot 120 is moved along the movement path of the listener 110, so that even if the sweet spot 120 is set small, optimal stereoscopic reproduction may be possible regardless of the movement of the listener 110.

FIG. 5 illustrates an embodiment of the rendering unit 220 of the stereoscopic sound reproducing apparatus 200. Even where omitted below, the description given above for the stereoscopic sound reproducing apparatus 200 of FIG. 2 also applies to the stereoscopic sound reproducing apparatus 200 according to the embodiment of FIG. 5.

The renderer 220 according to an embodiment may include a filtering unit 520, a replication unit 530, and an amplification unit 540.

The filtering unit 520 passes the sound signal through a predetermined filter corresponding to a predetermined altitude. In addition, the filtering unit 520 may pass the sound signal through a head related transfer function (HRTF) filter corresponding to a predetermined altitude. The HRTF contains the path information from the spatial position of the sound source to both ears of the listener 110, that is, the frequency transfer characteristic. The HRTF captures not only simple path differences such as the inter-aural level difference (ILD) and the inter-aural time difference (ITD) between the two ears, but also complicated path characteristics such as diffraction at the head surface and reflection by the pinna, which change according to the direction from which the sound arrives; this phenomenon allows stereoscopic sound to be perceived. Since the HRTF has unique characteristics in each direction of space, it can be used to generate stereo sound.

The filtering unit 520 uses an HRTF filter to model sound generated at a higher altitude than actual speakers by using speakers arranged on a horizontal plane. Equation 3 below is an example of an HRTF filter used by the filtering unit 520.

[Equation 3]

HRTF = HRTF2 / HRTF1

HRTF2 is the HRTF indicating path information from the position of the virtual sound source to the ear of the listener 110, and HRTF1 is the HRTF indicating path information from the position of the actual speaker to the ear of the listener 110. Since the sound signal is output through the actual speaker, the HRTF2 corresponding to the predetermined altitude is divided by the HRTF1 corresponding to the horizontal plane (or the height of the actual speaker) so that the listener perceives the sound signal as being output from the virtual speaker.

The optimal HRTF filter corresponding to a given altitude differs from person to person, like a fingerprint. It is therefore desirable to calculate and apply an HRTF for each listener 110, but this is not practical. Instead, the HRTF is calculated for some listeners 110 within a group of listeners 110 having similar characteristics (e.g., physical characteristics such as age or height, or preferences such as preferred frequency band or preferred music). A representative value (e.g., the average) may then be determined as the HRTF to apply to all listeners 110 in the group.
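As an illustration of the representative value mentioned above, the following minimal sketch averages measured HRTFs bin by bin. It assumes that each HRTF is given as a list of magnitude values on a common frequency grid; the function name `representative_hrtf` and this data layout are hypothetical and are not taken from the specification.

```python
from statistics import fmean


def representative_hrtf(measured):
    """Average several listeners' HRTF magnitude responses bin by bin.

    measured: list of HRTFs, each a list of magnitudes on the same
    frequency grid (an assumed layout, for illustration only).
    """
    if not measured:
        raise ValueError("need at least one measured HRTF")
    n_bins = len(measured[0])
    if any(len(h) != n_bins for h in measured):
        raise ValueError("all HRTFs must have the same number of bins")
    # One average per frequency bin, taken across the measured listeners.
    return [fmean(h[k] for h in measured) for k in range(n_bins)]
```

Other representative values, such as a median, could be substituted without changing the surrounding processing.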

An example of the result of filtering the acoustic signal using the HRTF defined in Equation 3 is shown in Equation 4 below.

[Equation 4]

Y2 (f) = Y1 (f) * HRTF

Y1 (f) is the frequency-domain representation of the acoustic signal that the listener 110 hears from the actual speaker, and Y2 (f) is the frequency-domain representation of the acoustic signal that the listener 110 hears from the virtual speaker.
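Equations 3 and 4 can be sketched per frequency bin as follows. This is only an illustrative sketch: it assumes the HRTFs and the signal spectrum are available as lists of complex values on the same frequency grid, and the names `elevation_filter`, `apply_filter`, and the threshold `eps` are hypothetical. A practical implementation would likely regularize the division where HRTF1 is close to zero rather than raise an error.

```python
def elevation_filter(hrtf_virtual, hrtf_actual, eps=1e-12):
    """Equation 3: HRTF = HRTF2 / HRTF1, evaluated per frequency bin."""
    if len(hrtf_virtual) != len(hrtf_actual):
        raise ValueError("HRTF2 and HRTF1 must have the same length")
    result = []
    for h2, h1 in zip(hrtf_virtual, hrtf_actual):
        if abs(h1) < eps:
            # Assumption: a spectral null in HRTF1 is treated as an error.
            raise ZeroDivisionError("HRTF1 is (nearly) zero in this bin")
        result.append(h2 / h1)
    return result


def apply_filter(y1, hrtf):
    """Equation 4: Y2(f) = Y1(f) * HRTF (bin-wise multiplication)."""
    if len(y1) != len(hrtf):
        raise ValueError("spectrum and filter must have the same length")
    return [y * h for y, h in zip(y1, hrtf)]
```

Because Equation 4 is stated in the frequency domain, the product is an element-wise multiplication; in the time domain the same operation would be a convolution.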

The filtering unit 520 may filter only some of the plurality of channel signals included in the sound signal.

The sound signal may include sound signals corresponding to a plurality of channels. Hereinafter, seven channel signals are defined for convenience of description. However, the channel signal to be described later is merely an example, and the sound signal may include a channel signal indicating a sound signal generated in a direction other than the seven directions described below.

The center channel signal represents an acoustic signal generated at the center of the front face and is output to the center speaker.

The right front channel signal represents an acoustic signal generated on the right side of the front and is output to the right front speaker.

The left front channel signal represents an acoustic signal generated on the left side of the front and is output to the left front speaker.

The right rear channel signal represents an acoustic signal generated on the right side of the rear side and is output to the right rear speaker.

The left rear channel signal represents an acoustic signal generated on the left side of the rear side and is output to the left rear speaker.

The right top channel signal represents an acoustic signal generated from the upper right side, and is output to the right top speaker.

The left top channel signal represents an acoustic signal generated from the upper left and is output to the left top speaker.

If the sound signal includes the right top channel signal and the left top channel signal, the filtering unit 520 filters the right top channel signal and the left top channel signal. Thereafter, the filtered right top channel signal and left top channel signal are used to model a virtual sound source generated at a desired altitude.

When the sound signal does not include the right top channel signal and the left top channel signal, the filtering unit 520 filters the right front channel signal and the left front channel signal. Thereafter, the filtered right front channel signal and left front channel signal are used to model a virtual sound source generated at a desired altitude.
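The channel selection described in the two preceding paragraphs can be sketched as below. The channel names are hypothetical labels chosen for this example, and the sketch assumes that the sound signal is described by the set of channel names it contains.

```python
TOP_CHANNELS = ("right_top", "left_top")
FRONT_CHANNELS = ("right_front", "left_front")


def channels_to_filter(channel_names):
    """Pick the channel pair that the filtering unit would filter.

    If both top channels are present they are filtered; otherwise the
    front channels are used to model the elevated virtual sound source.
    """
    if all(name in channel_names for name in TOP_CHANNELS):
        return TOP_CHANNELS
    return FRONT_CHANNELS
```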

According to an exemplary embodiment, the right front channel signal and the left front channel signal may be upmixed to generate a right top channel signal and a left top channel signal, and the generated right top channel signal and left top channel signal may then be filtered.

The replica unit 530 replicates the filtered channel signal into a plurality of signals, one for each speaker in the group through which the filtered channel signal is to be output. For example, when the filtered sound signal is to be output as a right top channel signal, a left top channel signal, a right rear channel signal, and a left rear channel signal, the replica unit 530 replicates the filtered channel signal into four. The number of copies made by the replica unit 530 may vary depending on the embodiment. However, it may be desirable for the replica unit 530 to make two or more copies so that the filtered channel signal is output at least as the right rear channel signal and the left rear channel signal.

The speakers through which the right top channel signal and the left top channel signal are to be reproduced are arranged on a horizontal plane. For example, such a speaker may be attached directly above the front speaker that reproduces the right front channel signal.

The amplifier 540 amplifies (or attenuates) the filtered sound signal according to a predetermined gain value. The gain value is set differently according to the speaker to which the filtered sound signal is to be output and the type of the filtered sound signal.

For example, the right top channel signal to be output to the right top speaker is amplified according to a first gain value, and the right top channel signal to be output to the left top speaker is amplified according to a second gain value. In this case, the first gain value may be greater than the second gain value. Likewise, the left top channel signal to be output to the right top speaker is amplified according to the second gain value, and the left top channel signal to be output to the left top speaker is amplified according to the first gain value, so that the corresponding channel signals are output from both the left and right speakers.
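The replication and gain stage in this example can be sketched as follows. The sketch assumes that the copies routed to the same speaker are summed into one feed, which the description does not state explicitly; the gain values 1.0 and 0.5 and all names are hypothetical.

```python
def replicate_and_amplify(filtered, gains):
    """Copy each filtered channel to its target speakers and apply gains.

    filtered: {channel name: list of samples}
    gains:    {(channel name, speaker name): gain value}
    Returns {speaker name: list of samples}, summing all copies that are
    routed to the same speaker (an assumption of this sketch).
    """
    length = len(next(iter(filtered.values())))
    feeds = {}
    for (channel, speaker), gain in gains.items():
        feed = feeds.setdefault(speaker, [0.0] * length)
        for i, sample in enumerate(filtered[channel]):
            feed[i] += gain * sample
    return feeds


FIRST_GAIN, SECOND_GAIN = 1.0, 0.5  # hypothetical values, first > second
CROSS_GAINS = {
    ("right_top", "right_top_speaker"): FIRST_GAIN,
    ("right_top", "left_top_speaker"): SECOND_GAIN,
    ("left_top", "right_top_speaker"): SECOND_GAIN,
    ("left_top", "left_top_speaker"): FIRST_GAIN,
}
```

With these gains, each top channel is reproduced mainly by the speaker on its own side and at a lower level by the speaker on the opposite side.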

The 3D sound reproducing apparatus 200 may output the same sound signal from the speakers in the group with different gain values. In this way, the virtual sound source can be easily positioned at an altitude higher than that of the actual speakers, or at a specific altitude independent of the altitude of the actual speakers.

Obviously, the operation of the replica unit 530 and the amplifier 540 may vary according to the number of channel signals included in the input sound signal and the number of speakers in the group.

Although the methods by which the stereo sound reproducing apparatus 200 positions the virtual sound source at a predetermined position have been described separately with reference to FIGS. 3 to 5, it is obvious that a single stereo sound reproducing apparatus 200 may use all of the methods described above or may use them selectively. In addition, the method by which the stereo sound reproducing apparatus 200 positions the virtual sound source at a predetermined position is not limited to the above-described examples, and the stereo sound reproducing apparatus 200 may use any other method to position the virtual sound source at a predetermined position based on the positions and the number of speakers in the group.

The stereoscopic sound reproducing apparatus 200 according to an embodiment may further include a communication unit (not shown). The communication unit (not shown) may include one or more hardware components that allow communication between the 3D sound reproducing apparatus 200 and a peripheral device. For example, the communication unit (not shown) may support short-range wireless communication or mobile communication.

Short-range wireless communication includes, but is not limited to, Bluetooth communication, Bluetooth Low Energy (BLE) communication, near field communication (NFC), WLAN (Wi-Fi) communication, Zigbee communication, infrared (IrDA) communication, Wi-Fi Direct (WFD) communication, ultra wideband (UWB) communication, and Ant+ communication.

The mobile communication may transmit and receive a wireless signal to and from at least one of a base station, an external terminal, and a server on a mobile communication network. Here, the wireless signal may include various types of data according to the transmission and reception of an audio signal, an image signal, or a text/multimedia message.

The communication unit (not shown) may include a listener tracker 610.

FIG. 5A is a block diagram illustrating another example of the 3D sound reproducing apparatus 200 according to an exemplary embodiment. Even where omitted below, the above description of the stereoscopic sound reproducing apparatus 200 of FIG. 2 also applies to the stereoscopic sound reproducing apparatus 200 according to the exemplary embodiment of FIG. 5.

The listener tracker 610 may track the position of the listener 110 as the listener moves. As described above, the sweet spot 120, which is a position where the optimal stereoscopic sound can be enjoyed, is typically determined manually based on the positions of the speakers 145, 160, and 165.

For example, when a left speaker and a right speaker are located around the listener 110 and are 3 meters apart, a virtual line may be drawn at an angle of 60 degrees from each end of the line segment connecting the left and right speakers, and the sweet spot 120 may be determined based on the point where the two virtual lines meet.
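The geometry of this example can be checked with a short sketch. It assumes two-dimensional speaker coordinates in meters and assumes the listener lies on the left-hand side of the direction from the left speaker to the right speaker; the function name and coordinate convention are hypothetical.

```python
import math


def sweet_spot_from_pair(left, right, angle_deg=60.0):
    """Return the point where lines drawn at angle_deg from each end of
    the speaker baseline meet (2-D coordinates, in meters)."""
    lx, ly = left
    rx, ry = right
    dx, dy = rx - lx, ry - ly
    spacing = math.hypot(dx, dy)
    if spacing == 0:
        raise ValueError("speakers must be at different positions")
    # Height of the intersection point above the middle of the baseline.
    height = (spacing / 2) * math.tan(math.radians(angle_deg))
    mid_x, mid_y = (lx + rx) / 2, (ly + ry) / 2
    # Unit normal to the baseline; the chosen side is an assumption.
    nx, ny = -dy / spacing, dx / spacing
    return (mid_x + height * nx, mid_y + height * ny)


# Speakers 3 m apart on the x axis, as in the example above.
spot = sweet_spot_from_pair((-1.5, 0.0), (1.5, 0.0))
```

For a 3 meter spacing and 60 degree lines, the two speakers and the meeting point form an equilateral triangle, so the point lies about 2.6 meters from the speaker baseline and 3 meters from each speaker.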

Therefore, when the listener 110 moves to the left of the point where the two virtual lines meet, a pre-echo phenomenon may occur. The pre-echo phenomenon means that, when the position of the listener 110 shifts from the center to the left, the influence of the left speaker, which has a large gain and a relatively large pre-delay, becomes dominant, so that the auditory sound image leaves the intended position and is perceived at the position of the left speaker.

If the listener tracker 610 tracks the position of the head of the listener 110 in real time, the sweet spot 120 may move according to the head position of the listener 110 instead of staying in a fixed position. The stereoscopic sound reproducing apparatus 200 according to an exemplary embodiment may update the sweet spot 120 in real time or at regular time intervals according to the head position of the listener acquired from the listener tracker 610.

The listener tracker 610 may acquire the head position information of the listener 110 in real time. For example, the listener tracker 610 may acquire the head position information of the listener 110 based on a mobile phone, a motion recognition sensor, or a position sensor attached to a remote controller held by the listener 110. Alternatively, the listener tracker 610 may acquire the head position information of the listener 110 using an image processing algorithm such as object tracking, an accessory worn by the listener 110, or wearable glasses such as Google Glass. The way in which the listener tracker 610 tracks the head position of the listener 110 is not limited to the above-described examples, and any other method may be used.

The listener tracker 610 may obtain head position information of a plurality of listeners 110 in real time. Based on the obtained head position information of the plurality of listeners 110, the stereoscopic sound reproducing apparatus 200 may set one sweet spot 120 that includes the plurality of listeners 110, or may set a plurality of sweet spots 120 based on the position of each listener 110.
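The two options for a plurality of listeners can be sketched as follows. The specification does not say how a shared sweet spot is computed, so this sketch assumes a circular sweet spot centered on the centroid of the tracked head positions; the function names, the circular shape, and the default `margin` and `radius` values are all hypothetical.

```python
import math


def enclosing_sweet_spot(head_positions, margin=0.2):
    """One circular sweet spot (center, radius) covering every listener."""
    if not head_positions:
        raise ValueError("need at least one head position")
    count = len(head_positions)
    cx = sum(x for x, _ in head_positions) / count
    cy = sum(y for _, y in head_positions) / count
    # Radius reaches the farthest listener, plus a safety margin.
    radius = max(math.hypot(x - cx, y - cy) for x, y in head_positions)
    return (cx, cy), radius + margin


def per_listener_sweet_spots(head_positions, radius=0.2):
    """One small circular sweet spot per tracked listener."""
    return [((x, y), radius) for x, y in head_positions]
```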

FIG. 6B illustrates a change in a sweet spot of a stereo sound listening environment according to an embodiment.

When the listener 110 moves to the left from the existing sweet spot 120, the 3D sound reproducing apparatus 200 may reset the sweet spot 120 based on the position of the moved listener 110.

The rendering unit 220 may change the gain and delay values of the speakers in the group to suit the moved sweet spot 120.
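One simple way to derive such gain and delay values is sketched below. It is not the method of the specification: it assumes free-field propagation, a speed of sound of 343 m/s, and a 1/r level drop with distance, and it delays and attenuates the nearer speakers so that all speakers arrive at the tracked head position at the same time and level. All names are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # meters per second, approximate value in air


def align_speakers(speakers, listener):
    """Return {speaker name: (gain, delay in seconds)} for a listener.

    speakers: {speaker name: (x, y) position in meters}
    listener: (x, y) tracked head position in meters
    The farthest speaker is the reference (gain 1.0, delay 0.0).
    """
    distance = {name: math.dist(pos, listener) for name, pos in speakers.items()}
    farthest = max(distance.values())
    if farthest == 0:
        raise ValueError("listener coincides with every speaker")
    settings = {}
    for name, d in distance.items():
        delay = (farthest - d) / SPEED_OF_SOUND  # nearer speakers wait longer
        gain = d / farthest                      # nearer speakers are attenuated
        settings[name] = (gain, delay)
    return settings
```

When the listener moves toward the left speaker, this sketch lowers the gain of the left speaker and delays it, which counteracts the dominance of the nearer speaker described above as the pre-echo phenomenon.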

For example, when the rendering unit 220 positions the virtual sound source using the WFS method described above with reference to FIG. 3, the pre-echo phenomenon can be reduced by changing the gain value for each of the speakers 145, 160, and 165 in the group in real time.

When the rendering unit 220 positions the virtual sound source using the minimum error calculation method described above with reference to FIG. 4, the sweet spot 120 is set near the tracked head position of the listener 110, and an optimized impulse response can be calculated for each of the speakers 145, 160, and 165.

In addition, when the rendering unit 220 positions the virtual sound source at a predetermined altitude using the altitude reproduction method described above with reference to FIG. 5, the elevation angle of the virtual sound source can be kept constant by changing the gain value and the phase delay value applied to each of the speakers 145, 160, and 165 in the group.

Although not shown, the listener tracking unit 610 may track head positions of the plurality of listeners 110 and set a plurality of sweet spots based on head positions of the listeners. The renderer 220 may change gain and delay values of the speakers in the group based on the positions and sizes of the plurality of sweet spots.

Hereinafter, a method by which the stereo sound reproducing apparatus 200 according to an embodiment provides stereo sound to a listener using a plurality of speakers will be described with reference to the flowcharts of FIGS. 7 and 8. FIGS. 7 and 8 are diagrams for describing a stereoscopic sound reproducing method performed by the stereoscopic sound reproducing apparatus 200 shown in FIGS. 1 to 6. Therefore, even where omitted below, the above description of the stereoscopic sound reproducing apparatus 200 of FIGS. 1 to 6 also applies to the stereoscopic sound reproducing method according to the exemplary embodiment of FIGS. 7 and 8.

In operation 710, the 3D sound reproducing apparatus 200 may group the plurality of speakers.

The 3D sound reproducing apparatus 200 may group at least two physically separated speakers into one group. For example, the 3D sound reproducing apparatus 200 may group a television built-in speaker and a separate loudspeaker into one group. In addition, the 3D sound reproducing apparatus 200 may group a built-in television speaker, one or more soundbars, and one or more loudspeakers into one group. In addition, the 3D sound reproducing apparatus 200 may group existing home theater speakers and one or more loudspeakers separately purchased by the listener into one group. Speakers in a group may be physically separated from each other. The listener can select the speakers to be grouped, and can decide which speakers to add based on the size and characteristics of the space where the listener is located or the nature of the content to be enjoyed. The 3D sound reproducing apparatus 200 may group a plurality of physically separated speakers into a group through various communication paths. The communication path may represent various networks and network topologies. For example, the communication path may include wireless communication, wired communication, optical, ultrasound, or a combination thereof.

In operation 720, the 3D sound reproducing apparatus 200 may receive an audio signal. The 3D sound reproducing apparatus 200 may receive an input audio signal from a device such as a DVD, BD, or MP3 player. The input audio signal may be a multi-channel audio signal such as a stereo (2-channel), 5.1-channel, 7.1-channel, 10.2-channel, or 22.2-channel signal. In addition, the input audio signal may include a plurality of mono input signals and object-based audio signals in which the real-time positions of objects are transmitted in the form of metadata. An object-based audio signal refers to a form in which the position of each audio object arranged in three-dimensional space is compressed into metadata along with the sound. In addition, the input audio signal may be a hybrid input audio signal in which a channel audio signal and an object-based audio signal are mixed.
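The input types listed in operation 720 could be represented by a data structure such as the one sketched below. The class and field names are hypothetical and do not correspond to any particular audio format; the sketch only illustrates that a hybrid input carries both channel signals and objects with position metadata.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Position = Tuple[float, float, float]  # (x, y, z) in three-dimensional space


@dataclass
class AudioObject:
    """A mono signal plus metadata giving its position over time."""
    samples: List[float]
    positions: List[Position]  # one position per metadata frame (assumed)


@dataclass
class InputAudio:
    """Channel-based, object-based, or hybrid input audio."""
    channels: Dict[str, List[float]] = field(default_factory=dict)
    objects: List[AudioObject] = field(default_factory=list)

    def kind(self) -> str:
        if self.channels and self.objects:
            return "hybrid"
        if self.objects:
            return "object-based"
        return "channel-based"
```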

In operation 730, the 3D sound reproducing apparatus 200 may perform 3D audio rendering for positioning the virtual sound source at a predetermined position.

For example, the 3D sound reproducing apparatus 200 may generate at least one speaker signal corresponding to the audio signal by processing the input audio signal using a wave field synthesis rendering algorithm. In addition, the stereo sound reproducing apparatus 200 may generate at least one speaker signal corresponding to the audio signal by processing the input audio signal using a head-related transfer function rendering, beam-forming rendering, or focused source rendering algorithm. In addition, the 3D sound reproducing apparatus 200 may calculate an impulse response for each speaker based on the minimum error summation, or perform rendering to reproduce the sense of altitude.

In operation 740, the 3D sound reproducing apparatus 200 may reproduce the rendered virtual sound source through the multi-channel speaker.

Steps 710, 720, and 740 are the same as those described with reference to FIG.

In operation 810, the 3D sound reproducing apparatus 200 may track the position of the head of the listener. If the stereo sound reproducing apparatus 200 tracks the position where the head of the listener moves in real time, the sweet spot may move according to the position of the listener without being in a fixed position. The stereoscopic sound reproducing apparatus 200 according to an embodiment may update the sweet spot in real time or periodically according to the acquired head position of the listener.

The 3D sound reproducing apparatus 200 may acquire the head position information of the listener in real time. For example, the 3D sound reproducing apparatus 200 may acquire the head position information of the listener based on a mobile phone, a motion recognition sensor, or a position sensor attached to the remote controller. Alternatively, the 3D sound reproducing apparatus 200 may acquire the head position information of the listener using an image processing algorithm such as object tracking or an accessory worn by the listener or a wearable glass such as Google Glass. It is apparent that the method for the stereo reproducing apparatus 200 to track the position of the head of the listener is not limited to the example described above, and any other method may be used.

In operation 820, the 3D sound reproducing apparatus 200 may position the virtual sound source at a predetermined position based on the head position information of the listener.

For example, the 3D sound reproducing apparatus 200 may change the gain value and the phase delay value of at least one of the speakers in the group using the WFS method based on the moved listener head position.

In addition, the 3D sound reproducing apparatus 200 may reset the sweet spot near the head position of the tracked listener and recalculate the impulse response of at least one of the speakers in the group by using the aforementioned minimum error calculation method.

Also, when the virtual sound source is positioned at a predetermined altitude using the altitude reproduction method, the stereo sound reproducing apparatus 200 may change the gain value and the phase delay value applied to at least one of the speakers in the group so that the elevation angle of the virtual sound source is kept constant.

Meanwhile, the stereoscopic sound reproducing method may be embodied as computer-readable code on a computer-readable recording medium. The computer-readable recording medium includes all kinds of recording devices in which data that can be read by a computer system is stored. Examples of computer-readable recording media include ROM, RAM, CD-ROMs, magnetic tapes, floppy disks, and optical data storage devices, and also include media implemented in the form of carrier waves, such as transmission over the Internet. The computer-readable recording medium can also be distributed over network-coupled computer systems so that the computer-readable code is stored and executed in a distributed fashion.

The methods, processes, devices, products and / or systems according to the present invention are simple, cost effective, and not complicated and are very versatile and accurate. In addition, by applying known components to processes, devices, products, and systems according to the present invention, efficient and economical manufacturing, application and utilization can be realized while being readily available. Another important aspect of the present invention is that it is in line with current trends that call for cost reduction, system simplification and increased performance. Useful aspects found in such embodiments of the present invention may consequently increase the level of current technology.

While the invention has been described in connection with specific best embodiments thereof, other inventions in which substitutions, modifications, and variations are applied to the invention will be apparent to those skilled in the art in view of the foregoing description. In other words, the claims are intended to cover all such alternatives, modifications and variations of the invention. Therefore, all content described in this specification and drawings should be interpreted in an illustrative and non-limiting sense.

Claims

A stereo sound reproduction method comprising:

grouping a plurality of speakers into one group;

receiving an audio signal;

positioning at least one virtual sound source of the sound signal at a predetermined position by using the grouped plurality of speakers; and

reproducing the virtual sound source through the plurality of speakers.
The method of claim 1, wherein the grouping of the plurality of speakers into one group comprises:

including, in the one group, a speaker that constitutes a home theater system and a separate loudspeaker that does not constitute the home theater system.
The method of claim 2, wherein the home theater system comprises:

a loudspeaker array in which a plurality of loudspeakers are linearly connected.
The method of claim 1, wherein the grouping of the plurality of speakers comprises:

connecting the plurality of speakers, which are physically separated, through a wireless or wired network.
The method of claim 1, wherein the positioning of the virtual sound source at a predetermined position comprises:

positioning the virtual sound image at the predetermined position by a wave field synthesis rendering method.
The method of claim 1, wherein the positioning of the virtual sound source at a predetermined position comprises:

determining a sound pressure signal for each speaker included in the group that minimizes the difference between a first sound pressure generated in a sweet spot by the speakers included in the group and a second sound pressure generated in the sweet spot by the virtual sound source present at the predetermined position; and

modulating the received sound signal based on the determined sound pressure signal for each speaker.
The method of claim 6, wherein:

the determining of the sound pressure signal for each speaker included in the group comprises determining an impulse response to be applied to each speaker included in the group; and

the modulating of the received sound signal comprises convolving the determined impulse response with the sound signal input for each speaker included in the group.
The method of claim 1, wherein the positioning of the virtual sound image at a predetermined position comprises:

passing the input sound signal through a filter corresponding to a predetermined altitude;

replicating the filtered sound signal to generate a plurality of sound signals; and

performing at least one of amplification, attenuation, and delay on each of the replicated sound signals based on at least one of a gain value and a delay value corresponding to each of the speakers to which the replicated sound signals are to be output.
The method of claim 1, further comprising:

tracking the position of the head of a listener in real time,

wherein the positioning of the virtual sound source at a predetermined position comprises changing a gain value and a phase delay value of at least one of the speakers included in the group based on the tracked head position of the listener.
A stereo sound reproducing apparatus comprising:

a grouping unit for grouping a plurality of speakers into one group;

a receiving unit for receiving a sound signal;

a rendering unit for positioning one or more virtual sound sources of the sound signal at a predetermined position by using the grouped speakers; and

a reproducing unit for reproducing the virtual sound source through the plurality of speakers.
The apparatus of claim 10, wherein the grouping unit

includes, in the one group, a loudspeaker constituting a home theater system and a separate loudspeaker not constituting the home theater system.
The apparatus of claim 10, wherein the home theater system

comprises a loudspeaker array in which a plurality of loudspeakers are linearly connected.
The apparatus of claim 10, wherein the grouping unit

connects the plurality of physically separated speakers through a wireless or wired network.
The apparatus of claim 10, wherein the rendering unit

positions the virtual sound image at a predetermined position from the received sound signal by a wave field synthesis rendering method.
The apparatus of claim 10, wherein the rendering unit

determines a sound pressure signal for each speaker included in the group that minimizes the difference between a first sound pressure generated in a sweet spot by the speakers included in the group and a second sound pressure generated in the sweet spot by the virtual sound source present at the predetermined position, and modulates the input sound signal based on the determined sound pressure signal for each speaker.
The apparatus of claim 15, wherein the rendering unit

determines an impulse response to be applied to each speaker included in the group, and convolves the determined impulse response with the sound signal input for each speaker included in the group.
The apparatus of claim 10, wherein the rendering unit comprises:

a filtering unit which passes the input sound signal through a filter corresponding to a predetermined altitude;

a replica unit which generates a plurality of sound signals by replicating the filtered sound signal; and

an amplifier configured to perform at least one of amplification, attenuation, and delay on each of the replicated sound signals based on at least one of a gain value and a delay value corresponding to each of the speakers to which the replicated sound signals are to be output.
The apparatus of claim 10,

further comprising a listener tracker for tracking the position of the head of a listener in real time,

wherein the rendering unit changes a gain value and a phase delay value of at least one of the speakers included in the group based on the tracked head position of the listener.
A computer-readable recording medium having recorded thereon a program for executing the method of any one of claims 1 to 9 on a computer.