WO2006057131A1

WO2006057131A1 - Sound reproducing device and sound reproduction system

Info

Publication number: WO2006057131A1
Application number: PCT/JP2005/019711
Authority: WO
Inventors: Yoshiki Ohta
Original assignee: Pioneer Corporation
Priority date: 2004-11-26
Filing date: 2005-10-26
Publication date: 2006-06-01
Also published as: JPWO2006057131A1

Abstract

Even if the listening position of a listener is changed, a filter coefficient is calculated flexibly and an optimum filter characteristic is acquired. A sound reproducing device (1) images the listening space of a viewer by a camera (161) installed in an imaging section (16). The position of the viewer is detected from the image data captured by the camera (161). From the detection result a signal processing section (13) computes a filter coefficient used for signal processing, performs signal processing of sound data using the computed filter coefficient, and outputs the processed data to an SP array system (2). When a change of the listening position of the viewer is detected, the filter coefficient is changed accordingly.

Description

Specification

Sound reproduction device, sound reproduction system

Technical field

[0001] The present invention relates to a sound reproduction device and a sound reproduction system, and more particularly to a directivity control technique for speaker apparatus of the linear or planar array type.

Background art

[0002] In recent years, various sound reproduction systems have been proposed that reproduce a given sound field using an SP array system, in which a plurality of speaker units having substantially uniform characteristics are arranged in a line or in a plane (hereinafter, “speaker” is abbreviated as “SP”); see, for example, Patent Document 1. In such a system, a DSP (Digital Signal Processor) equipped with various filters, such as FIR (Finite Impulse Response) filters, performs signal processing on the input sound data before the sound is emitted from each SP unit. By changing characteristics such as the emission timing and the sound pressure level of each unit, the system controls the directivity of the emitted sound and thereby realizes an optimal sound field.

Patent Document 1: JP-A-5-041897

Disclosure of the invention

Problems to be solved by the invention

[0003] In the conventional sound reproduction system described above, if the listener's listening position changes, the directivity of the emitted sound can no longer be controlled unless the filter coefficients used for signal processing are changed accordingly. Conventional systems therefore change the filter coefficients by actually emitting sound from the SPs and collecting it with a microphone installed at the listening position.

[0004] However, if the sound reproduction system is installed in an environment that is easily affected by ambient noise, the opportunities to reset the filter coefficients are limited, and it has been difficult to change the filter coefficients in response to changes in the user's listening position. In addition, to track the user's listening position with a microphone, the user would have to wear the microphone at all times, which is not realistic.

[0005] The present application has been made in view of the circumstances described above. One example of the problem it addresses is to provide a sound reproduction device and a sound reproduction system that can calculate filter coefficients flexibly and obtain optimum filter characteristics even when the listener's listening position changes.

Means for solving the problem

[0006] In order to solve the above-described problem, according to one aspect of the present application, the sound reproduction device according to claim 1 is a sound reproduction device that emits an acoustic signal through a plurality of speakers arranged in a listening space, and comprises: acquisition means for acquiring the acoustic signal; detection means for detecting a listening position of a listener in the listening space; setting means for setting a reproduction condition for controlling directivity with respect to the detected listening position; signal processing means for subjecting the acquired acoustic signal to signal processing based on the reproduction condition; and driving means for driving the plurality of speakers based on the acoustic signal subjected to the signal processing.

[0007] Further, according to another aspect of the present application, the sound reproduction system according to claim 13 comprises a plurality of speakers arranged in a listening space and a sound reproduction device that emits an acoustic signal through the speakers. The sound reproduction device comprises: acquisition means for acquiring the acoustic signal; detection means for detecting a listening position of a listener in the listening space; setting means for setting a reproduction condition for controlling directivity with respect to the detected listening position; signal processing means for subjecting the acquired acoustic signal to signal processing based on the reproduction condition; and driving means for driving the plurality of speakers based on the acoustic signal subjected to the signal processing.

Brief Description of Drawings

FIG. 1 is a block diagram showing a configuration of a sound reproduction system S in the first embodiment.

FIG. 2 is a diagram showing an installation example of an SP array system 2 and a camera 161 in the same embodiment.

FIG. 3 is a block diagram showing a specific configuration of a signal processing unit 13 in the same embodiment.

FIG. 4 is a flowchart showing processing executed by the system control unit 17 in the same embodiment.

FIG. 5 is a flowchart showing processing executed by the system control unit 17 in the same embodiment.

FIG. 6 is a conceptual diagram showing changes that occur in the image of the current frame when the system control unit 17 executes processing in the embodiment.

FIG. 7 is a flowchart showing setting processing of the SP array system 2 executed by the system control unit 17 in the same embodiment.

FIG. 8 is a diagram showing an example of the relationship between the SP array system 2 and the camera 161 in the listening space and the listener.

FIG. 9 is a diagram showing the relationship between the sound wave emitted by each SP unit 2-k and the delay amount when directivity is controlled in the same embodiment.

FIG. 10 is a diagram showing a configuration example of the signal processing unit 13 in the case of performing signal processing on 5.1ch sound data in Modification 1-4.

FIG. 11 is a block diagram showing a configuration of a sound reproduction system S2 in the second embodiment.

FIG. 12 is a flowchart showing processing executed by the system control unit 17 in the same embodiment.

FIG. 13 is a flowchart showing processing executed by the system control unit 17 in the same embodiment.

Explanation of symbols

[0009] S, S2 · · · Sound reproduction system

1 · · · Sound reproduction device

2 · · · SP array system

3 · · · Sound source output device

BEST MODE FOR CARRYING OUT THE INVENTION

[0010] Before entering into a specific description of the sound reproduction system according to the present embodiment, the basic idea underlying the present application will be described. The main factor that has made it difficult to change the filter coefficients in conventional sound reproduction systems is the need to actually collect, with a microphone, the sound emitted from each SP unit. With such a method, the measurement result is easily affected by the surrounding environment (particularly noise) at the time of measurement, so attention must be paid to that environment, and the user must wear a microphone at all times. From the viewpoint of changing the filter coefficients flexibly and simply, therefore, calculating the filter coefficients from sound, as conventional systems do, is not a good approach.

[0011] On the other hand, in order to control the directivity of the emitted sound and reproduce an appropriate sound field, the direct sounds from the SP units must reach the listening position at the same time; that is, the focal point of the sound emitted by the SP units must coincide with the listening position. From this point of view, the distance from each SP unit to the listening position is the most important factor, and if this distance can be obtained, the filter coefficients can be calculated easily.
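The relationship between distance and emission timing described in this paragraph can be sketched as follows. This is only an illustrative sketch, not the calculation disclosed in the specification: the function name focusing_delays, the speed of sound of 343 m/s, and the example geometry are assumptions introduced here.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed room-temperature value, not from the specification


def focusing_delays(speaker_positions, focus):
    """Return per-unit delays (seconds) so that all direct sounds reach `focus` together."""
    distances = [math.dist(p, focus) for p in speaker_positions]
    farthest = max(distances)
    # The farthest unit is not delayed; nearer units wait for the difference in travel time.
    return [(farthest - d) / SPEED_OF_SOUND for d in distances]


# Hypothetical layout: five units in a horizontal row, 0.2 m apart,
# with the focal point 2 m in front of the centre unit.
speakers = [(-0.4 + 0.2 * k, 0.0, 0.0) for k in range(5)]
delays = focusing_delays(speakers, (0.0, 0.0, 2.0))
```

In this sketch the centre unit, being closest to the focal point, receives the largest delay, and the two outermost units receive none.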

[0012] Accordingly, in devising the present application, we searched for a method that could identify the listening position without using sound, and focused on image processing technology as one such method. Specifically, we adopted a method in which the listening space is imaged sequentially by an imaging device such as a camera, the listener's listening position is identified from the captured images, and the filter coefficients are calculated from that position. This makes it possible to calculate the filter coefficients without relying on sound, which is sensitive to the measurement environment, and to obtain optimum filter characteristics.

[0013] It should be noted that the approach based on image processing technology is merely an example. It is of course also possible to identify the listener's position using various sensors, such as temperature sensors, and to calculate the filter coefficients based on the identified position.

[0014] [1] First embodiment

The sound field control technique described above, realized by a flexible and simple method, is especially useful in facilities such as art galleries and museums, where the ambient noise level is high and the listening position changes successively as the visitors (that is, the listeners) move. In the present embodiment, therefore, the description takes as an example a sound reproduction system S that makes audio announcements to visitors in this type of facility.

[0015] [1.1] Configuration of the first embodiment

First, FIG. 1 is a block diagram showing the configuration of the sound reproduction system S according to the present embodiment. As shown in the figure, the sound reproduction system S includes a sound reproduction device 1, an SP array system 2, and a sound source output device 3. The sound reproduction device 1 performs signal processing on the sound data supplied from the sound source output device 3, and the sound corresponding to the sound data is emitted through the SP array system 2.

[0016] The SP array system 2 is configured by arranging a plurality of SP units 2-k (k = 1, 2, ..., n) having substantially uniform characteristics, for example in a horizontal row, and each SP unit 2-k is driven by the sound reproduction device 1. As in the installation example shown in FIG. 2, the SP array system 2 is installed near an exhibit E to be viewed in a facility such as a museum, and is used to emit audio announcements.

[0017] The sound source output device 3 is, for example, a media playback device for CDs (Compact Discs) or DVDs (Digital Versatile Discs); by playing a sound source such as a CD, it outputs sound data corresponding to an audio announcement about each exhibit. The number of channels (hereinafter, “channel” is abbreviated as “ch”) of the sound data output by the sound source output device 3 is arbitrary, but in the present embodiment the sound data is assumed to be monaural. The case of handling sound data corresponding to multi-channel audio will be described later as a modification.

[0018] The sound reproduction device 1 performs signal processing on the sound data output from the sound source output device 3 and outputs the processed signal to the SP array system 2. During this signal processing, the sound reproduction device 1 images the real space in which the exhibit E is displayed (hereinafter referred to as the “listening space”), calculates the listening position of the visitor, and calculates the filter coefficients for signal processing based on the calculated listening position. Signal processing according to these filter coefficients is then applied to the sound data in the sound reproduction device 1, and the directivity of the emitted sound is controlled so that the direct sounds emitted from the SP units 2-k constituting the SP array system 2 reach the listening position at the same time.

[0019] The focal point of the sound emitted by the SP array system 2 may be made to coincide with the visitor's position. In a system that makes audio announcements as in the present embodiment, however, a focal point at the visitor's position may make the announcement sound as if it were being made immediately in front of the visitor, which may feel unnatural. The present embodiment therefore adopts a configuration in which the focal point is set several tens of centimeters away from the visitor.

[0020] To realize the functions described above, the sound reproduction device 1 according to the present embodiment includes an operation unit 11, an external device interface unit 12 (hereinafter, “interface” is abbreviated as “I/F”), a signal processing unit 13, a D/A (digital/analog) conversion unit 14, an amplifier unit 15, an imaging unit 16, a system control unit 17, an image recording unit 18, and a system bus 19 that interconnects these elements.

[0021] The operation unit 11 is configured by, for example, an operation panel provided with a power button and the like, and outputs an operation signal corresponding to an operation performed by the operator to the system bus 19.

[0022] The external device I/F unit 12 is a communication I/F such as IEEE (The Institute of Electrical and Electronics Engineers) 1394, and has a plurality of connection terminals for connecting external devices. In this embodiment, the sound source output device 3 is connected to a connection terminal as an external device, and the sound reproduction device 1 exchanges data with the sound source output device 3 via the external device I/F unit 12.

[0023] The signal processing unit 13 is mainly composed of a DSP (Digital Signal Processor). It performs signal processing on the sound data input from the external device I/F unit 12 according to the filter coefficients determined by the system control unit 17, and outputs the result to the D/A conversion unit 14. A specific configuration of the signal processing unit 13 is shown in FIG. 3.

[0024] As shown in the figure, in the present embodiment the signal processing unit 13 includes a sound data dividing unit 131, to which the sound data supplied from the external device I/F unit 12 is input. The sound data dividing unit 131 divides the input sound data into “n” parts, one for each SP unit 2-k (hereinafter, the divided sound data is referred to as “unit data”). Each piece of unit data is input through a separate path to the corresponding delay filter 1321-k (k = 1, 2, ..., n) of the delay processing unit 132.

[0025] The delay filter 1321-k is a filter for delaying the output timing of the input sound. It changes the output timing of the unit data according to the filter coefficient input from the system control unit 17 and outputs the unit data to the level adjustment filter 1331-k (k = 1, 2, ..., n).

[0026] The level adjustment filter 1331-k is a filter for adjusting the sound pressure level of the input unit data according to the filter coefficient input from the system control unit 17, and outputs the level-adjusted unit data to the D/A conversion unit 14.
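The chain of the sound data dividing unit 131, the delay filters 1321-k, and the level adjustment filters 1331-k might be sketched as below. This is a simplified illustration under stated assumptions: the “division” is treated as copying the monaural data onto n paths, each delay is a whole number of samples, and each level adjustment is a single gain factor. The function name process_unit_data is hypothetical.

```python
def process_unit_data(samples, delay_samples, gains):
    """Copy a monaural block onto n paths, then delay and scale each path.

    samples:       list of sample values (the monaural sound data)
    delay_samples: per-path delay in whole samples (assumed delay coefficient)
    gains:         per-path gain factor (assumed level coefficient)
    """
    outputs = []
    for delay, gain in zip(delay_samples, gains):
        # Delay filter 1321-k: prepend silence to shift the output timing.
        delayed = [0.0] * delay + list(samples)
        # Level adjustment filter 1331-k: scale the sound pressure level.
        outputs.append([gain * s for s in delayed])
    return outputs


unit_data = process_unit_data([1.0, 0.5], delay_samples=[0, 2], gains=[1.0, 0.5])
```

Each inner list corresponds to the unit data for one SP unit 2-k before D/A conversion.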

[0027] The D/A conversion unit 14 includes, for example, as many D/A converters as there are SP units 2-k constituting the SP array system 2, that is, “n” D/A converters, and performs D/A conversion on the unit data input from the signal processing unit 13 through the separate paths (hereinafter, the unit data after D/A conversion is referred to as a “unit signal”). The D/A conversion unit 14 then outputs the unit signals obtained in this way to the amplifier unit 15 through separate paths.

[0028] The amplifier unit 15 includes as many amplifiers 15-k (k = 1, 2, ..., n) as there are SP units 2-k, that is, “n” amplifiers, and each amplifier 15-k adjusts the gain of the unit signal input from the D/A conversion unit 14 through its separate path. The amplifier unit 15 has an output terminal, and outputs the unit signal whose gain has been adjusted by each amplifier 15-k to the corresponding SP unit 2-k through a separate path.

[0029] The form of this output terminal is arbitrary: a separate connector may be provided for each SP unit 2-k, or a single connector may contain a plurality of output terminals, with each unit signal output to its SP unit 2-k via a separate path.

[0030] Next, the imaging unit 16 includes a camera 161, generates image data corresponding to the images captured by the camera 161, and outputs the image data to the system control unit 17. The timing at which image data is output to the system control unit 17 is arbitrary; to make the description concrete, this embodiment assumes that a buffer memory is provided in the imaging unit 16 and that image data is transferred to the system control unit 17 each time the image data for one frame has been generated. The camera 161 may be configured separately from the sound reproduction device 1 or may be built into it. As an installation example, the camera 161 may be installed at the center of the SP array system 2, as shown in FIG. 2.

[0031] Next, the system control unit 17 is mainly configured by a CPU (Central Processing Unit) and comprehensively controls each unit of the sound reproduction device 1. For example, the system control unit 17 calculates the listening position of the visitor in the listening space based on the image data supplied from the imaging unit 16. It then calculates the filter coefficients for signal processing based on the calculated listening position and outputs them to the signal processing unit 13. As a result, the filter coefficients used by the delay filters 1321-k and the level adjustment filters 1331-k in the signal processing unit 13 are changed, and the output timing and sound pressure level of each piece of unit data are adjusted. The specific processing executed by the system control unit 17 is described in detail in the “Operation” section.

[0032] The image recording unit 18 is composed of, for example, a VRAM (Video Random Access Memory) or an SRAM (Static Random Access Memory), and is used as a work area when the system control unit 17 calculates the listening position of the visitor.

[0033] [1.2] Operation of the first embodiment

Next, the specific operation of the sound reproduction system S according to the present embodiment, configured as described above, will be described.

[0034] First, to have the sound reproduction system S make an audio announcement introducing the exhibit E to visitors, the sound reproduction device 1 and the sound source output device 3 are turned on. Triggered by this power-on, the system control unit 17 starts the processing shown in FIGS. 4 and 5.

[0035] In this processing, the system control unit 17 first executes background image acquisition processing (step Sa1). At this time, the system control unit 17 outputs to the imaging unit 16 a control signal that starts imaging of the listening space at a frame rate of, for example, about 30 frames/sec. Triggered by this control signal, the imaging unit 16 starts imaging the listening space with the camera 161, and each time image data corresponding to a frame is acquired, supplies it to the system control unit 17. The system control unit 17 then obtains image data corresponding to the background image (hereinafter referred to as “background image data”) from the image data supplied by the imaging unit 16, and records the background image data in the image recording unit 18.

[0036] What data is used as the background image data is arbitrary. For example, the data corresponding to a predetermined frame in the image data supplied from the imaging unit 16 may be extracted as the background image data. In the present embodiment, however, the following method is adopted to ensure the accuracy of the background image.

[0037] First, the system control unit 17 buffers the image data sequentially supplied from the imaging unit 16 for a predetermined time (for example, 5 seconds), and substitutes the pixel component values of each frame into (Equation 1) below.

[Equation 1]

$$D_B(x, y) = \frac{1}{T} \sum_{t=1}^{T} \mathrm{image}(x, y, t) \quad \text{(Equation 1)}$$

[0038] Here, in (Equation 1), “T” is the total number of frames, and “image(x, y, t)” is the pixel component value at coordinates (x, y) in the “t”-th frame. Thus “DB(x, y)” calculated by (Equation 1) is the average of the pixel component values from the 1st frame to the “T”-th frame, and the system control unit 17 handles the image data represented by the pixel component values “DB(x, y)” as the background image data. The pixel component value “image(x, y, t)” may be an RGB value or a YCC value, depending on the format of the image data supplied from the imaging unit 16.
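The frame averaging of (Equation 1) can be illustrated with a short sketch. It assumes, purely for illustration, that each frame is a list of rows holding one numeric component value per pixel; the function name background_image is hypothetical.

```python
def background_image(frames):
    """Average T frames pixel by pixel, as in (Equation 1).

    frames: list of T frames; each frame is a list of rows of numeric pixel values.
    Returns DB(x, y) as a list of rows indexed [y][x].
    """
    total = len(frames)  # T, the total number of frames
    height, width = len(frames[0]), len(frames[0][0])
    return [[sum(frame[y][x] for frame in frames) / total for x in range(width)]
            for y in range(height)]


# Two tiny 1x2 frames standing in for the buffered image data.
db = background_image([[[0, 10]], [[2, 20]]])
```

With a multi-component format such as RGB or YCC, the same averaging would be applied to each component separately.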

[0039] When acquisition of the background image data is completed in this way, the system control unit 17 outputs a control signal to the sound source output device 3 via the external device I/F unit 12 and starts playback of the sound data corresponding to the audio announcement (step Sa2). As a result, the sound source output device 3 reads the sound data recorded on a medium such as a CD and sequentially supplies it to the signal processing unit 13 via the external device I/F unit 12.

[0040] The sound data supplied via the external device I/F unit 12 is then divided into unit data and subjected to signal processing by the signal processing unit 13, D/A-converted, amplified in the amplifier unit 15, and sequentially output from the SP array system 2. The filter coefficients used at power-on are arbitrary; default filter coefficients may be set in advance.

[0041] Next, the system control unit 17 monitors the image data sequentially supplied from the imaging unit 16 and acquires the image data corresponding to the current frame (step Sa3). Specifically, the system control unit 17 acquires the image data sequentially supplied from the imaging unit 16 and expands the image data corresponding to the current frame into a frame buffer in the image recording unit 18.

[0042] When the image data corresponding to the current frame has been acquired in this way, the system control unit 17 substitutes the pixel component values of the current frame and the pixel component values “DB(x, y)” of the background image data calculated by (Equation 1) into (Equation 2) below (step Sa4), and based on the result determines whether or not a visitor is present within the angle of view of the camera 161, that is, whether or not a visitor has entered the frame (step Sa5).

[Equation 2]

$$D(x, y) = \left| \mathrm{imageP}(x, y) - D_B(x, y) \right| \quad \text{(Equation 2)}$$

[0043] Here, in (Equation 2), “imageP(x, y)” on the right side is the pixel component value of the current frame. In general, if the current frame has not changed compared with the background image, “imageP(x, y)” and “DB(x, y)” take almost the same value and D(x, y) is small. If, on the other hand, a visitor enters the frame and the current frame changes, the difference between “imageP(x, y)” and “DB(x, y)” increases and so does D(x, y). “D(x, y)” is therefore an index (hereinafter referred to as the “energy amount”) for determining whether a visitor has entered the current frame, and when its value exceeds a predetermined value, it is estimated that a visitor has entered the frame.

[0044] In step Sa5, therefore, the system control unit 17 compares the calculated energy amount “D(x, y)” with a threshold “thD”, and when the energy amount exceeds the threshold, determines that a visitor is in the current frame (step Sa5 “yes”). If the determination is “no”, the system control unit 17 determines whether or not to end the processing (step Sa13). If that determination is “no”, it returns to step Sa3, acquires the image data corresponding to the next frame, and repeats steps Sa4 and Sa5; if the determination in step Sa13 is “yes”, the processing ends.
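Steps Sa4 and Sa5 can be sketched as follows. The specification does not state how the per-pixel energy amount D(x, y) is reduced to a single yes/no decision; this sketch assumes, as one possibility, that a visitor is judged present when any pixel exceeds thD. The function names and the sample values are hypothetical.

```python
def energy_map(current, background):
    """Compute D(x, y) = |imageP(x, y) - DB(x, y)| for every pixel, as in (Equation 2)."""
    return [[abs(c - b) for c, b in zip(cur_row, bg_row)]
            for cur_row, bg_row in zip(current, background)]


def visitor_in_frame(current, background, th_d):
    """Return True if the energy amount exceeds th_d at any pixel (assumed decision rule)."""
    d = energy_map(current, background)
    return any(value > th_d for row in d for value in row)


background = [[10, 10], [10, 10]]
changed = [[10, 10], [10, 60]]   # one pixel differs strongly from the background
```

Here visitor_in_frame(changed, background, 30) is true, while comparing the background with itself gives false. Other reductions, such as the mean of D(x, y) over the frame, would fit the description equally well.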

[0045] If, on the other hand, the determination in step Sa5 is “yes”, the system control unit 17 executes steps Sa6 to Sa10 to identify the position of the visitor's face. This processing is described in more detail with reference to FIG. 6. FIG. 6 is a conceptual diagram showing the changes that the processing executed by the system control unit 17 produces in the image of the current frame; in the figure, skin color regions are indicated by diagonal hatching.

[0046] First, the system control unit 17 performs skin color region extraction based on the image data corresponding to the current frame f (step Sa6). In general, pixels exhibiting skin color are known to be distributed in the range “Cr” = 133 to 173 and “Cb” = 77 to 127 in YCC pixel component values. A skin color region can therefore be extracted by identifying the pixels in the current frame that satisfy this condition.

[0047] In the example shown in FIG. 6, the visitor's face and hands are extracted as skin color regions, yielding an image f1 that contains only the skin color regions (hereinafter referred to as the “skin color extraction image”). If the image data supplied from the imaging unit 16 is expressed in RGB pixel component values, it must first be converted to YCC pixel component values using (Equation 3) to obtain the Cr and Cb values.

[Equation 3]

[0048] When the skin color extraction image f1 has been obtained, the system control unit 17 sets the pixel component values of all skin color pixels to “1” and those of all other pixels to “0”, thereby binarizing the current frame (step Sa7). As a result, the skin color extraction image f1 is converted into a black-and-white binarized image f2 in which only the skin color regions are white and all other regions are black.
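Steps Sa6 and Sa7 can be sketched together as follows. The coefficients of (Equation 3) are not reproduced in this text, so the sketch substitutes the full-range ITU-R BT.601 conversion used by JFIF; this is an assumption and may differ from the specification's own equation. The Cr and Cb ranges are those given in the description; the function names and sample pixels are hypothetical.

```python
def rgb_to_cbcr(r, g, b):
    """Convert 8-bit RGB to (Cb, Cr) using the full-range BT.601 (JFIF) formulas.

    Assumption: stands in for (Equation 3), whose coefficients are not shown here.
    """
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return cb, cr


def skin_mask(rgb_image):
    """Return a binary image: 1 where the pixel is skin colored, 0 elsewhere."""
    mask = []
    for row in rgb_image:
        out_row = []
        for r, g, b in row:
            cb, cr = rgb_to_cbcr(r, g, b)
            # Skin color range from the description: Cr 133-173, Cb 77-127.
            out_row.append(1 if 133 <= cr <= 173 and 77 <= cb <= 127 else 0)
        mask.append(out_row)
    return mask


# One skin-like pixel and one pure blue pixel.
f2 = skin_mask([[(224, 172, 105), (0, 0, 255)]])
```

The result plays the role of the binarized image f2: skin colored pixels are 1 (white) and all others are 0 (black).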

[0049] When binarization of the current frame is completed, the system control unit 17 reads a reference image f3 for extracting the region corresponding to a face (step Sa8). What image is used as the reference image is arbitrary. For example, if the reference image f3 is a binary image that has the value “1” only within a circular or elliptical region corresponding to the average size of a human face, the face region in the current frame can be identified appropriately.

[0050] Next, the system control unit 17 determines whether or not the visitor's face is in the current frame (step Sa9). At this time, the system control unit 17 calculates the difference between the “1” regions of the binarized image f2 and the circular region set in the reference image f3, and searches for a region in which the average value of the difference is equal to or less than a predetermined threshold (which should be as small as possible). If no such region exists, the system control unit 17 makes a “no” determination in step Sa9 and determines whether or not to end the processing (step Sa13). If that determination is “yes”, the processing ends; if it is “no”, the process returns to step Sa3, image data of a new frame is acquired, and steps Sa4 to Sa9 are executed again based on that image data.

[0051] If, on the other hand, it is determined that a face is in the frame (step Sa9 “yes”), the system control unit 17 calculates coordinates (HX, HY) that specify the face region (step Sa10). There may be a plurality of regions corresponding to faces; in that case, the system control unit 17 calculates coordinates (HX, HY) for each region.

[0052] The position used as the coordinate origin is arbitrary, as is the method of calculating the coordinates (HX, HY). For example, the binarized image f2 may be compared with the reference image f3, and the coordinates of the position at which the correlation between the two is highest may be taken as (HX, HY).
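The search of steps Sa9 and Sa10 might look like the following sketch. It slides the reference image over the binarized image, takes the mean absolute difference in each window, and reports the centre of the best window if that difference is at or below a threshold. A square reference image, a top-left coordinate origin, and the function name find_face are assumptions made for illustration; the specification leaves these choices open.

```python
def find_face(binary, reference, max_mean_diff):
    """Return (HX, HY) of the best-matching window centre, or None if nothing matches.

    binary:        binarized image f2 as a list of rows of 0/1 values
    reference:     reference image f3 as a smaller list of rows of 0/1 values
    max_mean_diff: threshold on the mean absolute difference within a window
    """
    h, w = len(binary), len(binary[0])
    rh, rw = len(reference), len(reference[0])
    best = None
    for top in range(h - rh + 1):
        for left in range(w - rw + 1):
            diff = sum(abs(binary[top + j][left + i] - reference[j][i])
                       for j in range(rh) for i in range(rw)) / (rh * rw)
            if diff <= max_mean_diff and (best is None or diff < best[0]):
                # Record the window centre, measured from the top-left origin.
                best = (diff, left + rw // 2, top + rh // 2)
    return None if best is None else (best[1], best[2])


f2 = [[0, 0, 0, 0, 0],
      [0, 0, 1, 1, 1],
      [0, 0, 1, 1, 1],
      [0, 0, 1, 1, 1],
      [0, 0, 0, 0, 0]]
f3 = [[1, 1, 1]] * 3   # square stand-in for the circular face template
```

For this input, find_face(f2, f3, 0.2) locates the block of ones centred at column 3, row 2. To handle several faces, every window under the threshold would be collected instead of only the best one.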

[0053] Next, the system control unit 17 determines whether or not the change in the calculated coordinates (HX, HY) from the previous frame exceeds a predetermined value (step Sa11), and thereby determines whether the visitor has merely moved his or her face slightly while standing in place or is actually moving. In this determination, the system control unit 17 changes the processing as follows, depending on how many regions recognized as faces exist in the frame.

<When only one area is recognized as a face> In this case, the system control unit 17 compares the coordinates (HX, HY) calculated in step Sa10 with the coordinates (HX, HY) calculated in the processing of the previous frame, and determines whether or not the change between the two (that is, the distance on the frame) exceeds a predetermined value.

<When multiple areas are recognized as faces> In this case, a plurality of coordinates (HX, HY) are calculated as described above. Among them, the system control unit 17 identifies the coordinates (HX, HY) with the smallest change from the coordinates (HX, HY) calculated for the previous frame, and determines whether or not the change exceeds the predetermined value for the identified coordinates. This prevents the sound image that had been directed at one visitor from suddenly being redirected to another visitor during playback of the audio announcement, which would feel unnatural to the visitor.
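The determination of step Sa11 for one or several detected faces can be sketched as follows. The Euclidean distance on the frame is used as the “amount of change”; that choice, the function name needs_update, and the sample coordinates are assumptions for illustration.

```python
import math


def needs_update(previous, candidates, min_move):
    """Pick the face closest to the previous position and test whether it moved enough.

    previous:   (HX, HY) from the previous frame
    candidates: list of (HX, HY) for every face region in the current frame
    min_move:   the predetermined value that the change must exceed
    Returns (tracked coordinates, True if the setting processing should run).
    """
    # With several faces, keep following the one whose position changed least.
    tracked = min(candidates, key=lambda c: math.dist(c, previous))
    return tracked, math.dist(tracked, previous) > min_move


small_move = needs_update((100, 80), [(104, 83), (300, 90)], 20)
large_move = needs_update((100, 80), [(140, 80), (300, 90)], 20)
```

In the first call the tracked face has moved 5 pixels, so no update is requested; in the second it has moved 40 pixels, so the setting processing of step Sa12 would follow. In both calls the second face at (300, 90) is ignored.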

[0054] If the determination in step Sa11 is “no”, the system control unit 17 executes step Sa13 without executing step Sa12. If the determination in step Sa13 is “yes”, the processing ends; if it is “no”, the process returns to step Sa3, and steps Sa4 to Sa11 are repeated based on the image data corresponding to the next frame.

[0055] On the other hand, if “yes” is determined in step Sa11, the system control unit 17 executes the setting process of the SP array system 2 (step Sa12), and then executes the process of step Sa13. If “no” is determined in step Sa13, the processing in steps Sa3 to Sa13 is repeated, whereas if “yes” is determined in step Sa13, the processing is terminated.

Here, the setting process of the SP array system 2 in step Sa12 will be described in detail with reference to FIG. 7 and FIG. 8. FIG. 7 is a flowchart showing the contents of the setting process, and FIG. 8 is a diagram showing the relationship between the SP array system 2 and the camera 161 in the listening space and the listener.

[0057] In the setting process of the SP array system 2, the system control unit 17 first converts the coordinates (HX, HY) calculated in step Sa10 into real coordinates (RHX, RHY, RHZ) in the listening space (step Sa12-1). The system control unit 17 performs this conversion by the following method.

First, if the distance from the camera 161 is “d” as shown in FIG. 8, RHZ in the real coordinate system can be represented by “d”. If the horizontal (x direction) angle of view of the camera 161 is “θ” and the vertical (y direction) angle of view is “φ”, the size of the real space captured within the angle of view is, in the x and y directions,

[Expression 4] $x = 2d\tan(\theta/2), \quad y = 2d\tan(\phi/2)$ ... (Expression 4)

Here, since the angles of view “θ” and “φ” are determined when the camera 161 is manufactured, the coordinates (HX, HY) on the frame can be converted to real coordinates (RHX, RHY, RHZ) in the listening space as long as “d” can be specified. The method of identifying “d” is arbitrary; for example,

a method of deciding in advance the location that visitors can enter, for example by placing a rope at that location, and setting the distance “d” to a fixed value; or

a method of calculating the distance “d” from the camera 161 to the viewer based on the focusing state of the camera 161, without setting the distance “d” to a fixed value;

can be adopted.
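The conversion from frame coordinates to real coordinates can be sketched as follows. Expression 4 is taken from the text; everything else is an assumption for illustration: the function name `frame_to_real`, the pixel dimensions of the frame, a real-space origin on the camera's optical axis, and a linear mapping from pixels to the imaged plane.

```python
import math

def frame_to_real(hx, hy, d, theta_deg, phi_deg, width_px, height_px):
    """Convert frame coordinates (HX, HY) to real coordinates (RHX, RHY, RHZ).

    Assumptions (not from the specification): the real-space origin lies on
    the camera's optical axis, and pixels map linearly onto the imaged plane.
    """
    # Expression 4: size of the real space covered by the frame at distance d.
    x_size = 2 * d * math.tan(math.radians(theta_deg) / 2)
    y_size = 2 * d * math.tan(math.radians(phi_deg) / 2)
    rhx = (hx / width_px - 0.5) * x_size
    rhy = (hy / height_px - 0.5) * y_size
    return rhx, rhy, d
```

For example, with a 90 degree horizontal angle of view and d = 2 m, the frame spans about 4 m horizontally, so a face at the right edge of the frame lies about 2 m to the side of the optical axis.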

[0059] When the conversion to the real coordinates (RHX, RHY, RHZ) is completed in this way, the system control unit 17 calculates the distance from each SP unit 2-k constituting the SP array system 2 to the real coordinates (RHX, RHY, RHZ) calculated in step Sa12-1 (step Sa12-2), and calculates the filter coefficients according to the calculation result (step Sa12-3).

The processing performed by the system control unit 17 at this time is as follows. First, the distance from the real coordinates (RHX, RHY, RHZ) to each SP unit 2-k is calculated. The calculation method is arbitrary. For example, coordinate values in real space corresponding to each SP unit 2-k may be set in advance, and the distance between those coordinates and the real coordinates (RHX, RHY, RHZ) may be calculated.

[0061] When the calculation of these distances is completed, the system control unit 17 takes the distance between a given SP unit 2-k (SP unit 2-1 in the illustrated example) and the real coordinates (RHX, RHY, RHZ) as “L”, and calculates, for each SP unit 2-k, the difference $\Delta l_k$ between “L” and the distance from that SP unit 2-k to the real coordinates. Next, the system control unit 17 substitutes this result into

[Equation 5] $\Delta t = \Delta l_k / c$ (Equation 5)

(where “c” is the speed of sound) to calculate the delay time of each unit data, that is, the delay time of the sound output from each SP unit 2-k. At this time, the system control unit 17 also calculates the attenuation rate of the output sound based on the calculated $\Delta l_k$, and determines the sound pressure level corresponding to each unit data based on the result.
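The distance, delay, and level calculation can be sketched as follows. Equation 5 is taken from the text. The rest is assumed for illustration: the function name `delays_and_gains`, a speed of sound of 343 m/s, the use of the farthest unit as the reference distance L (so that no delay is negative), and an inverse-distance rule for the level.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees C

def delays_and_gains(unit_positions, listener):
    """Return per-unit (delay_seconds, gain) for focusing at `listener`.

    Assumptions: the reference distance L is that of the farthest unit, so
    every delay is non-negative, and gain follows an inverse-distance law.
    """
    dists = [math.dist(p, listener) for p in unit_positions]
    ref = max(dists)                       # reference distance "L"
    result = []
    for dist in dists:
        delta_l = ref - dist               # path difference for this unit
        delay = delta_l / SPEED_OF_SOUND   # Equation 5
        gain = dist / ref                  # nearer units are attenuated
        result.append((delay, gain))
    return result
```

Delaying and attenuating the nearer units in this way makes the wavefronts from all units arrive at the listening position at the same time and at a similar level, which is what forms the focal point.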

[0062] Then, the system control unit 17 inputs the filter coefficients calculated in step Sa12-3 to the signal processing unit 13, thereby changing the filter coefficients used for signal processing in the signal processing unit 13 (step Sa12-4), and the process ends. As a result of executing this series of processes, the filter coefficients in the signal processing unit 13 are changed as needed in accordance with the movement of the viewer. Consequently, the focal point of the sound moves to a position between the viewer and the exhibit E, and the viewer perceives the sound source as if it existed between himself or herself and the exhibit.

[0063] In this way, the sound reproducing device 1 according to the present embodiment is a sound reproducing device 1 that amplifies an acoustic signal by the SP array system 2 arranged in the listening space, and comprises: the external device I/F unit 12 that acquires acoustic data; the imaging unit 16, which includes the camera 161, and the system control unit 17, which together detect the listening position of the listener in the listening space; the system control unit 17 that sets filter coefficients for controlling directivity toward the detected listening position; the signal processing unit 13 that performs signal processing on the acoustic data based on the set filter coefficients; and the D/A conversion unit 14 and the amplifier unit 15 that drive the SP array system 2 based on the acoustic data subjected to the signal processing.

[0064] With this configuration, the filter coefficients are determined based on the listening position acquired by the processing in the system control unit 17, and signal processing is performed on the acoustic signal based on those filter coefficients. Therefore, even in an environment where the listener's listening position changes, it is possible to calculate the filter coefficients flexibly and always obtain optimum filter characteristics.

[0065] In addition, since the SP array system 2 is used in the above configuration, filter coefficients such as the delay time for each SP unit 2-k are calculated and signal processing is performed based on those filter coefficients. This makes it possible to precisely control the directivity of the sound output from the SP array system 2 and realize optimal sound field control.

[0066] In particular, when the camera 161 is used to capture the listening space as an image in units of frames, as in the sound reproducing device according to the present embodiment, the listening position in each frame can be accurately determined based on the image data. It is therefore possible to improve the calculation accuracy of the filter coefficients.

[0067] Further, if this configuration adopts a method in which the region corresponding to the listener's face in the frame is identified and that region is detected as the listening position, the filter coefficients can be calculated so that the focal point of the sound is formed around the face.

[0068] Furthermore, if the skin color region in the frame is identified and that region is identified as the region corresponding to the listener's face, the position of the listener's face can be reliably identified. As described above, the Cr and Cb values in this case are 133 to 173 (Cr value) and 77 to 127 (Cb value).

[0069] Further, when the image data corresponding to each frame is binarized by dividing it into the skin color region and other regions, the amount of calculation in the matching process with the reference image f3 is reduced. This makes it possible to improve processing efficiency, shorten processing time, and improve the accuracy of identifying the skin color region.
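The skin color test and binarization can be sketched as follows. The Cr and Cb ranges are those given in the text. The conversion from RGB is an assumption: the specification does not say how Cr and Cb are obtained, so the full-range BT.601 (JPEG-style) formulas are used here, and the function names are hypothetical.

```python
def rgb_to_cbcr(r, g, b):
    """Full-range BT.601 (JPEG-style) conversion; returns (Cb, Cr)."""
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return cb, cr

def binarize_skin(image):
    """Map an image (rows of (R, G, B) tuples) to rows of 0/1 flags."""
    out = []
    for row in image:
        flags = []
        for r, g, b in row:
            cb, cr = rgb_to_cbcr(r, g, b)
            # Ranges from the text: Cr 133 to 173, Cb 77 to 127.
            is_skin = 133 <= cr <= 173 and 77 <= cb <= 127
            flags.append(1 if is_skin else 0)
        out.append(flags)
    return out
```

Because the result holds only 0 and 1, the later comparison with the reference image f3 reduces to comparing flags rather than full color values.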

[0070] Furthermore, if, as in the present embodiment, a change in the listening position between frames is detected based on the image data and the filter coefficients are changed as needed according to the detected change, the focal position can be changed sequentially and the optimal sound field reproduced even in an environment where the listener is moving.

[0071] In the present embodiment, the case of outputting acoustic data containing a voice announcement has been described. However, the content of the acoustic data output from the sound source output device 3 may be anything, for example music or movie sound.

[0072] In addition, the sound reproduction system S according to the present embodiment adopts a method of calculating the filter coefficients in step Sa12-3. However, for example, a table for converting the calculated distance between each SP unit 2-k and the real coordinates (RHX, RHY, RHZ) into filter coefficients may be provided, and the filter coefficients may be determined based on that table.
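A table-based variant could look like the sketch below. The table layout, the names `DISTANCE_BOUNDS`, `COEFFICIENTS`, and `lookup_coefficients`, and every number in the table are hypothetical placeholders with no physical meaning; the specification only states that a distance-to-coefficient table may be used.

```python
import bisect

# Hypothetical table: upper distance bound in metres -> (delay_samples, gain).
DISTANCE_BOUNDS = [1.0, 2.0, 3.0, 4.0]
COEFFICIENTS = [(0, 0.25), (48, 0.5), (96, 0.75), (144, 1.0)]

def lookup_coefficients(distance_m):
    """Return the table entry for the first bound >= distance_m.

    Distances beyond the last bound reuse the last entry.
    """
    i = bisect.bisect_left(DISTANCE_BOUNDS, distance_m)
    return COEFFICIENTS[min(i, len(COEFFICIENTS) - 1)]
```

A lookup of this kind trades accuracy for speed: no division or square root is needed per frame, at the cost of quantizing the distance into bands.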

Furthermore, in the present embodiment, the level adjusting unit 133 is provided in the signal processing unit 13, and the sound pressure level of each unit data is changed by filtering. However, the sound pressure level may instead be adjusted by adjusting the amplification factor in the amplifier unit 15.

[0074] Furthermore, in the above embodiment, the case where the SP units 2-k of the SP array system 2 are arranged in a horizontal row has been described. However, the SP array system 2 may be configured with the SP units 2-k arranged in a plane. If this method is used, the focal point of the sound can be changed three-dimensionally.

Furthermore, in the above-described embodiment, the method of imaging the listening space with one camera 161 has been adopted. However, the listening space may be imaged with a plurality of cameras.

[0076] In addition, the above-described embodiment adopts a method of focusing the sound on one of the viewers when a plurality of viewers enter the frame. In such a case, however, the focal point of the sound may instead be formed at the center position of the frame.

Furthermore, in the above embodiment, the case where the SP array system 2 is used has been described. However, a configuration using a plurality of full-range speakers may be used. Even in this case, the same effect can be obtained by the same configuration as described above.

Furthermore, in the first embodiment, a configuration is adopted in which the acoustic data dividing unit 131 divides the acoustic data into a number of pieces corresponding to the number of SP units 2-k in the SP array system 2, and the unit data obtained by the division is processed by the delay processing unit 132 and the like. However, it is not always necessary to divide the acoustic data into as many unit data as there are SP units 2-k. SP unit groups may be formed from the SP units 2-k, and the acoustic data may be divided per SP unit group for signal processing.

Furthermore, in the first embodiment, an example in which the SP array system 2 and the camera 161 are arranged in the vicinity of the exhibit E has been described (FIGS. 2 and 8). However, the camera 161 is not limited to this position and may be installed elsewhere. For example, the camera 161 may be installed at the upper part of the room to image the listening space from above, the coordinates may be calculated from that image, and the filter coefficients may be determined based on the calculated coordinate values.

In the first embodiment, a method was adopted in which pixels distributed between “Cr” = 133 to 173 and “Cb” = 77 to 127 in the image captured using the camera 161 are extracted as the skin color region. However, by setting the values of “Cr” and “Cb” appropriately, the positions of listeners of different races, for example, can also be specified.

[0081] [1.3] Modification

(1) Modification 1-1

In the above embodiment, a configuration has been adopted in which a change in the position of the listener is detected for each frame and the filter coefficients are changed accordingly. However, the filter coefficients may instead be changed, for example, at the breaks between voice announcements. When this method is adopted, after step Sa11 in FIG. 5, it is determined whether the filter coefficient change timing has been reached. If this determination is “no”, the process of step Sa13 is performed without performing the process of step Sa12. If “yes” is determined, the process of step Sa12 is performed.

[0082] (2) Modification 1-2

In the first embodiment, a configuration was adopted in which a binarized image f2 is simply generated for each frame, the binarized image f2 is compared with the reference image f3, and it is thereby determined whether the viewer's face has entered the frame. However, by adopting the following method, it is possible to further improve the accuracy of specifying the face region.

[0083] First, the human face is generally located at the top of the body. It can therefore be assumed that a face is unlikely to exist in the lower region of the captured image. Accordingly, the image corresponding to each frame is divided into a plurality of areas, for example into areas where a face is more likely to appear and areas where it is less likely to appear, and each area is weighted. When there are a plurality of areas determined to be faces, the area to be prioritized is determined according to the weighting. As a result, the face region can be reliably identified even if, for example, the viewer presents skin-colored regions other than the face.

In this case, the region dividing method and the weighting method are arbitrary.
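Since the dividing and weighting methods are left open, the following is just one possible sketch. It assumes three horizontal bands with hand-picked weights, image y coordinates that grow downward, and a score equal to the weight times the pixel count of the candidate; the function name `pick_face_region` is hypothetical.

```python
def pick_face_region(regions, frame_height, weights=(1.0, 0.6, 0.2)):
    """Choose the candidate with the highest weighted score.

    regions -- list of (center_y, pixel_count) for candidate face regions
    weights -- assumed weights for the top, middle and bottom thirds
    """
    def score(region):
        center_y, pixel_count = region
        band = min(int(3 * center_y / frame_height), 2)  # 0 = top third
        return weights[band] * pixel_count
    return max(regions, key=score)
```

With these weights, a modest skin-colored region near the top of the frame can outrank a larger one near the bottom, which is the prioritization the modification describes.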

[0085] (3) Modification 1-3

In the above embodiment, the system control unit 17 compares the binarized image f2 with the reference image f3 to identify the region of the viewer's face, and calculates the coordinates (HX, HY) based on the region corresponding to the face. This is to prevent the filter coefficients from being changed because a region is erroneously recognized as a face even though no face is actually in the frame.

However, if there is no need to prevent the filter coefficients from being changed in this way, the following method can be adopted. That is, when a region estimated to be a face is searched for based on the reference image f3 in step Sa8 in FIG. 4 and it is determined that no such region exists, the process does not proceed to step Sa13. Instead, the region of the binarized image f2 in which a certain number or more of pixels with the value “1” are concentrated and which lies uppermost is identified as the face, and in step Sa10 the coordinates (HX, HY) are calculated based on that region.

If this method is adopted, it is possible to reliably specify a face area even for a viewer whose face is smaller than the area defined by the reference image f3.
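The fallback of picking the uppermost sufficiently large cluster of “1” pixels can be sketched as below. The use of 4-connectivity to define a cluster and the function name `topmost_cluster` are assumptions; the specification does not define how concentration is measured.

```python
def topmost_cluster(binary, min_pixels):
    """Return the pixels of the uppermost 4-connected cluster of 1s that
    contains at least `min_pixels` pixels, or None if there is none."""
    height, width = len(binary), len(binary[0])
    seen = set()
    for y in range(height):            # scanning top-down finds the
        for x in range(width):         # uppermost cluster first
            if binary[y][x] != 1 or (x, y) in seen:
                continue
            stack, cluster = [(x, y)], []
            seen.add((x, y))
            while stack:
                cx, cy = stack.pop()
                cluster.append((cx, cy))
                for nx, ny in ((cx + 1, cy), (cx - 1, cy),
                               (cx, cy + 1), (cx, cy - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and binary[ny][nx] == 1
                            and (nx, ny) not in seen):
                        seen.add((nx, ny))
                        stack.append((nx, ny))
            if len(cluster) >= min_pixels:
                return cluster
    return None
```

The coordinates (HX, HY) could then be taken, for instance, as the centroid of the returned pixels.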

[0088] (4) Modification 1-4

In the first embodiment, the sound reproduction system S has been described taking as an example the case where it is installed in a facility such as a museum, but the above method can also be applied to a home sound reproduction system.

In this case, the sound reproduced by the sound reproduction system S is usually 2ch or 5.1ch rather than monophonic sound as in the above embodiment. When processing similar to the above is performed on such multi-channel acoustic data, it is necessary to change the configuration of the signal processing unit 13 to that shown in FIG. 10. FIG. 10 is a diagram showing a configuration example for performing signal processing on 5.1ch acoustic data.

[0090] As shown in the figure, in this configuration it is necessary to provide, for each component corresponding to each channel of the acoustic data (that is, front right, front left, center, surround right, surround left, and subwoofer), acoustic data dividing units 131FR, FL, C, SR, SL, SW, delay processing units 132FR, FL, C, SR, SL, SW, and level adjusting units 133FR, FL, C, SR, SL, SW. When the signal processing unit 13 performs signal processing, the acoustic data is divided for each channel, and the delay processing and the sound pressure level adjustment are performed on each unit data.

[0091] In this case, the signal processing unit 13 must be provided with the same number of adder circuits P-k (k = 1, 2, ..., n) as there are SP units 2-k, and in each adder circuit P-k the unit data of the respective channels must be added for the corresponding SP unit 2-k. With this configuration, the delay time and the sound pressure level are adjusted separately for the acoustic data of each channel, after which the unit data of the channels are added for each SP unit 2-k and output to the D/A conversion unit 14.
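The role of the adder circuits P-k can be sketched as follows. The data layout (a dictionary of channels, each holding one sample list per SP unit) and the function name `mix_for_units` are assumptions chosen for illustration.

```python
def mix_for_units(channel_unit_data):
    """Sum per-channel unit data for each SP unit.

    channel_unit_data -- {channel: [samples_for_unit_1, ..., samples_for_unit_n]}
    where each samples_for_unit_k is a list of already delayed and
    level-adjusted samples. Returns one mixed sample list per SP unit.
    """
    per_channel = list(channel_unit_data.values())
    n_units = len(per_channel[0])
    mixed = []
    for k in range(n_units):           # one adder circuit P-k per unit
        streams = [units[k] for units in per_channel]
        mixed.append([sum(samples) for samples in zip(*streams)])
    return mixed
```

Because each channel is delayed and scaled before the sum, every channel can keep its own focal point while sharing the same physical SP units.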

On the other hand, since the configuration of the signal processing unit 13 differs as described above, it is necessary to change the processing executed in step Sa12. That is, when the system control unit 17 calculates the filter coefficients, it must calculate the filter coefficients for each filter and input the calculated coefficients to the signal processing unit 13.

Note that other configurations and operations are the same as those in the first embodiment described above, and thus the details are omitted.

[0094] Thus, according to the sound reproduction system S of the present modification, even when a system for reproducing multi-channel acoustic data is constructed, the position of the listener can be estimated from the image and the focal point of the sound can be generated at that position. For example, even when the sound reproduction system S is used at home, the filter coefficients for signal processing can be changed without performing complicated measurement work.

[0095] Note that the installation position of the camera 161 is arbitrary in Modification 1-4. The camera 161 may be installed in the vicinity of the SP array system 2, or in the upper part of the room in which the sound reproduction system S is installed.

Further, in the present modification, the following application can also be performed.

First, the sound reproducing device 1 is provided with a memory for recording a history of listening positions. When the system control unit 17 calculates the real coordinates (RHX, RHY, RHZ), the calculated real coordinates (RHX, RHY, RHZ) are recorded in this memory as history. If no listener is present in the listening space, the average listening position is statistically calculated based on the history of listening positions, that is, the history of the real coordinates (RHX, RHY, RHZ). The filter coefficients are then calculated using that position as the listening position, and signal processing is performed based on the calculated filter coefficients.

In this case, as a method of managing the listening position history, rather than managing the listening positions by the real coordinates (RHX, RHY, RHZ) themselves, the listening space may be divided into several areas and the history managed for each area. Specifically, the area to which the real coordinates (RHX, RHY, RHZ) set as the listening position belong is recorded in the memory as history, and when no listener is present in the listening space, an area that has frequently been set as the listening position is identified. A predetermined position (for example, the center point) is determined in advance for each area, and the filter coefficients are calculated using that position as the listening position.

[0098] When this method is adopted, the position presumed to be the listening position is automatically identified based on the history of listening positions, and the filter coefficients can be set automatically so that the optimum sound field is reproduced at that position.
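The area-based history management can be sketched as follows. The square grid in the x-z plane, the 1 m cell size, and the names `area_of` and `default_listening_position` are assumptions; the specification only says that the listening space is divided into several areas with a predetermined representative position.

```python
from collections import Counter

def area_of(position, cell_size=1.0):
    """Map real coordinates (RHX, RHY, RHZ) to a grid cell in the x-z plane."""
    rhx, _, rhz = position
    return (int(rhx // cell_size), int(rhz // cell_size))

def default_listening_position(history, cell_size=1.0):
    """Return the center of the most frequently used area, or None."""
    if not history:
        return None
    counts = Counter(area_of(p, cell_size) for p in history)
    (ix, iz), _ = counts.most_common(1)[0]
    return ((ix + 0.5) * cell_size, (iz + 0.5) * cell_size)
```

Storing only a cell index per observation keeps the memory requirement small, and the most common cell is less sensitive to outliers than a plain average of all recorded coordinates.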

[0099] [2] Second embodiment

[2.1] Configuration and operation of the second embodiment

FIG. 11 is a block diagram showing the configuration of the sound reproduction system S2 according to the present embodiment. In the figure, elements similar to those in FIG. 1 are given the same reference numerals as in FIG. 1.

[0100] Here, the sound reproduction system S according to the first embodiment adopted a configuration in which reproduction of the voice announcement starts when the sound reproducing device 1 is turned on and continues thereafter. In contrast, the sound reproduction system S2 according to the present embodiment adopts a configuration in which a sound source output device 3-l (l = 1, 2, ..., m) starts reproducing acoustic data when a viewer enters the frame and stops reproducing the acoustic data when the viewer leaves the frame. In addition, in this sound reproduction system S2, when a plurality of visitors enter the frame at the same time, a separate voice announcement is given to each visitor, and the directivity of the sound is controlled so that an optimal sound field is reproduced at each listener's listening position.

[0101] In order to realize this function, in the sound reproducing device 1 according to the present embodiment, the external device I/F unit 12 is provided with a plurality of connection terminals, to which a plurality of sound source output devices 3-l are connected. The reproduction and stopping of the acoustic data in the plurality of sound source output devices 3-l are controlled by the system control unit 17 of the sound reproducing device 1 based on the detection results of viewers entering and leaving the frame. In addition, the acoustic data supplied from each sound source output device 3-l is input to the signal processing unit 13 through a separate path and, after being subjected to separate signal processing, is supplied to the D/A conversion unit 14.

[0102] Thus, in this embodiment, separate signal processing must be performed on the acoustic data supplied from each sound source output device 3-l, and the signal processing unit 13 therefore has a circuit configuration different from that of the first embodiment. Specifically, in this embodiment, the signal processing unit 13 is provided with acoustic data dividing units 131, delay processing units 132, and level adjusting units 133 in a number corresponding to the sound source output devices 3-l (that is, “m”). The acoustic data output from each sound source output device 3-l is divided into unit data by the corresponding acoustic data dividing unit 131, subjected to signal processing, then added for each SP unit 2-k by the adder circuits P and output to the D/A conversion unit 14.

In the sound reproduction system S2 having this configuration, when the power of the sound reproducing device 1 is turned on, the system control unit 17 starts the processes shown in FIGS. 12 and 13. FIGS. 12 and 13 show the processes executed by the system control unit 17. In these figures, steps that are the same as those in FIGS. 4 and 5 described above are denoted by the same step numbers.

In this process, as in the process shown in FIG. 4, the system control unit 17 first executes the background image acquisition process (step Sa1) to generate background image data. It then acquires the image data corresponding to the current frame in step Sa3, and determines in steps Sa4 and Sa5 whether a viewer has entered the current frame.

If “no” is determined in this determination, the system control unit 17 determines whether acoustic data is already being reproduced in a sound source output device 3-l (step Sa101). If “no” is determined, the process proceeds directly to step Sa13, whereas if “yes” is determined, reproduction of the acoustic data being played is stopped (step Sa102), and the process then proceeds to step Sa13. As a result, if, for example, the viewer leaves the frame in the middle of a voice announcement, the reproduction of the acoustic data is stopped. On the other hand, if “yes” is determined in step Sa5, the system control unit 17 executes the processing of steps Sa6 to Sa9. If “no” is determined in step Sa9, the system control unit 17 executes the process of step Sa13.

On the other hand, if “yes” is determined in step Sa9, the system control unit 17 calculates the coordinates (HX, HY) specifying the face region (step Sa10). At this time, there may be a plurality of regions corresponding to faces. In such a case, the system control unit 17 calculates the coordinates (HX, HY) for each region.

[0108] Next, the system control unit 17 determines whether any of the calculated coordinates (HX, HY) has newly entered the frame or has left the frame (step Sa103). At this time, the system control unit 17 determines “yes” when at least one of the coordinates (HX, HY) calculated in step Sa10 was not present in the previous frame.

[0109] The method by which the system control unit 17 determines whether any coordinates (HX, HY) have entered or left the frame is arbitrary, but the present embodiment will be described as adopting the following method.

First, a threshold is set for the amount of change in coordinate values between frames, and coordinates (HX, HY) that have changed within the threshold range are recorded in association with the coordinates (HX, HY) in the previous frame, so that the history of changes in each set of coordinates (HX, HY) is managed. Then, it is predicted whether each set of coordinates (HX, HY) has left the frame in the current frame. Coordinates (HX, HY) that are not under history management are treated as having newly entered the frame.
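The history-based association can be sketched as below. This is a simplification under stated assumptions: each detection is matched to the nearest previously tracked coordinate within the threshold, and any track left unmatched is treated as having left the frame, whereas the text speaks of predicting frame-out from the history. The function name `update_tracks` and the integer track ids are hypothetical.

```python
import math

def update_tracks(tracks, detections, threshold):
    """Associate detections with tracked viewers.

    tracks     -- {track_id: (HX, HY)} from the previous frame
    detections -- list of (HX, HY) found in the current frame
    Returns (updated_tracks, framed_in, framed_out); `framed_in` holds new
    coordinates and `framed_out` holds the ids that disappeared.
    """
    updated, framed_in = {}, []
    unmatched = dict(tracks)
    for det in detections:
        # Nearest previous coordinate within the threshold is the same viewer.
        best = min(unmatched, key=lambda t: math.dist(unmatched[t], det),
                   default=None)
        if best is not None and math.dist(unmatched[best], det) <= threshold:
            updated[best] = det
            del unmatched[best]
        else:
            framed_in.append(det)
    return updated, framed_in, sorted(unmatched)
```

Each entry in `framed_in` would then be given a new id and a sound source output device, and each id in `framed_out` would have its device stopped.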

[0110] When “yes” is determined in step Sa103 through the above processing, the system control unit 17 executes a reproduction control process (step Sa104). Specifically, the system control unit 17 allocates a sound source output device 3-l and an acoustic data dividing unit 131 of the signal processing unit 13 to the coordinates (HX, HY) that have newly entered the frame, outputs a control signal to the allocated sound source output device 3-l, and starts reproduction of the acoustic data. In addition, when there are coordinates (HX, HY) that have left the frame, the system control unit 17 outputs a control signal to the sound source output device 3-l assigned to those coordinates (HX, HY) and stops reproduction of the acoustic data.

[0111] When the reproduction control process is completed in this way, the system control unit 17 executes the setting process of the SP array system 2 (step Sa105), and then executes the process of step Sa13. At this time, the system control unit 17 calculates filter coefficients for each set of coordinates (HX, HY), and inputs the calculated filter coefficients to the corresponding delay processing unit 132 and level adjusting unit 133 to change the filter coefficients. The other points are the same as in the setting process described above.

On the other hand, if “no” is determined in step Sa103, the system control unit 17 executes step Sa105 without executing the reproduction control process (step Sa104), and then executes the process of step Sa13. By changing the filter coefficients in the signal processing unit 13 in this way, a focal point is formed around each viewer, and an optimal sound field is reproduced.

[0113] In this way, the sound reproduction system S2 according to the present embodiment can provide an appropriate sound field for each visitor even when a plurality of visitors are in the frame. In addition, with this configuration, voice announcements are stopped when no viewer is in the frame, so power consumption can be reduced.

[0114] [2.2] Modification of Second Embodiment

In the second embodiment, the moving speed of each visitor has not been taken into account. However, with the same configuration as in the second embodiment, it is also possible to manage the change history of each set of coordinates (HX, HY), calculate the moving speed of the coordinates, and perform the following control. That is, when the moving speed of the coordinates (HX, HY) exceeds a predetermined threshold, the sound source output device 3-l assigned to those coordinates (HX, HY) is made to stop reproducing the acoustic data. With such a configuration, it is possible, for example, to perform control such as not giving voice announcements to visitors who appear to have no interest in the exhibit.

[0115] In addition, when this method is adopted, it is also possible to give more detailed voice announcements to visitors who remain stationary for a long time.
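The speed-based control can be sketched as follows. The unit of pixels per second, the use of only the two most recent positions, and the function name `should_stop_playback` are assumptions made for illustration.

```python
import math

def should_stop_playback(history, frame_interval_s, speed_threshold):
    """Return True if the latest movement speed exceeds the threshold.

    history -- list of (HX, HY) for one viewer, oldest first
    Speed is measured in pixels per second (assumed unit).
    """
    if len(history) < 2:
        return False
    speed = math.dist(history[-2], history[-1]) / frame_interval_s
    return speed > speed_threshold
```

In practice the speed would probably be averaged over several frames so that a single noisy detection does not interrupt an announcement.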

Claims

The scope of the claims

[1] A sound reproduction device that amplifies an acoustic signal by a plurality of speakers arranged in a listening space,

Obtaining means for obtaining the acoustic signal;

Detecting means for detecting a listening position of the listener in the listening space;

Setting means for setting a reproduction condition for controlling directivity with respect to the detected listening position;

Signal processing means for performing signal processing based on the reproduction condition on the acquired acoustic signal;

Driving means for driving the plurality of speakers based on the acoustic signal subjected to the signal processing;

A sound reproducing device comprising:

[2] The sound reproducing device according to claim 1, wherein

the plurality of speakers constitute a speaker array whose arrangement positions are fixed in advance,

the signal processing means divides the acoustic signal into the same number of signals as there are speaker groups, each speaker group including a predetermined number of the speakers, and performs the signal processing on each of the divided acoustic signals, and

the setting means sets the reproduction condition for each speaker group.

[3] The sound reproducing device according to claim 1, wherein

the detecting means further includes a camera that captures an image of the listening space in units of frames, and detects the listening position based on image data corresponding to the image captured by the camera.

[4] The sound reproducing device according to claim 3, wherein the detecting means identifies an area corresponding to the face of the listener in the frame, and detects the area corresponding to the face as the listening position.

[5] The sound reproducing device according to claim 4, wherein the detecting means specifies a skin color area in the frame, and specifies that area as the area corresponding to the face of the listener.

[6] The sound reproducing device according to claim 5, wherein the detecting means specifies a pixel having a Cr value of 133 to 173 and a Cb value of 77 to 127 as a skin color pixel, and specifies an area where a predetermined number or more of such pixels are gathered as the skin color area.

[7] The sound reproducing device according to claim 5, wherein the detecting means binarizes the image data corresponding to each frame by dividing it into the skin color area and the other areas, and then specifies the area corresponding to the face of the listener.

[8] The sound reproducing device according to claim 3, wherein

the detecting means detects a change in the listening position between the frames based on the image data, and

the setting means changes the reproduction condition as needed according to the detected change in the listening position.

[9] The sound reproducing device according to claim 8, wherein

the detecting means calculates a moving speed of the listener based on the change in the listening position, and

the obtaining means changes the content of the acoustic signal to be obtained according to the moving speed.

[10] The sound reproducing device according to claim 3, wherein

the detecting means detects whether the listener is present in the frame based on the image data, and

the obtaining means obtains the acoustic signal only when the detecting means detects that the listener is present.

[11] The sound reproducing device according to claim 3, wherein

the signal processing means includes a plurality of filters for performing different signal processing on the same or different acoustic signals obtained simultaneously,

the detecting means detects, when a plurality of the listeners appear in the frame, a plurality of listening positions corresponding to the respective listeners based on the image data, and

the setting means sets a filter coefficient in each filter of the signal processing means based on each of the detected plurality of listening positions.

[12] The sound reproducing device according to claim 1, wherein

the obtaining means obtains the acoustic signals of two or more channels, and

the setting means sets the reproduction condition for each channel based on the detected listening position.

[13] A plurality of speakers arranged in the listening space, and a sound reproducing device for amplifying an acoustic signal by the speakers,

The sound reproducing device is

Obtaining means for obtaining the acoustic signal;

A sound reproduction system comprising: