CN110099351B - Sound field playback method, device and system

Info

Publication number: CN110099351B
Application number: CN201910257583.9A
Authority: CN
Inventors: 徐冠基; 李文强; 吕晓鹏; 张魁炜; 张海进
Original assignee: CRRC Qingdao Sifang Co Ltd
Current assignee: CRRC Qingdao Sifang Co Ltd
Priority date: 2019-04-01
Filing date: 2019-04-01
Publication date: 2020-11-03
Anticipated expiration: 2039-04-01
Also published as: CN110099351A

Abstract

An embodiment of the invention discloses a method, a device and a system for sound field playback. A sound acquisition module acquires a source sound signal and sends the digital sound signal obtained by encoding to a filtering submodule of a sound processing module; the filtering submodule separates the digital sound signal into a low-frequency sound signal and a high-frequency sound signal; a decoding submodule decodes the low-frequency sound signal and the high-frequency sound signal separately to obtain 10 low-frequency sub sound signals and 10 high-frequency sub sound signals corresponding to 10 sound boxes; and each sound box superposes and plays back the low-frequency sub sound signal and the high-frequency sub sound signal it receives so as to restore the sound field of the collected source sound. The collected sound signal is thus divided by filtering into high-frequency and low-frequency sound signals that are decoded separately, and a sound field playback system comprising 10 sound boxes in a specific positional relationship is introduced, which improves the precision of sound field playback, enlarges its coverage area, and thereby improves the listening experience of the listener.

Description

Sound field playback method, device and system

Technical Field

The present invention relates to the field of signal processing technologies, and in particular, to a sound field playback method, apparatus, and system.

Background

With growing demand and the development of sound field playback technology, the subjective perception of the listener is given more consideration during sound field playback. One commonly used technique is 3D sound rendering based on a spherical coordinate system. It can only construct a three-dimensional virtual sound source and a sound field that is highly similar to the source sound field; it cannot play back the source sound field accurately. It is therefore applied only to the playback of artificial ambient effect sounds at commercial performances, cinemas, opera houses, concerts and the like, and is not suitable for occasions that require accurate playback of the sound field.

At present, recorded binaural signals can also be played back through binaural headphones or 2.1-channel speakers by using sound spatialization technology and 3D sound playback technology based on the head-related transfer function. However, this playback mode is accurate at only one point, that is, only one optimal listening point exists. As soon as the listener's head deviates from the optimal listening point, the sense of presence and the sense of three-dimensional space are lost, which degrades the sound field playback effect. If the listener's head is instead held at the optimal listening point, the restriction of being unable to move while wearing headphones for a long time gives the listener a poor listening experience.

Based on this, it is desirable to provide a sound field playback method, which can expand the optimum listening range and improve the listening experience of the user on the basis of providing playback of various sound fields with high reproduction degree.

Disclosure of Invention

In order to solve the above technical problems, the present invention provides a method, an apparatus and a system for playing back a sound field, so as to accurately play back sound fields of various frequency bands even in a wide range, thereby improving the listening experience of a listener.

In a first aspect, a method for sound field playback is provided, which is applied to a system for sound playback, the system comprising a sound acquisition module, a sound processing module, a sound card, a digital-to-analog (DA) converter and 10 sound boxes; the sound acquisition module is connected with a filtering submodule in the sound processing module, the filtering submodule is connected with a decoding submodule in the sound processing module, and the decoding submodule is connected with the sound card; the sound card is connected with the DA converter, and the DA converter is connected with each of the 10 sound boxes placed in a semi-anechoic chamber; the method comprises the following steps:

the sound acquisition module acquires a source sound signal and sends a digital sound signal obtained by encoding to the filtering submodule in the sound processing module;

the filtering submodule carries out frequency division processing on the digital sound signal and separates out a low-frequency sound signal and a high-frequency sound signal;

the decoding submodule respectively decodes the low-frequency sound signals and the high-frequency sound signals to obtain 10 low-frequency sub sound signals and 10 high-frequency sub sound signals corresponding to the 10 sound boxes;

the DA converter receives the 10 low-frequency sub sound signals and the 10 high-frequency sub sound signals forwarded by the sound card, performs DA conversion on the signals respectively, and then correspondingly sends the signals to the 10 sound boxes;

and each of the 10 sound boxes superposes and plays back the 1 low-frequency sub sound signal and the 1 high-frequency sub sound signal it receives so as to restore the sound field of the source sound.

Optionally, the 10 sound boxes comprise 2 subwoofer sound boxes and 8 active monitoring sound boxes;

4 of the 8 active monitoring sound boxes are uniformly distributed on a circumference which takes a reference point of the semi-anechoic chamber as the circle center and a preset length as the radius, are installed on supports at a first preset height above the ground, and the playing surfaces of each two non-adjacent active monitoring sound boxes face each other; the other 4 active monitoring sound boxes are uniformly distributed on a circumference which takes the reference point as the circle center and the preset length as the radius, are installed on supports at a second preset height above the ground, and the playing surfaces of each two non-adjacent active monitoring sound boxes face each other;

the 2 subwoofer sound boxes are symmetrically arranged on two sides of the center line of the circumference, the distance from the reference point is greater than the preset length, and the 2 subwoofer sound boxes are arranged on the ground of the semi-anechoic chamber;

the preset length is determined according to the frequency of the source sound signal, and the first preset height and the second preset height are determined according to the height of the listener's ears above the ground.

Optionally, the first preset height is 0.53 m, the second preset height is 1.6 m, and the height of the listener's ears above the ground is 1.1 m; the preset length is 2.5 m.

Optionally, the decoding submodule performs decoding using an Ambisonic decoding algorithm.

Optionally, the decoding sub-module decodes the low-frequency sound signals to obtain 10 low-frequency sub-sound signals corresponding to the 10 sound boxes, and includes:

the decoding submodule calculates a speed vector corresponding to each sound box according to the gain, the direction angle and the elevation angle of each sound box;

the decoding submodule determines a first directivity factor corresponding to each sound box in an iterative optimization mode based on the speed vector corresponding to each sound box and the gains of all the sound boxes;

the decoding submodule calculates a low-frequency sub sound signal corresponding to each sound box based on a first directivity factor corresponding to each sound box, the gain, the direction angle, the elevation angle and the digital sound signal of the sound box;

the decoding submodule decodes the high-frequency sound signals to obtain 10 high-frequency sub-sound signals corresponding to the 10 sound boxes, and the decoding submodule comprises:

the decoding submodule calculates an energy vector corresponding to each sound box according to the gain, the direction angle and the elevation angle of each sound box;

the decoding submodule determines a second directivity factor corresponding to each sound box in an iterative optimization mode based on the energy vector corresponding to each sound box and the gains of all the sound boxes;

and the decoding submodule calculates the high-frequency sub-sound signal corresponding to each sound box based on the second directivity factor corresponding to each sound box, the gain, the direction angle, the elevation angle and the digital sound signal of the sound box.

Optionally, the 2 subwoofer sound boxes are suitable for a frequency range of 19 Hz to 100 Hz, and the 8 active monitoring sound boxes are suitable for a frequency range of 48 Hz to 20000 Hz; the system for sound playback is suitable for a frequency range of 19 Hz to 20000 Hz.

In a second aspect, a device for sound field playback is also provided, which is applied to a system for sound playback, the system comprising a sound acquisition module, a sound processing module, a sound card, a digital-to-analog (DA) converter and 10 sound boxes; the sound acquisition module is connected with a filtering submodule in the sound processing module, the filtering submodule is connected with a decoding submodule in the sound processing module, and the decoding submodule is connected with the sound card; the sound card is connected with the DA converter, and the DA converter is connected with each of the 10 sound boxes placed in a semi-anechoic chamber; the device comprises:

the acquisition unit is used for the sound acquisition module to acquire a source sound signal and send a digital sound signal obtained by encoding to the filtering submodule in the sound processing module;

the frequency division unit is used for performing frequency division processing on the digital sound signal by the filtering submodule to separate out a low-frequency sound signal and a high-frequency sound signal;

the decoding unit is used for the decoding submodule to respectively decode the low-frequency sound signals and the high-frequency sound signals to obtain 10 low-frequency sub sound signals and 10 high-frequency sub sound signals corresponding to the 10 sound boxes;

the conversion unit is used for receiving the 10 low-frequency sub sound signals and the 10 high-frequency sub sound signals forwarded by the sound card through the DA converter, respectively performing DA conversion on the signals and then correspondingly sending the signals to the 10 sound boxes;

and the playback unit is used for performing superposition playback on the received 1 low-frequency sub sound signal and 1 high-frequency sub sound signal by each sound box in the 10 sound boxes so as to restore the sound field of the source sound.

Optionally, the decoding submodule performs decoding by using an Ambisonic decoding algorithm.

Optionally, the decoding unit comprises a first decoding sub-unit and a second decoding sub-unit,

the first decoding subunit includes:

the first calculating subunit is used for calculating the corresponding speed vector of each sound box by the decoding submodule according to the gain, the direction angle and the elevation angle of each sound box;

the first determining subunit is used for determining a first directivity factor corresponding to each sound box by the decoding submodule in an iterative optimization mode based on the speed vector corresponding to each sound box and the gains of all the sound boxes;

the second calculation subunit is used for the decoding submodule to calculate the low-frequency sub sound signal corresponding to each sound box based on the first directivity factor corresponding to each sound box, the gain, the direction angle, the elevation angle and the digital sound signal of the sound box;

the second decoding subunit includes:

the third calculation subunit is used for calculating the energy vector corresponding to each sound box by the decoding submodule according to the gain, the direction angle and the elevation angle of each sound box;

the second determining subunit is used for determining a second directivity factor corresponding to each sound box by the decoding submodule in an iterative optimization mode based on the energy vector corresponding to each sound box and the gains of all the sound boxes;

and the fourth calculating subunit is used for calculating the high-frequency sub sound signal corresponding to each sound box by the decoding submodule based on the second directivity factor corresponding to each sound box, the gain, the direction angle, the elevation angle and the digital sound signal of the sound box.

In a third aspect, there is also provided a system for sound field playback, including: a sound acquisition module, a sound processing module, a sound card, a digital-to-analog (DA) converter and 10 sound boxes;

the sound processing module comprises a filtering submodule and a decoding submodule, and the 10 sound boxes comprise 2 subwoofer sound boxes and 8 active monitoring sound boxes; 4 of the 8 active monitoring sound boxes are uniformly distributed on a circumference which takes a reference point of a semi-anechoic chamber as the circle center and a preset length as the radius, are installed on supports at a first preset height above the ground, and the playing surfaces of each two non-adjacent active monitoring sound boxes face each other; the other 4 active monitoring sound boxes are uniformly distributed on a circumference which takes the reference point as the circle center and the preset length as the radius, are installed on supports at a second preset height above the ground, and the playing surfaces of each two non-adjacent active monitoring sound boxes face each other; the 2 subwoofer sound boxes are symmetrically arranged on two sides of the center line of the circumference, at a distance from the reference point greater than the preset length, and are placed on the ground of the semi-anechoic chamber; the preset length is determined according to the frequency of the source sound signal, and the first preset height and the second preset height are determined according to the height of the listener's ears above the ground;

the sound acquisition module is used for acquiring a source sound signal and sending a digital sound signal obtained by encoding to the filtering submodule in the sound processing module;

the sound processing module is used for carrying out frequency division processing on the digital sound signal through the filtering submodule to separate out a low-frequency sound signal and a high-frequency sound signal; decoding the low-frequency sound signals and the high-frequency sound signals respectively through the decoding submodule to obtain 10 low-frequency sub sound signals and 10 high-frequency sub sound signals corresponding to the 10 sound boxes;

the sound card is used for receiving the 10 low-frequency sub sound signals and the 10 high-frequency sub sound signals and forwarding them to the DA converter;

the DA converter is used for DA converting the 10 low-frequency sub sound signals and the 10 high-frequency sub sound signals and correspondingly sending the converted signals to the 10 sound boxes;

the 10 sound boxes are respectively used for performing superposition playback on the received 1 low-frequency sub sound signal and 1 high-frequency sub sound signal so as to restore the sound field of the source sound.

In the embodiment of the present invention, sound field playback is performed accurately by a sound playback system, which specifically includes: a sound acquisition module, a sound processing module, a sound card, a digital-to-analog (DA) converter and 10 sound boxes; the sound acquisition module is connected with a filtering submodule in the sound processing module, the filtering submodule is connected with a decoding submodule in the sound processing module, and the decoding submodule is connected with the sound card; the sound card is connected with the DA converter, and the DA converter is connected with each of the 10 sound boxes placed in the semi-anechoic chamber. The method by which the system plays back sound specifically comprises the following steps: the sound acquisition module acquires a source sound signal and sends a digital sound signal obtained by encoding to the filtering submodule in the sound processing module; the filtering submodule carries out frequency division processing on the digital sound signal and separates out a low-frequency sound signal and a high-frequency sound signal; the decoding submodule decodes the low-frequency sound signal and the high-frequency sound signal separately to obtain 10 low-frequency sub sound signals and 10 high-frequency sub sound signals corresponding to the 10 sound boxes; the DA converter receives the 10 low-frequency sub sound signals and the 10 high-frequency sub sound signals forwarded by the sound card, performs DA conversion on each of them, and sends them to the corresponding sound boxes; and each sound box superposes and plays back the 1 low-frequency sub sound signal and the 1 high-frequency sub sound signal it receives so as to restore the sound field of the collected source sound.
Therefore, according to the sound field playback method provided by the embodiment of the invention, the digital sound signal corresponding to the source sound signal is filtered, and the resulting high-frequency and low-frequency sound signals are decoded separately, so that the precision of sound field playback is improved.

Drawings

In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them.

FIG. 1 is a schematic structural diagram of a system for sound field playback according to an embodiment of the present invention;

FIG. 2 is a schematic top view of a semi-anechoic chamber in an embodiment of the present invention;

FIG. 3 is a schematic flow chart of a method for playing back a sound field according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of placement positions of 10 speakers according to an embodiment of the present invention;

FIG. 5 is a schematic structural diagram of an apparatus for sound field playback according to an embodiment of the present invention.

Detailed Description

At present, one commonly used sound field playback method is 3D sound rendering based on a spherical coordinate system. It can only construct a three-dimensional virtual sound source and a sound field that is highly similar to the source sound field, so it is applied only to the playback of artificial ambient effect sounds at commercial performances, cinemas, opera houses, concerts and the like. Because it cannot play back the source sound field accurately, it is not suitable for occasions that require accurate playback of the sound field. Another way of playing back the sound field is to play recorded binaural signals through binaural headphones or 2.1-channel speakers by using sound spatialization technology and 3D sound playback technology based on the head-related transfer function, which gives relatively accurate stereo field playback. However, playback is accurate at only one point, that is, only one optimal listening point exists. As soon as the listener's head deviates from the optimal listening point, the sense of presence and the sense of three-dimensional space are lost, which degrades the sound field playback effect. If the listener's head is instead held at the optimal listening point, the restriction of being unable to move while wearing headphones for a long time gives the listener a poor listening experience.

Based on this, the embodiment of the present invention provides a sound field playback method, which performs accurate sound field playback through a sound playback system. The system specifically includes: a sound acquisition module, a sound processing module, a sound card, a digital-to-analog (DA) converter and 10 sound boxes; the sound acquisition module is connected with a filtering submodule in the sound processing module, the filtering submodule is connected with a decoding submodule in the sound processing module, and the decoding submodule is connected with the sound card; the sound card is connected with the DA converter, and the DA converter is connected with each of the 10 sound boxes placed in the semi-anechoic chamber. In this method, on the one hand, the digital sound signal corresponding to the source sound signal is filtered, and the resulting high-frequency and low-frequency sound signals are decoded separately, which improves the precision of sound field playback; on the other hand, a sound field playback system comprising 10 sound boxes is introduced, which enlarges the coverage area of sound field playback, overcomes the defect that only one optimal listening point exists, and allows sound fields in various frequency bands to be played back accurately over a larger range. The sound field playback method provided by the embodiment of the invention can therefore enlarge the optimal listening range while playing back various sound fields with high fidelity, thereby improving the listening experience of the listener.

Before describing the method for playing back the sound field provided by the embodiment of the present invention, a system for playing back the sound field provided by the embodiment of the present invention, to which the method is applied, will be described.

Referring to FIG. 1, a schematic structural diagram of a system for sound field playback provided by an embodiment of the present invention is shown. The system 100 specifically includes: a sound acquisition module 110, a sound processing module 120, a sound card 130, a digital-to-analog (DA) converter 140 and 10 sound boxes 150-159. The sound processing module 120 comprises a filtering submodule 121 and a decoding submodule 122, and the 10 sound boxes comprise 2 subwoofer sound boxes 158 and 159 and 8 active monitoring sound boxes 150-157. The sound boxes 150, 152, 154 and 156 are uniformly distributed on a circumference which takes the reference point O of the semi-anechoic chamber as the circle center and a preset length as the radius, are installed on supports at a first preset height above the ground, and the playing surfaces of each two non-adjacent active monitoring sound boxes face each other. The sound boxes 151, 153, 155 and 157 are uniformly distributed on a circumference which takes the reference point O as the circle center and the preset length as the radius, are installed on supports at a second preset height above the ground, and the playing surfaces of each two non-adjacent active monitoring sound boxes face each other. The sound boxes 158 and 159 are symmetrically placed on both sides of the center line of the circumference, at a distance from the reference point O greater than the preset length, and are placed on the ground of the semi-anechoic chamber. The preset length is determined according to the frequency of the source sound signal, and the first preset height and the second preset height are determined according to the height of the listener's ears above the ground.

For example, the semi-anechoic chamber is shown in top view in FIG. 2. A semi-anechoic chamber is a room in which the floor reflects sound and the other five surfaces do not. The reference point O is 3.24 m from the left wall and 4.2 m from the front wall of the semi-anechoic chamber, and its height above the ground is 1.1 m, which is the height of the ears of a listener seated on a conventional chair. The first preset height is 0.53 m and the second preset height is 1.6 m; the radius (i.e., the preset length) of each circumference on which the sound boxes 150 to 157 are located is 2.5 m.
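The example dimensions above can be turned into sound box coordinates relative to the reference point O. The following is a minimal sketch, not the layout of FIG. 4: it assumes that the 2.5 m radius is measured horizontally, that the four sound boxes of each ring are spaced 90 degrees apart, and that the upper ring is rotated by 45 degrees relative to the lower ring. The function name `ring_positions` and the 45 degree offset are illustrative assumptions.

```python
import math

EAR_HEIGHT_M = 1.1   # height of reference point O above the ground (from the text)
RADIUS_M = 2.5       # preset length; assumed here to be a horizontal radius


def ring_positions(height_m, azimuth_offset_deg=0.0, count=4):
    """Return (x, y, height, elevation_deg) for sound boxes evenly spaced on one ring.

    x and y are horizontal offsets from the reference point O in meters.
    elevation_deg is the angle of the sound box as seen from O at ear height.
    """
    positions = []
    for k in range(count):
        azimuth = math.radians(azimuth_offset_deg + k * 360.0 / count)
        x = RADIUS_M * math.cos(azimuth)
        y = RADIUS_M * math.sin(azimuth)
        elevation = math.degrees(math.atan2(height_m - EAR_HEIGHT_M, RADIUS_M))
        positions.append((x, y, height_m, elevation))
    return positions


# Lower ring at the first preset height, upper ring at the second preset height.
lower_ring = ring_positions(0.53)
upper_ring = ring_positions(1.6, azimuth_offset_deg=45.0)  # offset is an assumption
```

Under these assumptions the lower ring sits below ear level and the upper ring above it, so the listener's ears lie between the two rings.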

The connection relationship of the system 100 is as follows: the sound collection module 110 is connected with the filtering submodule 121 in the sound processing module 120, the filtering submodule 121 is connected with the decoding submodule 122 in the sound processing module 120, and the decoding submodule 122 is connected with the sound card 130; the sound card 130 is connected to the DA converter 140, and the DA converter 140 is connected to each of the 10 sound boxes placed in the semi-anechoic chamber. It should be noted that a limiter 160 may also be provided between the DA converter 140 and the sound boxes to limit the signal transmitted to the sound boxes, so as to prevent the sound boxes from being damaged by a signal that exceeds the limit.

In order to widen the frequency range over which the system 100 can play back a sound field, the 2 subwoofer sound boxes 158 and 159 may cover 19 Hz to 100 Hz and the 8 active monitoring sound boxes 150 to 157 may cover 48 Hz to 20000 Hz, so that the system 100 covers 19 Hz to 20000 Hz, which spans the frequency band audible to the human ear.

As an example, the sound collecting module 110 in the system 100 is configured to collect a source sound signal and send a digital sound signal obtained by encoding to the filtering submodule 121 of the sound processing module 120; the sound processing module 120 is configured to perform frequency division processing on the digital sound signal through the filtering submodule 121 to separate a low-frequency sound signal and a high-frequency sound signal, and to decode the low-frequency sound signal and the high-frequency sound signal separately through the decoding submodule 122 to obtain 10 low-frequency sub sound signals and 10 high-frequency sub sound signals corresponding to the 10 sound boxes 150-159; the sound card 130 is configured to receive the 10 low-frequency sub sound signals and the 10 high-frequency sub sound signals and forward them to the DA converter 140; the DA converter 140 is configured to perform DA conversion on the 10 low-frequency sub sound signals and the 10 high-frequency sub sound signals and send the converted signals to the corresponding sound boxes; and each of the 10 sound boxes is configured to superpose and play back the 1 low-frequency sub sound signal and the 1 high-frequency sub sound signal it receives so as to restore the sound field of the source sound. In this way, sound fields of various frequency bands are played back accurately over a larger range, and the listening experience of the listener is improved.

The following describes a specific implementation manner of a sound field playback method in an embodiment of the present invention in detail by way of an embodiment with reference to the accompanying drawings.

Fig. 3 is a flow chart illustrating a method for playing back a sound field, which is provided by an embodiment of the present invention and is applied to the system for playing back a sound field shown in fig. 1. Referring to fig. 3, the embodiment of the present invention may specifically include the following steps 301 to 305:

Step 301, the sound collection module collects a source sound signal, and sends a digital sound signal obtained by encoding to the filtering submodule in the sound processing module.

It can be understood that, in order to improve the subjective feeling of the listener, it is necessary to improve the accuracy from the source, that is, the traditional binaural sound data acquisition and recording mode may be changed, and a 3D stereo sound acquisition and recording mode is adopted. Because the 3D stereo sound collection method is adopted in the sound collection module, more information of source sound can be collected, so that the collected source sound information is richer and closer to the source sound, and the restoration degree during playback can be improved to a certain extent.

It can be understood that the source sound signal collected by the sound collection module is an analog signal. Compared with digitally recorded sound, analog recorded sound has many disadvantages during transmission, such as weak noise immunity, a small audio dynamic range and severe attenuation of the signal over multiple recordings. The source sound signal therefore needs to be encoded.

In a specific implementation, the source sound signal S is encoded to obtain an encoded digital sound signal B = [W X Y Z], where the signal W is a non-directional component that mainly represents the intensity of the source sound signal, and the signals X, Y and Z are directional components that mainly carry the position information of the source sound signal in the up-down, left-right and front-back directions. The intensity and position information of the source sound signal in 3D space can therefore be recovered from the digital sound signal B, that is, from the four signals W, X, Y and Z. Specifically, the encoding follows the formulas below:

B = [W X Y Z]

W = 0.707 × S

X = x × S, x = cos θ cos β

Y = y × S, y = sin θ cos β

Z = z × S, z = sin β

where S denotes a source sound signal, θ denotes a direction angle of the sound source, and β denotes an elevation angle of the sound source.
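The encoding formulas above can be sketched in a few lines of code. This is a minimal illustration that assumes the source sound signal is already a list of sampled values and that the angles are given in radians; the function name `encode_b_format` is hypothetical and does not come from the patent.

```python
import math


def encode_b_format(samples, theta, beta):
    """Encode a mono source signal S into the four signals W, X, Y and Z.

    samples: sampled source sound signal S
    theta:   direction angle of the sound source, in radians
    beta:    elevation angle of the sound source, in radians
    """
    x = math.cos(theta) * math.cos(beta)
    y = math.sin(theta) * math.cos(beta)
    z = math.sin(beta)
    w_sig = [0.707 * s for s in samples]  # non-directional component
    x_sig = [x * s for s in samples]      # directional components
    y_sig = [y * s for s in samples]
    z_sig = [z * s for s in samples]
    return w_sig, x_sig, y_sig, z_sig


# A source with direction angle 0 and elevation angle 0 contributes only to W and X.
W, X, Y, Z = encode_b_format([1.0, -0.5], theta=0.0, beta=0.0)
```

Because x, y and z are the components of a unit vector, the direction of the source can be read back from the ratios of X, Y and Z, while W carries the overall level.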

In some implementations, the analog source sound signal may also be sampled and quantized before being encoded in step 301.

It should be noted that the encoded digital sound signal B must be subjected to audio decoding and spatialization processing before it can be recognized and played by each speaker.

Step 302, the filtering submodule performs frequency division processing on the digital sound signal to separate a low-frequency sound signal and a high-frequency sound signal.

It can be understood that, in order to optimize the accuracy of sound field playback, the psychoacoustics of the listener should be considered. Psychoacoustics is related to the binaural effect and to spatial localization, and the human ear localizes high-frequency and low-frequency sound waves by two different mechanisms. For low-frequency sound waves (i.e., sound waves with frequencies below 700 hertz (Hz)), binaural localization depends on the velocity of the sound waves: the wavelength is long, so there is no significant difference in the intensity received by the two ears, but there is a difference in the time at which the sound reaches each ear, i.e., localization depends on the time difference between the ears. For high-frequency sound waves (i.e., sound waves with frequencies above 700 Hz), binaural localization depends on the intensity of the sound: the wavelength is short, so the head blocks the sound waves and the ear farther from the sound source receives a weaker sound, i.e., the localization cue is the sound intensity.

In order to better decode the digital sound signal, the embodiment of the invention adds a filtering submodule before decoding, divides the frequency of the digital sound signal, and separates a high-frequency sound signal and a low-frequency sound signal.

As an example, the frequency division threshold of the filtering submodule may be set to 700Hz, and then, after the digital sound signal is input to the filtering submodule, the filtering submodule determines the sound signal with the frequency higher than 700Hz as the high frequency sound signal and determines the sound signal with the frequency lower than 700Hz as the low frequency sound signal.

Therefore, the high-frequency part and the low-frequency part of the digital sound signal can be extracted through a filtering means so as to be respectively processed in the decoding process, and the decoding precision is improved.
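As a non-limiting illustration, the frequency division at 700 Hz can be sketched as below. The embodiment does not specify the filter type; this sketch assumes a simple one-pole low-pass filter whose residual forms the high band, and the function name split_bands is hypothetical. In practice the same split would be applied to each of the channels W, X, Y and Z.

```python
import math

def split_bands(samples, sample_rate, cutoff_hz=700.0):
    """Split a signal into complementary low and high bands.

    A one-pole low-pass filter produces the low band; the high band is
    the residual, so low[n] + high[n] reproduces the input sample.
    """
    # Smoothing coefficient of the one-pole low-pass filter
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    low, high = [], []
    state = 0.0
    for s in samples:
        state += alpha * (s - state)   # low-pass update
        low.append(state)
        high.append(s - state)         # residual is the high band
    return low, high
```

A steeper crossover (for example a higher-order filter pair) could be substituted without changing the rest of the processing chain; the one-pole form is chosen here only for brevity.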

Step 303, the decoding submodule respectively decodes the low-frequency sound signal and the high-frequency sound signal to obtain 10 low-frequency sub sound signals and 10 high-frequency sub sound signals corresponding to the 10 sound boxes.

It will be appreciated that binaural localization relies on different cues for high-frequency and low-frequency sound waves: low-frequency localization depends on the velocity of the sound wave, while high-frequency localization depends on the intensity of the sound. The decoding submodule therefore decodes the high-frequency and low-frequency sound signals separately. The decoding process is essentially an optimization of the low-frequency signal and high-frequency signal assigned to each sound box.

Specifically, the decoding may be implemented by an Ambisonic decoding algorithm based on a vector synthesis method. In this method, the direction of the virtual sound image synthesized by the plurality of sound boxes at the listening position (i.e., at reference point O) is synthesized, depending on the frequency band, from a velocity vector V or from an energy vector E.

During specific implementation, for the low-frequency sound signal, the decoding submodule decodes the low-frequency sound signal to obtain 10 low-frequency sub sound signals corresponding to the 10 sound boxes. The specific implementation may include:

S11: the decoding submodule calculates the velocity vector corresponding to each sound box according to the gain, the direction angle and the elevation angle of each sound box.

It can be understood that, assuming the axis of each sound box points toward reference point O, the sound wave emitted by each sound box has a velocity vector V pointing to point O along the axis of the sound box. Each V can be decomposed into Vx in the x-axis direction, Vy in the y-axis direction and Vz in the z-axis direction. After the sound wave velocity vectors of the sound boxes are superimposed by the vector synthesis method, the total synthesized sound wave velocity vector reproduced by the sound boxes at point O is obtained, and its opposite direction is the direction of the virtual sound image synthesized by the multiple sound sources.

As an example, the velocity vector components corresponding to each sound box calculated in S11 can be seen from the following formula (1):

where Vx is the component in the x-axis direction of the velocity vector V pointing to point O along the axis of the sound box, Vy is the component of V in the y-axis direction, and Vz is the component of V in the z-axis direction; i refers to the i-th sound box and N is the number of sound boxes (in the embodiment of the invention, N is 10); g_i represents the proportion of sound allocated to each sound box, also called the gain of the i-th sound box; φ_i refers to the elevation angle of the i-th sound box; and θ_i refers to the direction angle of the i-th sound box.

In a specific implementation, the tangent values corresponding to the direction angle and the elevation angle of the virtual sound image can be obtained according to formula (2):

where θ_VI refers to the direction angle corresponding to the low-frequency virtual sound image, and φ_VI refers to the elevation angle corresponding to the low-frequency virtual sound image.

The velocity vector length r_V corresponding to the sound boxes can be calculated by referring to the following formula (3):

where the pointing angles of V are θ_VI and φ_VI, and its length r_V is greater than or equal to 0 and equals the synthesized velocity gain value generated at reference point O by vector synthesis divided by the sound pressure gain value P directly superposed at reference point O by all the sound boxes.
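As a non-limiting illustration, the velocity vector calculation of S11 can be sketched as below. The sketch assumes that formulas (1) to (3) take the usual vector-synthesis form, in which each component is a gain-weighted sum over the sound boxes and the length is normalized by the directly summed gain P; this form, the function name velocity_vector and the use of radians are assumptions of the sketch rather than quotations of the formulas.

```python
import math

def velocity_vector(gains, thetas, phis):
    """Return (r_v, theta_v, phi_v) for a set of sound boxes.

    gains:  gain g_i of each sound box
    thetas: direction angle theta_i of each sound box, in radians
    phis:   elevation angle phi_i of each sound box, in radians
    The summed gain P is assumed to be non-zero.
    """
    p = sum(gains)  # pressure gain: direct superposition of all gains
    vx = sum(g * math.cos(t) * math.cos(f)
             for g, t, f in zip(gains, thetas, phis))
    vy = sum(g * math.sin(t) * math.cos(f)
             for g, t, f in zip(gains, thetas, phis))
    vz = sum(g * math.sin(f) for g, f in zip(gains, phis))
    r_v = math.sqrt(vx * vx + vy * vy + vz * vz) / p
    theta_v = math.atan2(vy, vx)                 # direction angle of the image
    phi_v = math.atan2(vz, math.hypot(vx, vy))   # elevation angle of the image
    return r_v, theta_v, phi_v
```

Under these assumptions a single active sound box gives r_V = 1 in its own direction, while two opposite sound boxes with equal gains give r_V close to 0.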

S12: based on the velocity vector corresponding to each sound box and the gains of all the sound boxes, the decoding submodule determines a first directivity factor corresponding to each sound box through iterative optimization.

Ideally, the following equation (4) should be satisfied:

where 1 is the velocity vector of the source sound signal, θ_orig is the direction angle of the source sound signal, and φ_orig is the elevation angle of the source sound signal.

Thus, for each of the 10 sound boxes, the optimal solution satisfying the formula (4) can be calculated through continuous iterative optimization, and recorded as the first directivity factor corresponding to the sound box in the low frequency band, so as to be used for subsequently calculating the low-frequency sub-sound signal corresponding to the sound box in the low frequency band.

It should be noted that, for each sound box, a first directivity factor corresponding to a low frequency band can be obtained, so that the low frequency sub sound signal corresponding to the sound box can be calculated according to S13.

S13: the decoding submodule calculates the low-frequency sub sound signal corresponding to each sound box based on the first directivity factor, the gain, the direction angle and the elevation angle of the sound box and on the digital sound signal.

It can be understood that, for each sound box, from the first directivity factor d_i obtained for the sound box, the gain g_i, direction angle θ_i and elevation angle φ_i of the sound box, and the digital sound signal B, the low-frequency sub sound signal corresponding to the sound box can be calculated by the following formula (5):

S_i＝g_i*[(2-d_i)*g_w*W+d_i*(g_x*X+g_y*Y+g_z*Z)] … formula (5)

where S_i is the low-frequency sub sound signal corresponding to the i-th sound box.
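As a non-limiting illustration, formula (5) can be evaluated sample by sample as sketched below. The definitions of g_w, g_x, g_y and g_z are not reproduced in this text; the sketch assumes g_w = 0.707 and direction weights derived from the angles of the sound box, mirroring the encoding relations. These weights and the function name speaker_feed are assumptions of the sketch.

```python
import math

def speaker_feed(w, x, y, z, gain, theta, phi, d):
    """Evaluate the shape of formula (5) for one sound box.

    w, x, y, z: channels W, X, Y, Z as equal-length lists of floats
    gain:       gain g_i of the sound box
    theta, phi: direction and elevation angle of the sound box, in radians
    d:          directivity factor of the sound box, in the range 0 to 2
    """
    # Assumed channel weights (not given in the text)
    g_w = 0.707
    g_x = math.cos(theta) * math.cos(phi)
    g_y = math.sin(theta) * math.cos(phi)
    g_z = math.sin(phi)
    return [
        gain * ((2.0 - d) * g_w * wn + d * (g_x * xn + g_y * yn + g_z * zn))
        for wn, xn, yn, zn in zip(w, x, y, z)
    ]
```

With d = 0 the feed contains only the omnidirectional channel W, and with d = 2 only the directional channels X, Y and Z, which shows how the directivity factor trades one against the other. Formula (10) has the same shape, so the same function could be reused with the second directivity factor.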

Thus, for 10 sound boxes in the sound field playback system, 10 low-frequency sub sound signals corresponding to low frequency bands can be obtained, and the low-frequency sub sound signals correspond to the 10 sound boxes one to one respectively.

Similarly, for the high-frequency sound signals, the decoding submodule decodes the high-frequency sound signals to obtain 10 high-frequency sub-sound signals corresponding to 10 sound boxes, and the specific implementation manner may include:

S21: the decoding submodule calculates the energy vector corresponding to each sound box according to the gain, the direction angle and the elevation angle of each sound box.

It will be appreciated that localization of high-frequency sound waves depends primarily on the intensity of the sound, and for this frequency band an energy vector can be used to describe the direction of the acoustic energy flow in the vicinity of the listener's head. According to the vector synthesis method, with the plurality of playback sound boxes acting as sound sources, the direction of their virtual sound image at reference point O is synthesized from the energy vectors E pointing from the different sound sources to reference point O, and each E is decomposed into a component Ex in the x-axis direction, a component Ey in the y-axis direction and a component Ez in the z-axis direction.

As an example, the energy vector components corresponding to each sound box calculated in S21 can be seen from the following formula (6):

where Ex is the component in the x-axis direction of the energy vector E pointing to point O along the axis of the sound box, Ey is the component of E in the y-axis direction, and Ez is the component of E in the z-axis direction; i refers to the i-th sound box and N is the number of sound boxes (in the embodiment of the invention, N is 10); g_i represents the proportion of sound allocated to each sound box, also called the gain of the i-th sound box.

In a specific implementation, the tangent values corresponding to the direction angle and the elevation angle of the virtual sound image can be obtained according to formula (7):

where θ_EI refers to the direction angle corresponding to the high-frequency virtual sound image, and φ_EI refers to the elevation angle corresponding to the high-frequency virtual sound image.

The energy vector length r_E corresponding to the sound boxes can be calculated by referring to the following formula (8):

where the pointing angles of E are θ_EI and φ_EI, and its length r_E is greater than or equal to 0 and equals the synthesized gain value generated at reference point O by vector synthesis divided by the gain value E directly superposed at reference point O by all the sound boxes.
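As a non-limiting illustration, the energy vector calculation of S21 can be sketched as below. The sketch assumes that formulas (6) to (8) take the usual vector-synthesis form, in which each sound box contributes with weight g_i squared and the length is normalized by the summed energy; this form, the function name energy_vector and the use of radians are assumptions of the sketch rather than quotations of the formulas.

```python
import math

def energy_vector(gains, thetas, phis):
    """Return (r_e, theta_e, phi_e) for a set of sound boxes.

    gains:  gain g_i of each sound box
    thetas: direction angle theta_i of each sound box, in radians
    phis:   elevation angle phi_i of each sound box, in radians
    At least one gain is assumed to be non-zero.
    """
    e = sum(g * g for g in gains)  # total energy: sum of squared gains
    ex = sum(g * g * math.cos(t) * math.cos(f)
             for g, t, f in zip(gains, thetas, phis))
    ey = sum(g * g * math.sin(t) * math.cos(f)
             for g, t, f in zip(gains, thetas, phis))
    ez = sum(g * g * math.sin(f) for g, f in zip(gains, phis))
    r_e = math.sqrt(ex * ex + ey * ey + ez * ez) / e
    theta_e = math.atan2(ey, ex)                 # direction angle of the image
    phi_e = math.atan2(ez, math.hypot(ex, ey))   # elevation angle of the image
    return r_e, theta_e, phi_e
```

Under these assumptions, two sound boxes with equal gains placed 90 degrees apart in the horizontal plane yield a virtual sound image midway between them.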

S22: based on the energy vector corresponding to each sound box and the gains of all the sound boxes, the decoding submodule determines a second directivity factor corresponding to each sound box through iterative optimization.

Ideally, the following formula (9) should be satisfied:

where 1 is the energy vector of the source sound signal, θ_orig is the direction angle of the source sound signal, and φ_orig is the elevation angle of the source sound signal.

Thus, for each loudspeaker box in 10 loudspeaker boxes, the optimal solution meeting the formula (9) can be calculated through continuous iterative optimization, and the optimal solution is recorded as a second directivity factor corresponding to the loudspeaker box in the high frequency band and used for subsequently calculating the high-frequency sub-sound signal corresponding to the loudspeaker box in the high frequency band.

It should be noted that, for each sound box, a second directivity factor corresponding to a high frequency band may be obtained, so that the high frequency sub sound signal corresponding to the sound box may be calculated according to S23.

S23: the decoding submodule calculates the high-frequency sub sound signal corresponding to each sound box based on the second directivity factor, the gain, the direction angle and the elevation angle of the sound box and on the digital sound signal.

It can be understood that, for each sound box, from the second directivity factor b_i obtained for the sound box, the gain g_i, direction angle θ_i and elevation angle φ_i of the sound box, and the digital sound signal B, the high-frequency sub sound signal corresponding to the sound box can be calculated by the following formula (10):

T_i＝g_i*[(2-b_i)*g_w*W+b_i*(g_x*X+g_y*Y+g_z*Z)] … formula (10)

where T_i is the high-frequency sub sound signal corresponding to the i-th sound box.

Thus, for 10 sound boxes in the sound field playback system, 10 high-frequency sub sound signals corresponding to high frequency bands can be obtained, and the high-frequency sub sound signals correspond to the 10 sound boxes one to one respectively.

The first directivity factor and the second directivity factor both take values in the range 0 to 2.

It can be seen that, in step 303, the decoding submodule decodes the low-frequency sound signal and the high-frequency sound signal separately to obtain 10 low-frequency sub sound signals and 10 high-frequency sub sound signals corresponding to the 10 sound boxes, which provides a data basis for accurate playback of the sound field in the subsequent steps.

It should be noted that, in the embodiment of the present invention, the process of decoding by using the Ambisonic decoding algorithm is described in detail, but other decoding algorithms may be used as the decoding algorithm used by the decoding submodule in the embodiment of the present invention as long as the decoding of the high-low frequency sound signals into 10 pairs of high-low frequency sub sound signals for 10 sound boxes can be achieved, and the implementation manner of step 303 in the other decoding algorithms is not described herein again.

After decoding, the decoding result may also be spatially processed, and the specific processing process is not described again.

Step 304: the DA converter receives the 10 low-frequency sub sound signals and the 10 high-frequency sub sound signals forwarded by the sound card, performs DA conversion on each of them, and sends the converted signals to the corresponding sound boxes among the 10 sound boxes.

It is understood that the high-frequency sub sound signals and the low-frequency sub sound signals obtained by decoding in step 303 may be forwarded to the DA converter through the sound card. Because these sub sound signals are all digital signals, they need to be converted into analog signals before playback. The DA converter therefore outputs 20 analog sound signals: the analog signals corresponding to the 10 high-frequency sub sound signals and the analog signals corresponding to the 10 low-frequency sub sound signals.

During specific implementation, the analog signals corresponding to the 10 high-frequency sub sound signals and the analog signals corresponding to the 10 low-frequency sub sound signals after DA conversion can be respectively sent to the corresponding sound boxes, so that each sound box can perform superposition playback on the received 1 low-frequency sub sound signal and 1 high-frequency sub sound signal, the sound signals played by the 10 sound boxes are superposed again, and high-precision playback of source sound can be realized.

It should be noted that, before the DA converter sends the converted analog sound signals to the sound boxes, the signals transmitted to the sound boxes may be limited by a limiter, so as to prevent a signal exceeding the limit from damaging a sound box. This protects the sound boxes, extends their service life, and improves the experience of the listener.
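As a non-limiting illustration, the effect of a limiter can be sketched as below. The embodiment does not specify the type of limiter; this sketch assumes the simplest form, a hard clip to a fixed ceiling, and the function name limit and the default ceiling of 1.0 are hypothetical.

```python
def limit(samples, ceiling=1.0):
    """Clamp every sample to the range [-ceiling, +ceiling]."""
    return [max(-ceiling, min(ceiling, s)) for s in samples]
```

A practical limiter would normally reduce gain smoothly rather than clip, to avoid audible distortion; the hard clip is used here only to show where the protection sits in the signal chain.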

In step 305, each loudspeaker box of the 10 loudspeaker boxes respectively performs superposition playback on the received 1 low-frequency sub sound signal and 1 high-frequency sub sound signal so as to restore the sound field of the source sound.

It can be understood that the 10 sound boxes include 2 subwoofer sound boxes and 8 active monitoring sound boxes. To cover the frequency range of 20 Hz to 20 kHz, the 2 subwoofer sound boxes are used for the frequency range of 19 Hz to 100 Hz, and the 8 active monitoring sound boxes are used for the frequency range of 48 Hz to 20 kHz.

As an example, fig. 4 shows a schematic distribution of the 10 sound boxes. As shown in fig. 4, 4 active monitoring sound boxes 1, 3, 5 and 7 are uniformly distributed on a circumference centered on reference point O with a radius of 2.5 meters and are mounted on supports 0.53 meters above the ground, with the playing surfaces of 1 and 5, and of 3 and 7, facing each other; the other 4 active monitoring sound boxes 2, 4, 6 and 8 are uniformly distributed on a circumference centered on O with a radius of 2.5 meters and are mounted on supports 1.6 meters above the ground, with the playing surfaces of 2 and 6, and of 4 and 8, facing each other; the 2 subwoofer sound boxes 9 and 10 are placed symmetrically on either side of the center line of the circumference, more than 2.5 meters from the reference point, on the floor of the semi-anechoic chamber. In fig. 4, a is a top view and b is a spatial structure view.
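As a non-limiting illustration, the coordinates of the two rings of active monitoring sound boxes can be sketched as below. The text gives the radius, the heights and the uniform spacing, but not the starting angle of each ring; the 45 degree offset of the upper ring, the coordinate convention (O projected onto the floor as the origin) and the function name ring_positions are assumptions of the sketch.

```python
import math

def ring_positions(count, radius_m, height_m, start_deg=0.0):
    """Return (x, y, z) positions evenly spaced on a horizontal circle."""
    step = 360.0 / count
    positions = []
    for k in range(count):
        angle = math.radians(start_deg + k * step)
        positions.append((radius_m * math.cos(angle),
                          radius_m * math.sin(angle),
                          height_m))
    return positions

# Sound boxes 1, 3, 5, 7 at 0.53 m; start angle assumed to be 0 degrees
lower_ring = ring_positions(4, 2.5, 0.53)
# Sound boxes 2, 4, 6, 8 at 1.6 m; 45 degree offset is an assumption
upper_ring = ring_positions(4, 2.5, 1.6, start_deg=45.0)
```

With 4 sound boxes per ring, entries 0 and 2 (and entries 1 and 3) of each list lie diametrically opposite each other, which corresponds to the pairs of playing surfaces that face each other.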

The circumference radius may be determined according to the frequency of the source sound signal; 2.5 meters is an example, and any value compatible with the frequency range of 20 Hz to 20 kHz may be used. It should be noted that the embodiment of the present invention can accurately play back sound of different frequency bands within a spherical region of space centered on the reference point, but the size of the playback region depends on the frequency: the region radius is 1/4 of the wavelength corresponding to the sound frequency. For example, for a 200 Hz sound source, the wavelength is wl = 340/F = 340/200 m = 1.7 m (taking the speed of sound as 340 m/s) and the region radius is wl/4 = 42.5 cm; that is, when the frequency of the sound source is 200 Hz, the playback region is a sphere centered on O with a radius of 42.5 cm.
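As a non-limiting illustration, the quarter-wavelength relation can be computed as below. The function name playback_radius_m is hypothetical, and the speed of sound of 340 m/s is the value used in the example above.

```python
def playback_radius_m(frequency_hz, speed_of_sound=340.0):
    """Radius of the accurate playback region: one quarter wavelength."""
    wavelength = speed_of_sound / frequency_hz
    return wavelength / 4.0
```

For 200 Hz this reproduces the 42.5 cm of the example; the radius shrinks in inverse proportion to the frequency, so the accurately reproduced region is much smaller at high frequencies.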

The two heights at which the sound boxes are placed can be determined according to the height of the listener's ears above the ground. The listener sits on a seat at reference point O with the left and right ears approximately 1.1 meters above the ground, so two heights approximately symmetric about 1.1 meters can be chosen as the heights of the circumferences on which the 8 active monitoring sound boxes are arranged.

It should be noted that, owing to the special arrangement of the sound boxes in the sound field playback system, the sound field playback method provided by the embodiment of the present invention can achieve the following stereo playback accuracy: within the range of 20 Hz to 20 kHz, the octave-band error is less than ±1 dB, and the overall sound level error is less than ±0.5 dB(A) sound pressure level (SPL).

In this way, by the sound field playback method provided by the embodiment of the present invention, digital sound signals corresponding to the source sound signals are filtered, and the obtained high frequency sound signals and low frequency sound signals are decoded respectively, so that the sound field playback precision is improved.

In addition, an embodiment of the present invention further provides a device for playing back a sound field. Referring to the schematic structural diagram of the device shown in fig. 5, the device is applied to a system for sound playback, where the system includes a sound acquisition module, a sound processing module, a sound card, a digital-to-analog (DA) converter, and 10 sound boxes; the sound acquisition module is connected with a filtering submodule in the sound processing module, the filtering submodule is connected with a decoding submodule in the sound processing module, and the decoding submodule is connected with the sound card; the sound card is connected with the DA converter, and the DA converter is respectively connected with the 10 sound boxes placed in the semi-anechoic chamber; the device comprises:

the acquisition unit 501 is used for the sound acquisition module to acquire a source sound signal and to send a digital sound signal obtained by encoding to the filtering submodule of the sound processing module;

a frequency dividing unit 502, configured to perform frequency division processing on the digital sound signal by the filtering sub-module, and separate a low-frequency sound signal and a high-frequency sound signal;

the decoding unit 503 is configured to decode the low-frequency sound signal and the high-frequency sound signal by the decoding sub-module, respectively, to obtain 10 low-frequency sub-sound signals and 10 high-frequency sub-sound signals corresponding to the 10 sound boxes;

a conversion unit 504, configured to receive the 10 low-frequency sub sound signals and the 10 high-frequency sub sound signals forwarded by the sound card, and perform DA conversion on the received signals respectively and then send the signals to the 10 sound boxes correspondingly;

a playback unit 505, configured to perform superposition playback on the received 1 low-frequency sub-sound signal and 1 high-frequency sub-sound signal respectively by each loudspeaker box of the 10 loudspeaker boxes so as to restore the sound field of the source sound.

Optionally, the decoding unit 503 includes a first decoding sub-unit and a second decoding sub-unit,

the first decoding subunit includes:

the second decoding subunit includes:

It should be noted that the above description is related to a sound field playback apparatus, and specific implementation manners and achieved effects may refer to the description of the above sound field playback method embodiment, and are not repeated here.

The "first" in the names of "first directivity factor", "first determination unit", and the like mentioned in the embodiments of the present invention is used only for name identification, and does not represent the first in order. The same applies to "second" etc.

As can be seen from the above description of the embodiments, those skilled in the art can clearly understand that all or part of the steps in the above embodiment methods can be implemented by software plus a general hardware platform. With this understanding, the technical solution of the present invention can be embodied in the form of a software product, which can be stored in a storage medium, such as a read-only memory (ROM)/RAM, a magnetic disk, an optical disk, etc., and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network communication device such as a router, etc.) to execute the method according to the embodiments or some parts of the embodiments of the present invention.

The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, the apparatus and system embodiments are substantially similar to the method embodiments and are therefore described in a relatively simple manner, where relevant, reference may be made to some descriptions of the method embodiments. The above-described embodiments of the apparatus and system are merely illustrative, wherein modules described as separate parts may or may not be physically separate, and parts shown as modules may or may not be physical modules, may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.

The above description is only of the preferred embodiment of the present invention and is not intended to limit the scope of the present invention. It should be noted that a person skilled in the art can make several modifications and refinements without departing from the invention, and these modifications and refinements should also be regarded as falling within the protection scope of the present invention.

Claims

1. A method for playing back a sound field, characterized in that the method is applied to a system for sound playback, the system comprising a sound acquisition module, a sound processing module, a sound card, a digital-to-analog (DA) converter and 10 sound boxes; the sound acquisition module is connected with a filtering submodule in the sound processing module, the filtering submodule is connected with a decoding submodule in the sound processing module, and the decoding submodule is connected with the sound card; the sound card is connected with the DA converter, and the DA converter is respectively connected with the 10 sound boxes placed in the semi-anechoic chamber; the method comprises the following steps:

the sound acquisition module acquires a source sound signal and sends a digital sound signal obtained by encoding to the filtering submodule of the sound processing module;

each loudspeaker box in the 10 loudspeaker boxes respectively superposes and plays back the received 1 low-frequency sub sound signal and 1 high-frequency sub sound signal so as to restore the sound field of the source sound;

the 10 sound boxes comprise 2 subwoofer sound boxes and 8 active monitoring sound boxes;

2. The method of claim 1 wherein said first predetermined height is 0.53 meters, said second predetermined height is 1.6 meters, and said listener's ears are 1.1 meters from the ground; the preset length is 2.5 meters.

3. The method of claim 1, wherein the decoding submodule decodes using an Ambisonic decoding algorithm.

4. The method of claim 3,

the decoding submodule decodes the low-frequency sound signals to obtain 10 low-frequency sub-sound signals corresponding to the 10 sound boxes, and the decoding submodule comprises:

5. The method of claim 1 or 2, wherein the 2 subwoofer sound boxes are suited to a frequency range of 19 Hz to 100 Hz, and the 8 active monitoring sound boxes are suited to a frequency range of 48 Hz to 20000 Hz; the system for sound playback is suited to a frequency range of 19 Hz to 20000 Hz.

6. A device for playing back a sound field, characterized in that the device is applied to a system for sound playback, the system comprising a sound acquisition module, a sound processing module, a sound card, a digital-to-analog (DA) converter and 10 sound boxes; the sound acquisition module is connected with a filtering submodule in the sound processing module, the filtering submodule is connected with a decoding submodule in the sound processing module, and the decoding submodule is connected with the sound card; the sound card is connected with the DA converter, and the DA converter is respectively connected with the 10 sound boxes placed in the semi-anechoic chamber; the device comprises:

the acquisition unit is used for acquiring a source sound signal by the sound acquisition module and transmitting a digital sound signal obtained by encoding to the filtering submodule of the sound processing module;

a playback unit, configured to perform superposition playback on the received 1 low-frequency sub sound signal and 1 high-frequency sub sound signal respectively by each of the 10 sound boxes so as to restore a sound field of the source sound;

7. The apparatus of claim 6 wherein said first predetermined height is 0.53 meters, said second predetermined height is 1.6 meters, and said listener's ears are 1.1 meters from the ground; the preset length is 2.5 meters.

8. The apparatus of claim 6, wherein the decoding submodule decodes using an Ambisonic decoding algorithm.

9. The apparatus of claim 8, wherein the decoding unit comprises a first decoding sub-unit and a second decoding sub-unit,

the first decoding subunit includes:

the second decoding subunit includes:

10. The apparatus of claim 6 or 7, wherein the 2 subwoofer sound boxes are suited to a frequency range of 19 Hz to 100 Hz, and the 8 active monitoring sound boxes are suited to a frequency range of 48 Hz to 20000 Hz; the system for sound playback is suited to a frequency range of 19 Hz to 20000 Hz.

11. A system for playback of a sound field, comprising: a sound acquisition module, a sound processing module, a sound card, a digital-to-analog (DA) converter and 10 sound boxes;

the sound acquisition module is used for acquiring a source sound signal and sending a digital sound signal obtained by encoding to the filtering submodule of the sound processing module;