CN110933589B

CN110933589B - Earphone signal feeding method for conference

Info

Publication number: CN110933589B
Application number: CN201911192637.4A
Authority: CN
Inventors: 王恒; 曾维坚; 东莲正; 李子强; 陈科壬; 高韦涵; 朱镇熙
Original assignee: Guangzhou DSPPA Audio Co Ltd
Current assignee: Guangzhou DSPPA Audio Co Ltd
Priority date: 2019-11-28
Filing date: 2019-11-28
Publication date: 2021-07-16
Anticipated expiration: 2039-11-28
Also published as: CN110933589A

Abstract

The invention discloses an earphone signal feeding method for a conference, which comprises the following steps: acquiring space parameters of a target area, and calculating according to the space parameters to obtain an average sound absorption coefficient of the target area; establishing a rectangular coordinate system of a target area, labeling coordinate parameters of a speech floor and audience floors in the rectangular coordinate system, and calculating the relative angle of each audience floor relative to the speech floor; acquiring HRIR data of direct sound of a horizontal plane in the direction of a speaking seat and acquiring HRIR data of reflected sound of the horizontal plane; selecting corresponding direct sound HRIR data and reflected sound HRIR data according to the relative angle of each auditorium relative to the speech seat, and calculating binaural sound signals under the static condition by combining the average sound absorption coefficient of the target area; and after the relative angle of each auditorium relative to the speech seat is updated according to the horizontal angle change value of each auditorium earphone, selecting corresponding direct sound HRIR data and reflected sound HRIR data, and calculating the binaural sound signal under the dynamic condition by combining the average sound absorption coefficient of the target area.

Description

Earphone signal feeding method for conference

Technical Field

The invention relates to the technical field of signal processing, in particular to an earphone signal feeding method for a conference.

Background

The conference room is an acoustic place mainly based on language, is used for conferences, academic reports, learning and training and the like, and requires that the definition of the language is high enough, and the sound and image direction perceived by listeners is consistent with the direction of a speaker. Since the voice of a speaker is attenuated with increasing distance as it travels in a conference room, most conference rooms are equipped with a sound amplification system to increase the sound pressure level at the auditorium in order to provide the auditorium in the conference room with a sufficiently high sound pressure level. However, there are also occasions where the speaker's voice is played through an earphone worn by the listener, or the speaker's speech content is translated into another language and then played through an earphone worn by the listener.

In order to make the conference room have better effect, the electronic conference system engineering design specification (GB50799-2012) puts forward requirements on each part of the conference system, including classification and composition of the conference public address system, functional design and requirements, performance design requirements, and main equipment design requirements. However, in the case where the listener wears the headphones to receive the speech information of the speaker, no acoustic characteristic index is set for the sound signal played by the headphones, and the sound signal is monophonic and has no direction information.

For other applications, there are techniques for reproducing sound space information using headphones. Commonly used is a technique based on Head Related Transfer Functions (HRTFs), the basic principle being as follows:

let the spatial position of the sound source S relative to the listener' S head center be represented by spherical coordinates (r, θ, φ), as shown in FIG. 1. Wherein r is the sound source distance; -90 ° ≦ φ ≦ 90 ° and-180 ° ≦ θ <180 ° representing elevation and azimuth, respectively; phi is 0 ° and +90 ° respectively representing the horizontal plane and directly above; the horizontal plane θ represents straight front, straight right, and straight left, respectively, at 0 °, 90 °, and 90 °. The HRTF is defined as an acoustic transfer function (1) from a free-field point sound source to two ears;

wherein, P_LAnd P_RRespectively, the sound pressure of the sound source with the position of (r, theta, phi) in the frequency domain generated by the ears; p₀Is the sound pressure at the center of the head after the head is removed. Typically, HRTFs are related to sound source position and frequency f, and to individuals. For a far field with the sound source distance r being more than or equal to 1.0m, the HRTF is approximately independent of the distance. But for r<The near field of 1.0m, the HRTF is distance dependent and thus contains a factor for distance location. The time domain representations of the HRTFs are head-related impulse responses (HRIRs), which are related by fourier transforms.

To synthesize a virtual point source with a spatial position (r, θ, φ) in the free field, a single time-domain signal e can be used₀(t) sum after appropriate delay processing and amplitude scalingA pair of corresponding HRIR convolutions resulting in a binaural sound signal (2);

wherein h is_LAnd h_RHRIR from the source to the left and right ear, respectively, and t is time. T r/c is the transmission delay from the source to the listener, c is the speed of sound; the amplitude scale 1/r simulates the amplitude decay of spherical sound waves in the free field with distance.

The binaural sound signals obtained by the method are fed to a pair of earphones for reproduction, so that the sound pressure at the two ears of the listener is in direct proportion to the binaural sound pressure generated by the actual point sound source in the free field, and the virtual sound source at the corresponding spatial position can be virtualized.

In order to make the effect closer to the actual sound field environment, the binaural impulse response (BRIR) in the actual environment may also be used to replace the HRIR in the signal processing. However, the method has large data amount and operation amount, and has great difficulty in realization. For this purpose, early reflections in the sound field can be simulated by the virtual source method, and late reverberant sounds can be simulated by the perception model.

The basic idea of simulating early reflected sound by using a virtual sound source method is to replace the reflection of the interface with an equivalent virtual sound source, and fig. 2(a) shows a virtual sound source of primary reflected sound. For a bounded space, the reflected sound may be of second, third or even higher order, and may be represented by second, third or higher order virtual sources, fig. 2(b) being a schematic diagram of a virtual source for a two-dimensional rectangular room. For a room of arbitrary shape, not all virtual sound sources contribute to the sound wave at the receiving point, and a visual inspection is required. The virtual source simulates specular reflection at the interface and does not simulate diffuse reflection.

In some occasions with low requirements on the accuracy of the reflected sound, a specific physical model is not required to be associated, and the indoor reflected sound can be simulated by adopting an artificial delay and reverberation algorithm to obtain a corresponding subjective auditory effect. Since this method is from the perspective of human perception, it is also called a perception simulation method. Most of the delay reverberation algorithms are single-path, and when the delay reverberation algorithms are used for virtual auditory reproduction, two paths of reverberation signals need to be generated and subjected to decorrelation processing.

The above is a static processing technique, but in actual application, binaural sound pressure varies with the movement of the listener's head even if the sound source is fixed. Therefore, the dynamic virtual auditory environment system should be able to simulate the binaural sound pressure changes caused by the sound source and listener movements in order to achieve a more accurate localization effect and a realistic auditory environment perception. The movement of the listener's head in three-dimensional space can be represented by 6 degrees of freedom, 3 translational degrees of freedom and 3 rotational degrees of freedom, respectively, and fig. 3 is a schematic view of the head rotation, including rotation about the x-axis (Pitch), rotation about the y-axis (Roll), and rotation about the z-axis (Yaw).

Head movements can be detected using head trace trackers, commonly used are electromagnetic trackers, ultrasonic trackers, optical trackers, hybrid inertial trackers, and the like. By using the head motion parameters output by the head trace tracker, the distance and the direction (r, theta, phi) of the virtual sound source relative to the head center can be calculated in real time according to the geometric relation, and then the HRTF in the corresponding direction can be called to filter the input signal.

Besides the HRTF techniques described above, there are three techniques of vector basis amplitude adjustment (VBAP), AMBISONICS (AMBISONICS), and Wave Field Synthesis (WFS). These three techniques are generally not used in the context of playback languages, since they are relatively complex to implement.

The prior art has the following disadvantages:

(1) in the existing conference system, a single-channel signal is used when an earphone is used for reproducing sound signals, the sound signals heard by listeners have no sense of direction, and sound images appear at positions in the head and are inconsistent with the directions of speakers.

(2) Although virtual sound sources in desired directions can be generated by using the HRTF technique, there are some problems: firstly, the sense of distance is not strong, and a virtual sound source perceived by a listener is near the scalp and has a far difference with an actual scene; secondly, the HRTF data volume needing to be stored is large, and the HRTF data volume comprises hundreds of groups of data in the whole three-dimensional direction; thirdly, replacing HRIR (or HRTF) with BRIR has the effect closer to the actual acoustic scene, but the data volume and the operation amount are both orders of magnitude larger. If the BRIR is replaced by adding early reflected sound and reverberant sound, the data volume and the operation amount are correspondingly reduced, but the real-time realization still has great difficulty; and fourthly, although the effect is better after dynamic processing is added, the HRTF is generally refreshed according to 1 degree, the calculation amount is particularly large, and the real-time implementation has higher requirement on hardware.

(3) Technologies other than HRTFs, such as vector basis amplitude adjustment (VBAP), AMBISONICS (AMBISONICS), Wave Field Synthesis (WFS), are used, which are complex to implement and generally cannot be used in situations where only languages are played back.

Disclosure of Invention

The invention provides an earphone signal feeding method for a conference, which realizes that a listener can perceive that sound comes from the direction of a speaker; the angle parameter of each auditorium is calculated according to the arrangement condition of the target area, the monaural speech signal is calculated according to the angle parameter combination, and the binaural signal is obtained, so that the azimuth information of the speaker can be virtualized, and the listener can perceive the sound image in the direction of the speaker; and the change angle is calculated by detecting the rotation of the head of the listener, so that the sound image perceived by the listener still comes from the direction of the speaker, and the signal feeding effect of the speech table can be realized under the static and dynamic conditions.

In order to solve the above technical problem, an embodiment of the present invention provides an earphone signal feeding method for a conference, including:

acquiring space parameters of a target area, and calculating to obtain an average sound absorption coefficient of the target area according to the space parameters;

establishing a rectangular coordinate system of the target area, labeling coordinate parameters of a speech floor and audience floors in the rectangular coordinate system, and calculating the relative angle of each audience floor relative to the speech floor;

acquiring HRIR data of direct sound of a horizontal plane in the direction of a speaking seat and acquiring HRIR data of reflected sound of the horizontal plane;

selecting corresponding direct sound HRIR data and reflected sound HRIR data according to the relative angle of each auditorium relative to the speech seat, and calculating binaural sound signals under the static condition by combining the average sound absorption coefficient of the target area;

and after the relative angle of each auditorium relative to the speech seat is updated according to the horizontal angle change value of each auditorium earphone, selecting corresponding direct sound HRIR data and reflected sound HRIR data, and calculating the binaural sound signal under the dynamic condition by combining the average sound absorption coefficient of the target area.

As a preferred scheme, the acquiring the HRIR data of the direct sound at the floor direction horizontal plane specifically includes:

and in the angle range of 0-90 degrees of the relative angle, selecting HRIR data of nine different relative angles in a distributed mode as candidate direct sound HRIR data.

As a preferred scheme, the acquiring reflected sound HRIR data of the horizontal plane specifically includes:

and respectively selecting two angle directions within the angle ranges of-180 DEG to-90 DEG and-90 DEG to 0 DEG of the relative angle, and taking the HRIR data of four fixed angle directions as the reflected sound HRIR data.

As a preferred scheme, the selecting corresponding direct sound HRIR data and reflected sound HRIR data according to the relative angle of each auditorium relative to the speaking mat, and calculating the binaural sound signal under the static condition by combining the average sound absorption coefficient of the target area specifically includes:

selecting the corresponding direct sound HRIR data according to the relative angle to calculate to obtain a left ear direct sound signal and a right ear direct sound signal;

selecting the corresponding reflected sound HRIR data according to the average sound absorption coefficient and the relative angle to calculate to obtain a left ear reflected sound signal and a right ear reflected sound signal;

calculating to obtain a total left ear sound signal according to the left ear direct sound signal and the left ear reflected sound signal; and calculating to obtain a total right ear sound signal according to the right ear direct sound signal and the right ear reflected sound signal.

Preferably, the updating the relative angle of each auditorium relative to the speech floor according to the horizontal angle change value of each auditorium earphone specifically includes:

acquiring a horizontal angle initial value of each auditorium earphone;

acquiring a horizontal angle real-time value of each auditorium earphone in real time;

and calculating and updating the relative angle of each auditorium relative to the speech seat according to the initial value of the horizontal angle and the real-time value of the horizontal angle.

As a preferred scheme, selecting the direct sound HRIR data corresponding to the relative angle according to the relative angle to calculate and obtain a left ear direct sound signal and a right ear direct sound signal, specifically including:

comparing the relative angle with nine distributively selected relative angles, and selecting a pair of time domain discrete HRIR data corresponding to the angle closest to the nine relative angles;

and respectively convolving the left ear data and the right ear data of the time domain discrete HRIR data with the single-channel time domain discrete data of the speech floor signal, and calculating to obtain a left ear direct sound signal and a right ear direct sound signal.

As a preferred scheme, the obtaining of the left ear reflected sound signal and the right ear reflected sound signal by selecting the corresponding reflected sound HRIR data according to the average sound absorption coefficient and the relative angle specifically includes:

respectively determining the numerical values of the time delay sample points of the four fixed-angle reflected sounds relative to the direct sound;

adjusting and calculating according to the average sound absorption coefficient to obtain the intensity coefficient of the reflected sound;

and calculating to obtain a left ear reflected sound signal and a right ear reflected sound signal according to a pair of time domain discrete HRIR data corresponding to each reflected sound, single-path time domain discrete data of the speech floor signal, a delay sample point value not exceeding 20ms and the intensity coefficient of the reflected sound.

Preferably, when comparing the relative angle with nine distributively selected relative angles, the method further includes:

when the relative angle belongs to the angle range of 0-90 degrees, directly comparing the relative angle with nine opposite angles selected in a distributed mode;

when the relative angle is in the angle range of-90 degrees to 0 degrees, calculating the absolute value of the relative angle, and comparing the absolute value with nine opposite angles selected in a distributed manner;

and when the relative angle is in the angle range of-90 degrees to 0 degrees, the total left and right ear sound signals obtained by calculation are interchanged.

Preferably, the acquisition of the horizontal angle change value of each auditorium earphone is realized by mounting a gyroscope sensor on an ear cover or a head beam of the earphone, and monitoring the horizontal angle change of the earphone; the gyroscope sensor cannot change relative to the earphone along with the rotation of the head;

the horizontal angle range output by the gyroscope sensor is-180 degrees to 180 degrees; when the horizontal angle is 0 °, it represents a straight ahead; when the horizontal angle is 90 degrees, the right direction is indicated; when the horizontal angle is-90 deg., it indicates the positive left direction.

As a preferred scheme, the calculating the average sound absorption coefficient of the target region according to the spatial parameters specifically includes:

uniformly taking a plurality of positions at an auditorium to measure reverberation time, and calculating average reverberation time;

and calculating according to the average reverberation time and the space parameters to obtain an average sound absorption coefficient of a target area.

Compared with the prior art, the embodiment of the invention has the following beneficial effects:

1. the invention calculates the angle parameter of each auditorium according to the arrangement condition of the target area, and calculates the single-channel speaking signal according to the angle parameter, so as to obtain the binaural signal, thus realizing the virtual presentation of the orientation information of the speaker, and leading the auditorium to perceive the sound image in the direction of the speaker; and the change angle is calculated by detecting the rotation of the head of the listener, so that the sound image perceived by the listener still comes from the direction of the speaker, thereby realizing the signal feeding function of the speaker under both static and dynamic conditions and realizing the effect that the listener can perceive the sound from the direction of the speaker.

2. The binaural sound signal of each auditorium can be calculated by knowing only the conference room dimensions and reverberation time, the floor and auditorium coordinates, the horizontal hrtf (hrir) data, and the head orientation in the horizontal plane.

3. Regardless of whether the voice of the speaker is received or the voice after translation is received, the voice direction perceived by each auditorium through the reproduction of the earphone comes from the speaker, the sound image is out of the head, and the in-head localization effect is completely eliminated.

4. The data quantity is small and the calculation quantity is small. Only HRTF (HRIR) data of direct sound and reflected sound need to be stored, and the speaking person can increase the sound image distance feeling through the same reflected sound no matter in which direction. Compared with a complete three-dimensional HRTF (HRIR) database, the data in hundreds of directions are much less, and the reflected sound is simulated by only HRIR in 4 fixed directions to obtain a better effect, which is close to the sound image distance sense obtained by calculating the reflected sound to the second order (36 directions in total) by a virtual source method.

5. Compared with the method that the earphone reproduces a single-channel speech signal, the sound definition is improved, the time interval between the reflected sound and the direct sound is controlled within 20ms, and the sound image carries direction information.

Drawings

FIG. 1: is a schematic diagram of a coordinate system of a listener's head in the prior art embodiment;

FIG. 2: is a schematic diagram of a virtual sound source in the prior art embodiment; wherein, fig. 2(a) is a schematic diagram of a virtual sound source of primary reflected sound, and fig. 2(b) is a schematic diagram of a virtual sound source of a two-dimensional rectangular room;

FIG. 3: schematic view of the head in the prior art embodiment rotating about 3 axes;

FIG. 4: is a schematic diagram of a rectangular coordinate system in the embodiment of the invention;

FIG. 5: a flow chart of the steps of a method for feeding a signal to a headset for a conference in an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Referring to fig. 1 to 5, a preferred embodiment of the present invention provides an earphone signal feeding method for a conference, including:

s1, obtaining the space parameters of the target area, and calculating the average sound absorption coefficient of the target area according to the space parameters; in this embodiment, the calculating the average sound absorption coefficient of the target area according to the spatial parameters specifically includes: s11, uniformly taking a plurality of positions at the auditorium to measure the reverberation time, and calculating the average reverberation time; and S12, calculating according to the average reverberation time and the space parameters to obtain the average sound absorption coefficient of the target area.

First, the length and width dimensions of the room are measured, recorded as L, W, H, in meters. The reverberation time RT is measured by uniformly taking 3 positions in an auditorium_k(k is 1,2,3), calculating an average reverberation time:

and then calculating the average sound absorption coefficient alpha of the room:

s2, establishing a rectangular coordinate system of the target area, labeling coordinate parameters of a speech seat and audience seats in the rectangular coordinate system, and calculating the relative angle of each audience seat relative to the speech seat;

and establishing an O-XY coordinate system on the room, wherein the O-XY plane is a conference table plane, the X axis is a long edge, and the Y axis is a short edge. As shown in fig. 4. The speaker position coordinate is (x)_S,y_S) The position coordinates of the auditorium are set as (x)_i,y_i) (i ═ 1,2, … …, N. N is the number of auditorium).

0 ° and 90 ° in the horizontal plane respectively indicate right front and rightThe angle of each auditorium relative to the speech position is set as theta_i(i＝1,2,……，N)。

Calculating theta according to the coordinates of the auditorium and the speech-room_i：

S3, acquiring HRIR data of direct sound at the horizontal plane of the speaking seat direction and acquiring HRIR data of reflected sound at the horizontal plane;

in this embodiment, the acquiring the HRIR data of the direct sound at the floor direction horizontal plane specifically includes: and in the angle range of 0-90 degrees of the relative angle, selecting HRIR data of nine different relative angles in a distributed mode as direct sound HRIR data.

In this embodiment, the acquiring the reflected sound HRIR data of the horizontal plane specifically includes: and respectively selecting two angle directions within the angle ranges of-180 DEG to-90 DEG and-90 DEG to 0 DEG of the relative angle, and taking the HRIR data of four fixed angle directions as the reflected sound HRIR data.

Specifically, the HRIR data of the horizontal plane direct sound (floor direction) is acquired by a measuring or calculating method, and since the floor is in front of or in the side direction of the auditorium in the conference, the HRIR data between-90 ° and 90 ° is only needed. In addition, although the human ear has strong resolving power for the front sound, certain errors exist, and particularly the resolving error is larger when the human ear is closer to the side direction. Moreover, the sound image direction in the conference system does not need to be very accurate, and the human ears are approximately bilaterally symmetrical, so that only HRIR data of a part of angle between 0-90 degrees at the front right is needed, and the method takes theta_Q，Q＝1,2,3,……，9。θ_QCorresponding values are HRIR data of 0 °,5 °, 10 °, 20 °, 30 °, 40 °, 55 °, 70 °, 90 °, respectively.

HRIR data of horizontal plane reflected sound is acquired by a measuring or calculating method. The invention adopts 4-direction sound to simulate early reflected sound, which is distributed between-180 DEG and-0 DEG, wherein 2 directions are respectively selected between-180 DEG to-90 DEG and-0 DEG, such as-25 DEG, -69 DEG, -102 DEG and-151 deg. Once these directions are determined, these 4 fixed directions are used as reflected sound for all directions of direct sound (floor direction).

S4, selecting corresponding direct sound HRIR data and reflected sound HRIR data according to the relative angle of each auditorium relative to the speech seat, and calculating binaural sound signals under the static condition by combining the average sound absorption coefficient of the target area;

in this embodiment, the step S4 specifically includes:

s41, selecting the HRIR data corresponding to the relative angle according to the relative angle, and calculating to obtain a left ear direct sound signal and a right ear direct sound signal;

in this embodiment, the step S41 specifically includes: s411, comparing the relative angle with nine distributively selected relative angles, and selecting a pair of time domain discrete HRIR data corresponding to the angle closest to the nine relative angles; in this embodiment, when the relative angle falls within the angle range of 0 ° to 90 °, the relative angle is directly compared with nine relative angles selected in a distributed manner; when the relative angle is in the angle range of-90 degrees to 0 degrees, calculating the absolute value of the relative angle, and comparing the absolute value with nine opposite angles selected in a distributed manner; and when the relative angle is in the angle range of-90 degrees to 0 degrees, the total left and right ear sound signals obtained by calculation are interchanged. And S412, convolving the left ear data and the right ear data of the time domain discrete HRIR data with the single-channel time domain discrete data of the speech floor signal respectively, and calculating to obtain a left ear direct sound signal and a right ear direct sound signal.

S42, selecting the corresponding reflected sound HRIR data according to the average sound absorption coefficient and the relative angle to calculate to obtain a left ear reflected sound signal and a right ear reflected sound signal;

in this embodiment, the step S42 specifically includes: s421, respectively determining the time delay sample point values of the four fixed-angle reflected sounds relative to the direct sound; s422, adjusting and calculating according to the average sound absorption coefficient to obtain the intensity coefficient of the reflected sound; and S423, calculating to obtain a left ear reflected sound signal and a right ear reflected sound signal according to the pair of time domain discrete HRIR data corresponding to each reflected sound, the single-channel time domain discrete data of the speech floor signal, the delay sample point value and the intensity coefficient of the reflected sound.

S43, calculating to obtain a total left ear sound signal according to the left ear direct sound signal and the left ear reflected sound signal; and calculating to obtain a total right ear sound signal according to the right ear direct sound signal and the right ear reflected sound signal.

Specifically, θ for each auditorium_iThe adjustment is carried out by considering theta between 0 DEG and less_iThe temperature is less than or equal to 90 degrees. Will theta_iAnd theta_QBy contrast, will theta_iAdjusted to the closest theta_QValue is recorded as θ'_i. Single-channel time-domain discrete signal e for signal of set speaker₀(n) represents, h_Li(n)、h_Ri(n) is θ'_iA corresponding pair of time domain discrete HRIRs.

Binaural signals for each auditorium direct sound (floor direction) are calculated:

e_Li(n)＝h_Li(n)*e₀(n)

e_Ri(n)＝h_Ri(n)*e₀(n) (6)

wherein e_Li(n)、e_Ri(n) respectively represent the left and right ear direct sound signals of each auditorium.

Calculating binaural Sound Signal h of reflected Sound (floor Direction) of Each audience floor_Lj(n)、h_Rj(n) is a pair of time domain discrete HRIRs representing the jth reflected sound, j being 1,2,3, 4. The value range of the delay sample points Dj of the 4 reflected sounds relative to the direct sound is 0.01 fs-0.02 fs (namely 10 ms-20 ms), and the values of the 4 Dj are not equal, wherein fs is the sampling frequency of the system. For example, Dj ═ 810,699,591,953]Fs is 48000 Hz. The binaural sound signal is calculated by equation (7).

e′_Lj(n)＝h_Lj(n)*β*e₀(n-D_j)

e′_Rj(n)＝h_Rj(n)*β*e₀(n-D_j) (7)

Wherein e'_Lj(n)、e’_Rj(n) each isLeft and right ear sound signals representing the jth reflected sound. Beta represents the intensity coefficient of the reflected sound, and is obtained by adjusting on the basis of the calculated conference average sound absorption coefficient alpha in the step (4). In order to ensure that the binaural signal has better acoustic image distance feeling, the value of beta is 0.5-0.7, and is obtained by (8) mapping:

from the above calculation results, the total binaural sound signal for each auditorium can be calculated:

wherein e_{All_Li}(n)、e_{All_Ri}And (n) respectively represents the total left and right ear sound signals of the direct sound and the reflected sound of each auditorium after being added.

For theta less than or equal to-90 degrees_i<0 deg. theta for each auditorium_iThe adjustment is carried out according to the following method: to theta_iTaking the absolute value, and dividing | theta_iI and theta_QBy contrast, will theta_iAdjusted to the closest theta_QValue is recorded as θ'_i. Single-channel time-domain discrete signal e for signal of set speaker₀(n) represents, h_Li(n)、h_Ri(n) is θ'_iA corresponding pair of time domain discrete HRIRs. Step S3 is executed.

Then, the total binaural sound signal for each auditorium is calculated:

And S5, after the relative angle of each auditorium relative to the speaking seat is updated according to the horizontal angle change value of each auditorium earphone, selecting corresponding direct sound HRIR data and reflected sound HRIR data, and calculating the binaural sound signal under the dynamic condition by combining the average sound absorption coefficient of the target area.

In this embodiment, the updating the relative angle of each auditorium with respect to the floor according to the horizontal angle variation value of each auditorium earphone specifically includes: s51, acquiring the initial value of the horizontal angle of each auditorium earphone; s52, acquiring real-time horizontal angle values of the earphones of each auditorium in real time; and S53, calculating and updating the relative angle of each auditorium relative to the speech seat according to the initial value of the horizontal angle and the real-time value of the horizontal angle.

In this embodiment, the horizontal angle variation value of each auditorium earphone is obtained by installing a gyroscope sensor on an ear cup or a head beam of the earphone, so as to monitor the horizontal angle variation of the earphone; the gyroscope sensor cannot change relative to the earphone along with the rotation of the head; the horizontal angle range output by the gyroscope sensor is-180 degrees to 180 degrees; when the horizontal angle is 0 °, it represents a straight ahead; when the horizontal angle is 90 degrees, the right direction is indicated; when the horizontal angle is-90 deg., it indicates the positive left direction.

Specifically, a gyro sensor is mounted on the ear cup or on the head beam of the headphone used by the listener for monitoring the horizontal angle change of the headphone. The gyroscope sensor is required to be fixed well, and cannot change relative position with the earphone along with the rotation of the head. If the earmuffs are arranged on the earmuffs, the earmuffs can be left earmuffs or right earmuffs. The horizontal angle range of the gyroscope output is defined as-180 ° ≦ θ <180 °, θ ≦ 0 °, 90 °, and-90 ° indicating right front, right side, and right left side, respectively.

First, the initial value θ of the horizontal angle of the listener's head set is measured_{ref_i}. Before the conference, every audience is connectedThe earphone of the mat is placed on the table top and faces to the right front. Record the initial value theta of the horizontal angle of each auditorium earphone_{ref_i}。

Then, the horizontal angle real-time value θ of the listener's seat earphone is measured_{ref_i}. After the conference is started, the audience wears the earphones to listen to the speaking content. Recording real-time horizontal angle value theta of each auditorium earphone in real time_{RT_i}. The data refresh rate is required to be 20Hz or more.

Updating the calculated angle theta of the audience space relative to the floor_i：

θ_i＝θ_i–(θ_{RT_i}-θ_{ref_i}) (10)

The binaural sound signal is refreshed. For theta of 0 DEG or less_iIn the case of ≦ 90, step S4 is performed for θ ≦ 90 ≦ for-90 ≦ θ_i<In the case of 0 °, after calculating the absolute value of the relative angle, step S4 is executed.

The technical scheme has the advantages that:

1. each auditorium replays the speaker sound signals (or the translated sound signals) through the earphones, and the sound direction perceived by the auditorium is basically consistent with the actual sound source. After the listener's head is turned, the sound orientation perceived by the listener is still substantially identical to the actual sound source.

2. The method has the advantages of small data quantity, small operand and easy realization. Only HRTF (HRIR) data of 13 directions (9 direct sounds and 4 reflected sounds) need to be stored, only HRIR data of 4 fixed directions need to simulate reflected sounds, and the sound image distance sense is close to the sound image distance sense obtained by calculating the reflected sounds to the second order (36 directions in total) through a virtual source method.

3. Compared with the method that the earphone reproduces a single-channel speech signal, the sound definition is improved, the time interval between the reflected sound and the direct sound is controlled within 20ms, and the sound image carries direction information.

The above-mentioned embodiments are provided to further explain the objects, technical solutions and advantages of the present invention in detail, and it should be understood that the above-mentioned embodiments are only examples of the present invention and are not intended to limit the scope of the present invention. It should be understood that any modifications, equivalents, improvements and the like, which come within the spirit and principle of the invention, may occur to those skilled in the art and are intended to be included within the scope of the invention.

Claims

1. A method of headset signal feeding for a conference, comprising:

acquiring spatial parameters of a target area, uniformly taking a plurality of positions on an auditorium to measure reverberation time, and calculating average reverberation time;

calculating according to the average reverberation time and the space parameters to obtain an average sound absorption coefficient of a target area;

selecting corresponding direct sound HRIR data and reflected sound HRIR data according to the relative angle of each auditorium relative to the speech seat, and calculating binaural sound signals under the static condition by combining the average sound absorption coefficient of the target area; it includes:

selecting the corresponding direct sound HRIR data according to the relative angle to calculate and obtain a left ear direct sound signal and a right ear direct sound signal, specifically,

convolving left ear data and right ear data of the time domain discrete HRIR data with single-channel time domain discrete data of a speech floor signal respectively, and calculating to obtain a left ear direct sound signal and a right ear direct sound signal;

2. The method of headset signal feed for conferencing of claim 1, wherein the obtaining of direct sound HRIR data for the floor direction level, comprises:

within the angle range of 0-90 degrees of the relative angle, selecting HRIR data of nine different relative angles as candidate direct sound HRIR data in a distributed manner.

3. The method of headset signal feed for conferencing of claim 2, wherein the acquiring reflected acoustic HRIR data for a horizontal plane specifically comprises:

within the angle ranges of-180 DEG to-90 DEG and-90 DEG to 0 DEG of the relative angle, two angle directions are respectively selected, and HRIR data in four fixed angle directions are used as reflected sound HRIR data.

4. A method of headphone signal feed for conferences as claimed in claim 3 wherein the selection of corresponding direct sound HRIR data and reflected sound HRIR data as a function of the relative angle of each auditorium to the auditorium and the calculation of binaural sound signals in a static situation in combination with the average sound absorption coefficient for the target area comprises:

5. The method of claim 3, wherein updating the relative angle of each auditorium with respect to the floor based on the change in the horizontal angle of each auditorium's headphones comprises:

acquiring a horizontal angle initial value of each auditorium earphone;

6. The method of earphone signal feed for conferences of claim 4, wherein the calculating left and right ear reflected sound signals from the average sound absorption coefficient and from the relative angle selecting their corresponding reflected sound HRIR data, comprises:

calculating to obtain a left ear reflected sound signal and a right ear reflected sound signal according to a pair of time domain discrete HRIR data corresponding to each reflected sound, single-path time domain discrete data of the speech floor signal, a delay sample point value and an intensity coefficient of the reflected sound; and the numerical value of the delay sample point ranges from 10ms to 20 ms.

7. The method of headphone signal feed for a conference as described in claim 6, wherein in comparing the relative angle to nine distributively selected relative angles, further comprising:

when the relative angle is within an angle range of 0-90 degrees, directly comparing the relative angle with nine opposite angles selected in a distributed mode;

when the relative angle is in an angle range of-90 degrees to 0 degrees, calculating an absolute value of the relative angle, and comparing the absolute value with nine opposite angles selected in a distributed mode;

and when the relative angle is in the angle range of-90-0 degrees, interchanging the total left and right ear sound signals obtained by calculation.

8. The method of claim 1, wherein the value of the change in horizontal angle of each auditorium headset is obtained by mounting a gyro sensor on the ear cup or on the head rail of the headset for monitoring the change in horizontal angle of the headset; the gyroscope sensor cannot change relative to the earphone along with the rotation of the head;