CN112954579A - Method and device for reproducing on-site listening effect - Google Patents


Info

Publication number
CN112954579A
Authority
CN
China
Prior art keywords
sound signal
ear
signal received
left ear
right ear
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110104317.XA
Other languages
Chinese (zh)
Other versions
CN112954579B (en)
Inventor
闫震海
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Music Entertainment Technology Shenzhen Co Ltd
Original Assignee
Tencent Music Entertainment Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Music Entertainment Technology Shenzhen Co Ltd
Priority to CN202110104317.XA
Publication of CN112954579A
Application granted
Publication of CN112954579B
Legal status: Active

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S 1/00 Two-channel systems
    • H04S 1/007 Two-channel systems in which the audio signals are in digital form
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B 20/00 Signal processing not specific to the method of recording or reproducing; Circuits therefor
    • G11B 20/00992 Circuits for stereophonic or quadraphonic recording or reproducing
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B 20/00 Signal processing not specific to the method of recording or reproducing; Circuits therefor
    • G11B 20/10 Digital recording or reproducing
    • G11B 20/10009 Improvement or modification of read or write signals
    • G11B 20/10481 Improvement or modification of read or write signals optimisation methods
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B 20/00 Signal processing not specific to the method of recording or reproducing; Circuits therefor
    • G11B 20/10 Digital recording or reproducing
    • G11B 20/10527 Audio or video recording; Data buffering arrangements
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S 7/00 Indicating arrangements; Control arrangements, e.g. balance control
    • H04S 7/30 Control circuits for electronic adaptation of the sound field
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B 20/00 Signal processing not specific to the method of recording or reproducing; Circuits therefor
    • G11B 20/10 Digital recording or reproducing
    • G11B 20/10527 Audio or video recording; Data buffering arrangements
    • G11B 2020/10537 Audio or video recording
    • G11B 2020/10592 Audio or video recording specifically adapted for recording or reproducing multichannel signals
    • G11B 2020/10601 Audio or video recording specifically adapted for recording or reproducing multichannel signals surround sound signal

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Stereophonic System (AREA)

Abstract

A method and a device for reproducing a live listening effect. The sharing end device corrects the sound signal received by the left ear according to a first delay time of the played sound signal relative to the sound signal received by the left ear, and corrects the sound signal received by the right ear according to a second delay time of the played sound signal relative to the sound signal received by the right ear; it obtains a left ear/right ear frequency domain amplitude response from the played sound signal and the corrected sound signal received by the left/right ear, obtains a left ear/right ear time domain pulse vector from the left ear/right ear frequency domain amplitude response, and sends the left ear/right ear time domain pulse vectors to the shared end device. The shared end device outputs left/right channel signals according to the time domain pulse vectors sent by the sharing end device, reliably reproducing the live listening effect.

Description

Method and device for reproducing on-site listening effect
Technical Field
The present disclosure relates to the field of multimedia processing technologies, and in particular, to a method and an apparatus for reproducing a live listening effect.
Background
Sharing the live listening effect to others (i.e., reproduction of the live listening effect) is a common user requirement. The reproduction of the live listening effect refers to the reproduction of the overall listening effect (including factors such as environmental and physical characteristics) rather than just the song itself.
If a user wants to share the effect of sound played over their own loudspeakers with people who are not present, the most direct prior-art approach is to record the heard sound with a mobile phone microphone. Alternatively, the distance and angle of the current loudspeaker can be recorded and a suitable Head Related Transfer Function (HRTF) selected in another space to synthesize a binaural stereo signal. Both approaches have significant drawbacks:
1) The recording approach captures irrelevant environmental noise along with the intended sound effect. If a single microphone is used, the stereo effect cannot be reproduced; if two microphones are used, the surround effect actually heard at that moment still cannot be accurately reproduced, because the sharer's individual physical characteristics are not taken into account.
2) The HRTF used by the second approach is only common-mode data. It cannot be personalized to the current scene or to the sharer's physical characteristics, so it can only roughly present the direction of the music and fails to achieve the actual listening effect of the current room.
Disclosure of Invention
The present disclosure provides a method and apparatus for reproducing a live listening effect.
In a first aspect, a method for reproducing a live listening effect is provided, which is applied to a sharing-end device, and includes:
acquiring sound signals respectively received by a left ear and a right ear, and acquiring a first delay time of a played sound signal relative to the sound signal received by the left ear and a second delay time of the played sound signal relative to the sound signal received by the right ear; wherein the sound signal received by the left ear and the sound signal received by the right ear both originate from the played sound signal;
correcting the sound signal received by the left ear according to the first delay time, and correcting the sound signal received by the right ear according to the second delay time;
obtaining a left ear frequency domain amplitude response according to the played sound signal and the sound signal received by the modified left ear, and obtaining a right ear frequency domain amplitude response according to the played sound signal and the sound signal received by the modified right ear;
obtaining a left ear time domain pulse vector according to the left ear frequency domain amplitude response, and obtaining a right ear time domain pulse vector according to the right ear frequency domain amplitude response;
and sending the left ear time domain pulse vector and the right ear time domain pulse vector to a shared end device.
In one possible implementation, the obtaining a first delay time of the played sound signal relative to the sound signal received by the left ear and a second delay time of the played sound signal relative to the sound signal received by the right ear includes:
performing a cross correlation operation on the sound signal received by the left ear and the played sound signal to obtain a peak position of the sound signal received by the left ear, and performing a cross correlation operation on the sound signal received by the right ear and the played sound signal to obtain a peak position of the sound signal received by the right ear;
determining a first delay time of the sound signal received by the left ear according to the peak position of the sound signal received by the left ear and a first preset delay, and determining a second delay time of the sound signal received by the right ear according to the peak position of the sound signal received by the right ear and the first preset delay.
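The cross-correlation step above can be sketched as follows. This is an illustrative reading, not the patent's implementation: the function name and toy signals are invented, and `preset_delay` stands in for the "first preset delay" of the claim.

```python
import numpy as np

def estimate_delay(played, received, preset_delay=0):
    # Full cross-correlation; index len(played) - 1 corresponds to lag 0.
    corr = np.correlate(received, played, mode="full")
    peak = int(np.argmax(np.abs(corr)))   # peak position of the received signal
    lag = peak - (len(played) - 1)        # lag (in samples) at the peak
    return lag - preset_delay             # subtract the known fixed latency

# Toy check: white noise delayed by 5 samples is detected as a 5-sample lag.
rng = np.random.default_rng(0)
played = rng.standard_normal(256)
received = np.concatenate([np.zeros(5), played])[:256]
delay = estimate_delay(played, received)
```

White noise has a sharp autocorrelation peak, which is why the embodiment's test signal makes the peak position easy to locate.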
In yet another possible implementation, the modifying the sound signal received by the left ear according to the first delay time and modifying the sound signal received by the right ear according to the second delay time includes:
translating the sound signal received by the left ear according to the first delay time, so that the sound signal received by the left ear is aligned with the played sound signal;
and translating the sound signal received by the right ear according to the second delay time, so that the sound signal received by the right ear is aligned with the played sound signal.
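A minimal sketch of the translation (alignment) step, assuming the delay is a non-negative whole number of samples; the function name is illustrative.

```python
import numpy as np

def align(received, delay):
    # Shift the received signal `delay` samples earlier so it lines up
    # with the played signal; zero-pad the tail to keep the length.
    out = np.zeros_like(received)
    out[:len(received) - delay] = received[delay:]
    return out

# A copy delayed by 3 samples, shifted back, matches the original
# on the overlapping region.
x = np.arange(8, dtype=float)
delayed = np.concatenate([np.zeros(3), x])[:8]
aligned = align(delayed, 3)
```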
In yet another possible implementation, the method further comprises:
and according to the frequency domain amplitude response of the sharing end equipment, respectively correcting the left ear frequency domain amplitude response and the right ear frequency domain amplitude response to obtain a corrected left ear frequency domain amplitude response and a corrected right ear frequency domain amplitude response.
In another possible implementation, the obtaining a left-ear time-domain pulse vector according to the left-ear frequency-domain amplitude response and obtaining a right-ear time-domain pulse vector according to the right-ear frequency-domain amplitude response includes:
transforming the modified left ear frequency domain amplitude response to a time domain to obtain a cepstrum vector of the left ear frequency domain amplitude response, and transforming the modified right ear frequency domain amplitude response to the time domain to obtain a cepstrum vector of the right ear frequency domain amplitude response;
and weighting the cepstrum vector of the left ear frequency domain amplitude response to obtain the left ear time domain pulse vector, and weighting the cepstrum vector of the right ear frequency domain amplitude response to obtain the right ear time domain pulse vector.
In a second aspect, a method for reproducing a live listening effect is provided, which is applied to a shared end device, and includes:
receiving a left ear time domain pulse vector and a right ear time domain pulse vector from a sharing end device;
associating the left ear time domain pulse vector with a left channel of the shared end device to obtain a left channel signal of the shared end device;
associating the right ear time domain pulse vector with a right channel of the shared end device to obtain a right channel signal of the shared end device;
outputting the left channel signal and the right channel signal.
In one possible implementation, the association is a convolution operation.
In a third aspect, there is provided an apparatus for reproducing live listening effects, the apparatus comprising:
a first obtaining unit, configured to obtain sound signals received by a left ear and a right ear, respectively, and obtain a first delay time of a played sound signal relative to the sound signal received by the left ear and a second delay time of the played sound signal relative to the sound signal received by the right ear; wherein the sound signal received by the left ear and the sound signal received by the right ear are the played sound signals;
a first correction unit, configured to correct the sound signal received by the left ear according to the first delay time, and correct the sound signal received by the right ear according to the second delay time;
a second obtaining unit, configured to obtain a left ear frequency domain amplitude response according to the played sound signal and the modified sound signal received by the left ear, and obtain a right ear frequency domain amplitude response according to the played sound signal and the modified sound signal received by the right ear;
a third obtaining unit, configured to obtain a left-ear time-domain pulse vector according to the left-ear frequency-domain amplitude response, and obtain a right-ear time-domain pulse vector according to the right-ear frequency-domain amplitude response;
and the sending unit is used for sending the left ear time domain pulse vector and the right ear time domain pulse vector to shared end equipment.
In one possible implementation, the first obtaining unit includes:
a cross-correlation operation unit, configured to perform a cross-correlation operation on the sound signal received by the left ear and the played sound signal to obtain a peak position of the sound signal received by the left ear, and perform a cross-correlation operation on the sound signal received by the right ear and the played sound signal to obtain a peak position of the sound signal received by the right ear;
the determining unit is used for determining a first delay time of the sound signal received by the left ear according to the peak position of the sound signal received by the left ear and a first preset delay, and determining a second delay time of the sound signal received by the right ear according to the peak position of the sound signal received by the right ear and the first preset delay.
In yet another possible implementation, the first modifying unit is configured to translate the sound signal received by the left ear according to the first delay time, so that the sound signal received by the left ear is aligned with the played sound signal;
the first correction unit is further configured to translate the sound signal received by the right ear according to the second delay time, so that the sound signal received by the right ear is aligned with the played sound signal.
In yet another possible implementation, the apparatus further includes:
and the second correction unit is used for correcting the left ear frequency domain amplitude response and the right ear frequency domain amplitude response respectively according to the frequency domain amplitude response of the sharing end equipment, so as to obtain the corrected left ear frequency domain amplitude response and the corrected right ear frequency domain amplitude response.
In yet another possible implementation, the third obtaining unit includes:
the time domain transformation unit is used for transforming the modified left ear frequency domain amplitude response to the time domain to obtain a cepstrum vector of the left ear frequency domain amplitude response, and transforming the modified right ear frequency domain amplitude response to the time domain to obtain a cepstrum vector of the right ear frequency domain amplitude response;
and the weighting unit is used for weighting the cepstrum vector of the left ear frequency domain amplitude response to obtain the left ear time domain pulse vector and weighting the cepstrum vector of the right ear frequency domain amplitude response to obtain the right ear time domain pulse vector.
In a fourth aspect, there is provided an apparatus for reproducing a live listening effect, the apparatus comprising:
the receiving unit is used for receiving the left ear time domain pulse vector and the right ear time domain pulse vector from the sharing end equipment;
the correlation unit is used for correlating the left ear time domain pulse vector with a left channel of the shared end equipment to obtain a left channel signal of the shared end equipment;
the correlation unit is further configured to correlate the right ear time domain pulse vector with a right channel of the shared end device, so as to obtain a right channel signal of the shared end device;
an output unit for outputting the left channel signal and the right channel signal.
In one possible implementation, the association is a convolution operation.
In a fifth aspect, there is provided a device for reproducing a live listening effect, comprising: a processor and a memory, the memory having stored therein program instructions, the processor executing the program instructions to implement the method as described in the first aspect or any possible implementation of the first aspect.
In a sixth aspect, there is provided an apparatus for reproducing a live listening effect, comprising: a processor and a memory, the memory having stored therein program instructions, the processor executing the program instructions to implement the method as described in the second aspect or any possible implementation of the second aspect.
In a seventh aspect, there is provided a computer storage medium having stored thereon one or more instructions adapted to be loaded by a processor and to execute the method of any one of the above aspects or any one of the possible implementations of any one of the above aspects.
The method and the device for reproducing the on-site listening effect have the following beneficial effects:
the sharing end equipment corrects the sound signal received by the left ear according to the first delay time of the played sound signal relative to the sound signal received by the left ear, and corrects the sound signal received by the right ear according to the second delay time of the played sound signal relative to the sound signal received by the right ear; obtaining a left ear/right ear frequency domain amplitude response according to the played sound signal and the modified sound signal received by the left ear/right ear; obtaining a left ear/right ear time domain pulse vector according to the left ear/right ear frequency domain amplitude response; and sending the left ear/right ear time domain pulse vector to the shared end equipment. The shared end equipment outputs the left/right sound channel signals according to the left ear/right ear time domain pulse vectors sent by the shared end equipment, and the reproduction of the on-site listening effect is reliably realized.
Drawings
In order to more clearly illustrate the embodiments of the present disclosure or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It is obvious that the drawings in the following description show only some embodiments of the present disclosure, and other drawings can be obtained from them by those skilled in the art without creative effort.
Fig. 1 is a schematic flow chart of a method for reproducing a live listening effect according to an embodiment of the present disclosure;
fig. 2 is a schematic flow chart of a method for reproducing a live listening effect according to an embodiment of the present disclosure;
Fig. 3 is a schematic diagram of the delay effect of a right ear time domain pulse;
fig. 4 is a schematic structural diagram of an apparatus for reproducing a live listening effect according to an embodiment of the present disclosure;
fig. 5 is a schematic structural diagram of a reproduction apparatus for a live listening effect provided by an embodiment of the present disclosure;
fig. 6 is a schematic structural diagram of a reproduction apparatus for a live listening effect according to an embodiment of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are only a part of the embodiments of the present disclosure, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments disclosed herein without making any creative effort, shall fall within the protection scope of the present disclosure.
The present disclosure provides a scheme for reproducing a live listening effect, in which the sharing end device corrects the sound signal received by the left ear according to a first delay time of the played sound signal relative to the sound signal received by the left ear, and corrects the sound signal received by the right ear according to a second delay time of the played sound signal relative to the sound signal received by the right ear; it obtains a left ear/right ear frequency domain amplitude response from the played sound signal and the corrected sound signal received by the left/right ear, obtains a left ear/right ear time domain pulse vector from that response, and sends the vectors to the shared end device. The shared end device outputs left/right channel signals according to the time domain pulse vectors sent by the sharing end device, reliably reproducing the live listening effect.
The sharing end device and the shared end device can be any electronic device with audio receiving and processing functions, such as a mobile phone, a tablet and the like. The sharing end device and the shared end device can be connected in a wired or wireless mode.
The above reproduction scheme of the live listening effect is described in detail by the specific embodiment below.
As shown in fig. 1, a flow chart of a method for reproducing a live listening effect according to an embodiment of the present disclosure may include the following steps:
s101, the sharing end equipment acquires the sound signals respectively received by the left ear and the right ear, and acquires first delay time of the played sound signals relative to the sound signals received by the left ear and second delay time of the played sound signals relative to the sound signals received by the right ear.
The sharing end device receives the sound signal that is played by the loudspeaker and reaches the vicinity of the left ear after being reflected by the room and the human body, obtaining the sound signal received by the left ear; it likewise receives the sound signal that reaches the vicinity of the right ear, obtaining the sound signal received by the right ear. Both received sound signals originate from the played sound signal.
The speaker plays and the sharing end device receives the sound signal simultaneously, so in the ideal, delay-free case the input signal (i.e., the played sound signal) and the output signal (i.e., the sound signals received at the two ears) correspond one to one. In practice, however, because of Bluetooth transmission latency and acoustic propagation through the air, the sharing end device receives the sound signal only some time after it is played. Therefore, a first delay time of the played sound signal relative to the sound signal received by the left ear and a second delay time relative to the sound signal received by the right ear are obtained.
S102, the sharing end device corrects the sound signal received by the left ear according to the first delay time, and corrects the sound signal received by the right ear according to the second delay time.
Since the sound signals actually received at the two ears deviate from those that would be received in the ideal, delay-free case, the sharing end device corrects the sound signal received by the left ear according to the first delay time and corrects the sound signal received by the right ear according to the second delay time.
S103, the sharing end equipment obtains a left ear frequency domain amplitude response according to the played sound signal and the modified sound signal received by the left ear, and obtains a right ear frequency domain amplitude response according to the played sound signal and the modified sound signal received by the right ear.
Power spectrum estimation (e.g., Welch spectrum estimation) is performed on the played sound signal, the corrected sound signal received by the left ear, and the corrected sound signal received by the right ear to obtain the corresponding spectral energies: the left ear frequency domain amplitude response and the right ear frequency domain amplitude response.
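Step S103 can be sketched with SciPy's Welch estimator, assuming that is the intended "Welch spectrum"; the amplitude response is taken as the square root of the output/input power-spectrum ratio, and the one-tap "room" filter is purely illustrative.

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(1)
fs = 8000
played = rng.standard_normal(fs * 2)          # white-noise test signal
ear = signal.lfilter([0.5], [1.0], played)    # toy "room": a flat 0.5 gain

# Welch power spectra of the input and the (corrected) ear signal.
f, p_played = signal.welch(played, fs=fs, nperseg=1024)
f, p_ear = signal.welch(ear, fs=fs, nperseg=1024)

# Frequency-domain amplitude response: sqrt of the power ratio.
amp_response = np.sqrt(p_ear / p_played)      # flat 0.5 for this toy filter
```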
And S104, the sharing end equipment obtains a left ear time domain pulse vector according to the left ear frequency domain amplitude response and obtains a right ear time domain pulse vector according to the right ear frequency domain amplitude response.
And transforming the obtained left ear frequency domain amplitude response to the time domain to obtain a left ear time domain pulse vector, and transforming the obtained right ear frequency domain amplitude response to the time domain to obtain a right ear time domain pulse vector.
And S105, the sharing end equipment sends the left ear time domain pulse vector and the right ear time domain pulse vector to the shared end equipment.
Accordingly, the shared device receives the left ear time domain pulse vector and the right ear time domain pulse vector.
The above describes how to obtain the left ear and right ear time domain pulse vectors for any one speaker (the speaker on the sharer's left, abbreviated "left speaker", or the speaker on the sharer's right, abbreviated "right speaker"). Taking a stereo speaker placement as an example, the time domain pulse vectors from the left and right speakers to the sharer's left and right ears can be computed in turn by following the steps above; they are denoted IRLL, IRLR, IRRR, and IRRL, where IRLL is the pulse vector from the left speaker to the left ear, IRLR from the left speaker to the right ear, IRRR from the right speaker to the right ear, and IRRL from the right speaker to the left ear. When the user wants to share the current live listening effect, the sharer can share IRLL and IRLR, or share IRRR and IRRL. Optionally, the sharer may share all four pulse vectors together with information about the song being listened to, so that the sharee can experience the sharer's current, unique listening effect.
The sharing end device can send the left ear time domain pulse vector and the right ear time domain pulse vector to the shared end device through wired or wireless connection.
S106, the shared end device associates the left ear time domain pulse vector with a left channel of the shared end device to obtain a left channel signal of the shared end device, and associates the right ear time domain pulse vector with a right channel of the shared end device to obtain a right channel signal of the shared end device.
Specifically, if the shared device receives the IRLL and the IRLR, the left ear time domain pulse vector of the left speaker and the left channel of the shared device are convolved to obtain a left channel signal of the shared device, and the right ear time domain pulse vector of the left speaker and the right channel of the shared device are convolved to obtain a right channel signal of the shared device.
And if the shared device receives the IRRL and the IRRR, performing convolution operation on the left ear time domain pulse vector of the right loudspeaker and the left channel of the shared end device to obtain a left channel signal of the shared end device, and performing convolution operation on the right ear time domain pulse vector of the right loudspeaker and the right channel of the shared end device to obtain a right channel signal of the shared end device.
If the shared end device receives all four pulse vectors IRLL, IRLR, IRRR, and IRRL, it associates the left ear time domain pulse vector of the left speaker (IRLL) with its left channel and the left ear time domain pulse vector of the right speaker (IRRL) with its right channel, combining the results to obtain the left channel signal of the shared end device; it likewise associates the right ear time domain pulse vector of the right speaker (IRRR) with its right channel and the right ear time domain pulse vector of the left speaker (IRLR) with its left channel, combining the results to obtain the right channel signal of the shared end device.
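Taking the association as convolution, as the embodiment states, the four-pulse-vector case can be sketched as follows; the function name is illustrative, and summing the two convolutions per ear is an assumption about how the contributions are combined.

```python
import numpy as np

def render_binaural(left_ch, right_ch, irll, irlr, irrr, irrl):
    # Left output: left channel through IRLL plus right channel through IRRL.
    out_l = np.convolve(left_ch, irll) + np.convolve(right_ch, irrl)
    # Right output: right channel through IRRR plus left channel through IRLR.
    out_r = np.convolve(right_ch, irrr) + np.convolve(left_ch, irlr)
    return out_l, out_r

# With unit impulses for all four vectors, each output is just the
# sum of the two channels.
delta = np.array([1.0])
l, r = render_binaural(np.array([1.0, 2.0]), np.array([3.0, 4.0]),
                       delta, delta, delta, delta)
```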
In particular, the association may be a convolution operation.
And S107, the shared device outputs a left channel signal and a right channel signal.
In another space, the shared end user listens to the left and right channel signals through earphones on the shared end device, reproducing the sharer's live listening effect to the greatest possible extent.
According to the method for reproducing the live listening effect provided by the disclosure, the sharing end device corrects the sound signal received by the left ear according to the first delay time of the played sound signal relative to the sound signal received by the left ear, and corrects the sound signal received by the right ear according to the second delay time of the played sound signal relative to the sound signal received by the right ear; obtaining a left ear/right ear frequency domain amplitude response according to the played sound signal and the modified sound signal received by the left ear/right ear; obtaining a left ear/right ear time domain pulse vector according to the left ear/right ear frequency domain amplitude response; and sending the left ear/right ear time domain pulse vector to the shared end equipment. The shared end equipment outputs left/right sound channel signals according to the left ear/right ear time domain pulse vectors sent by the sharing end equipment, and the reproduction of the on-site listening effect is reliably realized;
the method of the embodiment does not simply record the sound heard by the user by using the mobile phone microphone, because irrelevant environmental noise signals are recorded while the sound effect is recorded. The method comprises the steps that delay time of a sound signal reflected back to a left ear and a right ear under the environment where the sound signal is located is obtained through sharing end equipment, signals received by the left ear and the right ear are corrected and processed, and finally a left ear time domain pulse vector and a right ear time domain pulse vector in the actual environment are obtained, so that the influence of an environmental noise signal is overcome;
the method of the embodiment uses a cross-correlation method to calculate the difference between the times at which the loudspeaker's sound arrives at the two ears, breaking through the physical limitations of measuring equipment: the measurement can be completed with only a single microphone. This overcomes the problems of the prior art, where single-microphone recording cannot reproduce a stereo effect, and double-microphone recording, without combining the sharer's individual characteristic parameters, cannot accurately reproduce the surround effect actually heard at that moment;
the method of the embodiment measures the delay times of the sound signal reflected back to the left and right ears in the sharer's environment, and corrects and processes the signals received by the two ears without resorting to a generic HRTF function, so the result is matched personally to the physical parameters of the current scene and of the sharer, and the real listening effect of the current environment can be reproduced. Moreover, no transfer function needs to be measured: the method does not depend on professional equipment or professionals, and an ordinary user can easily reproduce the live listening effect.
As shown in fig. 2, a flow chart of another method for reproducing a live listening effect provided by the embodiment of the present disclosure may include the following steps:
S201, the sharing end equipment acquires sound signals respectively received by a left ear and a right ear.
For example, the present embodiment uses a white noise signal (denoted whiteNoise), whose spectral components are relatively uniformly distributed, as the test signal. The method can be applied to any terminal having audio receiving and processing functions. Taking a mobile phone as the sharing-end device, the phone connects to the player over Bluetooth and controls the player's speaker to play the content on site. Meanwhile, the phone's call microphone serves as the receiving device for the test signal. To capture the sound signal received by the left ear, the phone's microphone is held tightly against the left ear with the sound inlet facing straight ahead, simulating the left ear's reception of the sound signal. The phone directs the player to play the white noise signal while the microphone at the left ear records the test signal as it arrives near the left ear after reflection by the room and the human body; this recording is denoted xLeft, i.e., xLeft is the sound signal received by the left ear. To capture the sound signal received by the right ear, the same operation is repeated on the right side, and the recording is denoted xRight, i.e., xRight is the sound signal received by the right ear. Both the sound signal received by the left ear and the sound signal received by the right ear are recordings of the played sound signal.
S202, the sharing device performs a cross correlation operation on the sound signal received by the left ear and the played sound signal to obtain a peak position of the sound signal received by the left ear, and performs a cross correlation operation on the sound signal received by the right ear and the played sound signal to obtain a peak position of the sound signal received by the right ear.
It is assumed that the length of the played sound signal (the input signal) and of the recorded sound signal (the sound signal received at an ear, the output signal) is 6 s, and the sampling rate is 44100 Hz. Because the phone plays and records simultaneously, in the ideal delay-free case the played and recorded sound signals correspond one-to-one. If the played sound signal is cross-correlated with the recorded sound signal, the peak should then lie at 44100 × 6. This position serves as the reference point for estimating the exact value of the true delay.
However, in real situations, due to the actions of bluetooth and air transmission, the played sound signal is delayed for a certain time before being recorded by the microphone. Performing cross correlation on the sound signal received by the left ear and the sound signal played according to formula 1 to obtain a peak position IndexMax1 of the sound signal received by the left ear, and performing cross correlation on the sound signal received by the right ear and the sound signal played according to formula 2 to obtain a peak position IndexMax2 of the sound signal received by the right ear:
[ConvYXMax1, IndexMax1] = max(conv(xLeft, whiteNoise)) …… Equation 1
[ConvYXMax2, IndexMax2] = max(conv(xRight, whiteNoise)) …… Equation 2
Wherein the function conv() here denotes the cross-correlation operation, and the function max() returns the maximum value of a vector together with the index of that maximum; ConvYXMax is the returned maximum value and IndexMax is the index corresponding to it.
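As a rough illustration (not the patent's own code), the peak search of Equations 1-2 can be sketched in Python with NumPy; `corr_peak` is a hypothetical helper, and `np.correlate` in full mode stands in for the patent's cross-correlation `conv()`:

```python
import numpy as np

def corr_peak(received, played):
    """Full cross-correlation peak: value and 1-based index (cf. Equations 1-2)."""
    c = np.correlate(received, played, mode="full")
    idx = int(np.argmax(c))
    return c[idx], idx + 1  # 1-based index, matching the MATLAB-style max()

rng = np.random.default_rng(0)
white_noise = rng.standard_normal(1000)               # played test signal
x_left = np.concatenate([np.zeros(37), white_noise])  # recording, 37-sample delay

conv_yx_max1, index_max1 = corr_peak(x_left, white_noise)
# With zero delay the peak sits at len(white_noise) (the reference point);
# the offset beyond that point is the delay in samples.
delay_samples = index_max1 - len(white_noise)
```

Dividing `delay_samples` by the sampling rate gives the delay in seconds, which is exactly the role of Equation 3 below.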
S203, the sharing device determines a first delay time for the left ear to receive the sound signal according to the peak position of the sound signal received by the left ear and the first preset delay, and determines a second delay time for the right ear to receive the sound signal according to the peak position of the sound signal received by the right ear and the first preset delay.
Since there is a certain deviation between the peak position IndexMax1 of the sound signal received by the left ear obtained as described above and the peak position of the sound signal received by the left ear in the ideal case without delay (the position of the first preset delay, for example, 44100 × 6), the total delay time for the speaker to reach the left ear is obtained as shown in the following equation 3:
DelayLeft = (IndexMax1 − 44100 × 6) / 44100 …… Equation 3
Similarly, the overall delay time DelayRight for the speaker to reach the right ear can also be found.
According to the definition of cross-correlation, in order to increase the calculation speed, the cross-correlation operation can also be converted into the operation of point-by-point multiplication in the frequency domain.
Taking the left ear as the reference, the time difference between the loudspeaker's sound reaching the two ears is DelayLeft − DelayRight. Because the Bluetooth and other system delays are common to both measurements, taking this difference effectively eliminates their interference.
And S204, the sharing end equipment translates the sound signal received by the left ear according to the delay time of the left ear to align the sound signal received by the left ear with the played sound signal, translates the sound signal received by the right ear according to the delay time of the right ear to align the sound signal received by the right ear with the played sound signal, and corrects the played sound signal.
Since there is a certain deviation between the peak position IndexMax1 of the received audio signal of the left ear obtained as described above and the peak position of the received audio signal of the left ear in the ideal case without delay (the first preset delay, for example, the position of 44100 × 6), the sharing-end device needs to correct the received audio signal of the left ear according to the delay time of the left ear and correct the received audio signal of the right ear according to the delay time of the right ear.
Specifically, after the delay times from the speaker to the two ears are obtained, the output signal is shifted according to the delay time so that it is strictly aligned with the input signal. For example, for the left ear, the first DelayLeft data points (the delay here expressed in sample points) are irrelevant and are discarded according to Equation 4 below:
xLeftRe = xLeft(DelayLeft + 1 : end) …… Equation 4
Where end denotes the index of the last element of the output signal vector.
Similarly, the sound signal received by the right ear is translated according to the delay time of the right ear, so that the sound signal received by the right ear is aligned with the input signal.
In addition, the played sound signal can be modified, that is, the played sound signal correspondingly discards the last DelayLeft data points according to equation 5 so as to keep the length equal to the output signal,
whiteNoiseRe = whiteNoise(1 : end − DelayLeft) …… Equation 5
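The trimming of Equations 4-5 amounts to simple array slicing. A minimal NumPy sketch (with `align` as a hypothetical helper name):

```python
import numpy as np

def align(played, received, delay_samples):
    """Drop the received signal's leading delayed points (Equation 4) and
    trim the played signal's tail so both vectors stay equal in length
    (Equation 5)."""
    received_re = received[delay_samples:]
    played_re = played[: len(played) - delay_samples]
    return played_re, received_re

played = np.arange(10.0)
received = np.concatenate([np.zeros(3), played])[:10]  # 3-sample delayed recording
played_re, received_re = align(played, received, 3)
```

After alignment the two vectors correspond sample-for-sample, which is what the power-spectrum ratio in the next step requires.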
S205, the sharing device obtains a left-ear frequency domain amplitude response according to the modified left-ear received signal and the modified played sound signal, and obtains a right-ear frequency domain amplitude response according to the modified right-ear received signal and the modified played sound signal.
The modified played sound signal, the modified left-ear received sound signal, and the modified right-ear received sound signal are subjected to power spectrum estimation (e.g., a Welch spectrum), and the corresponding spectral energies are denoted whiteNoiseReH, xLeftReH, and xRightReH, respectively. The frequency domain amplitude response from the speaker to the left ear can then be expressed according to Equation 6 as:
HLeft = sqrt(xLeftReH ./ whiteNoiseReH) …… Equation 6
Where HLeft represents the left-ear frequency domain amplitude response, the function sqrt() represents the square-root operation, and the symbol ./ represents point-by-point division between vectors. Similarly, the frequency domain amplitude response HRight from the speaker to the right ear can be calculated.
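A minimal sketch of Equation 6, using a plain periodogram in place of the patent's Welch estimate for brevity (the ratio-of-spectra idea is the same; `amplitude_response` is a hypothetical helper):

```python
import numpy as np

def amplitude_response(played_re, received_re):
    """|H| = sqrt(received power spectrum / played power spectrum), cf. Equation 6.
    A plain periodogram stands in here for the Welch estimate."""
    p_played = np.abs(np.fft.rfft(played_re)) ** 2
    p_received = np.abs(np.fft.rfft(received_re)) ** 2
    return np.sqrt(p_received / p_played)

rng = np.random.default_rng(1)
played = rng.standard_normal(4096)
received = 0.5 * played          # a flat -6 dB "room" makes the result checkable
h = amplitude_response(played, received)
```

For the flat attenuation above, every bin of `h` equals 0.5, as the ratio form predicts.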
S206, the sharing end device corrects the left ear frequency domain amplitude response and the right ear frequency domain amplitude response respectively according to the frequency domain amplitude response of the sharing end device, and the corrected left ear frequency domain amplitude response and the corrected right ear frequency domain amplitude response are obtained.
The handset microphone itself also has a certain frequency response that interferes with the measured amplitude responses from the speaker to the two ears. To eliminate the microphone's interference while preserving the difference characteristic between the two ears' amplitude responses, the amplitude responses require further processing: frequency-domain response amplitudes with excessive energy are attenuated, and those with too little energy are boosted. Specifically, the left-ear frequency domain amplitude response is modified according to Equation 7 below:
HLeftRe(k) = HLeft(k) / sqrt(pow(HLeft(k), 2) + pow(HRight(k), 2)) …… Equation 7
The right ear frequency domain amplitude response is modified according to the following equation 8:
HRightRe(k) = HRight(k) / sqrt(pow(HLeft(k), 2) + pow(HRight(k), 2)) …… Equation 8
Where k is the index into the frequency domain vector and the function pow(x, n) raises x to the power n.
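Equations 7-8 normalize each ear's response by the joint magnitude of both ears, so any factor common to both measurements (such as the microphone's own response) cancels. A small sketch (`normalize_pair` is a hypothetical helper):

```python
import numpy as np

def normalize_pair(h_left, h_right):
    """Equations 7-8: divide each ear's amplitude response by the joint
    magnitude, cancelling the microphone's own frequency response while
    preserving the left/right difference characteristic."""
    joint = np.sqrt(h_left ** 2 + h_right ** 2)
    return h_left / joint, h_right / joint

h_left = np.array([3.0, 1.0, 0.2])
h_right = np.array([4.0, 1.0, 0.1])
hl_re, hr_re = normalize_pair(h_left, h_right)
```

Note that after normalization the two responses satisfy HLeftRe(k)² + HRightRe(k)² = 1 at every bin, which is exactly the "overlarge reduced, undersized raised" behavior described above.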
And S207, the sharing end equipment transforms the modified left ear frequency domain amplitude response to the time domain to obtain a cepstrum vector of the left ear frequency domain amplitude response, and transforms the modified right ear frequency domain amplitude response to the time domain to obtain a cepstrum vector of the right ear frequency domain amplitude response.
After the frequency domain amplitude responses from the loudspeaker to the two ears are obtained, they are converted to the time domain according to the minimum-phase principle, since the phase information is lacking. The cepstrum vector of the left-ear frequency domain response can be obtained according to Equation 9:
HLeftReP = ifft(log(HLeftRe)) …… Equation 9
Where the function ifft() represents the inverse Fourier transform and the function log() represents the natural logarithm.
Similarly, the cepstrum vector of the right ear frequency domain response can be obtained by referring to equation 9.
S208, the sharing end device weights the cepstrum vector of the left ear frequency domain amplitude response to obtain a left ear time domain pulse vector, and weights the cepstrum vector of the right ear frequency domain amplitude response to obtain a right ear time domain pulse vector.
Specifically, the left ear cepstrum vector may be weighted according to formula 10 to obtain a left ear time domain pulse vector:
LeftIR = ifft(exp(fft(W .* HLeftReP))) …… Equation 10
Wherein the function exp() denotes the exponential operation with the natural constant e as base, fft() denotes the fast Fourier transform, and the notation .* denotes point-by-point multiplication between vectors. The vector W holds the weighting factors of the cepstrum; a common choice is W = [1, 2, 2, …, 2, 1, 0, 0, …, 0], that is, only the first N/2 + 1 points of the vector are retained and, except for the first and the last of these, they are multiplied by 2. N is the length of the vector HLeftReP.
Similarly, the time domain impulse response RightIR of the minimum phase of the speaker to reach the right ear can be calculated.
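Equations 9-10 together are the standard real-cepstrum route to a minimum-phase impulse response. A hedged NumPy sketch (the outermost inverse FFT in Equation 10 is reconstructed here, since the garbled original only names exp() and fft(); `minimum_phase_ir` is a hypothetical helper):

```python
import numpy as np

def minimum_phase_ir(mag):
    """Recover a minimum-phase impulse response from a magnitude response
    via the real cepstrum (cf. Equations 9-10)."""
    n = len(mag)
    cep = np.fft.ifft(np.log(mag)).real     # Equation 9: cepstrum vector
    w = np.zeros(n)                         # weighting vector W
    w[0] = 1.0
    w[1 : (n + 1) // 2] = 2.0               # double the first N/2+1 points...
    if n % 2 == 0:
        w[n // 2] = 1.0                     # ...except the first and last
    # Equation 10: back to the frequency domain, exponentiate, then to time
    return np.fft.ifft(np.exp(np.fft.fft(w * cep))).real

# Sanity check: a filter that is already minimum phase (zero at z = -0.5,
# inside the unit circle) should be reproduced exactly from its magnitude.
h = np.zeros(64)
h[0], h[1] = 1.0, 0.5
mag = np.abs(np.fft.fft(h))
ir = minimum_phase_ir(mag)
```

Because the test filter is already minimum phase, the recovered `ir` matches it, with its first peak at the origin — the property the next paragraph relies on.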
Because LeftIR and RightIR are recovered under the minimum-phase constraint, the first peak of each time domain pulse lies close to the origin, and the two vectors are equal in length. The intrinsic delays of the two can therefore be considered equal. Keeping LeftIR unchanged, RightIR is delayed by Delay according to Equation 11 below, where Delay = DelayLeft − DelayRight. The difference between the times at which the speaker reaches the two ears can then be considered to be Delay. The delay effect on the right ear time domain pulse is shown in fig. 3.
RightIR = [zeros(1, Delay), RightIR] …… Equation 11
Wherein the function zeros(1, Delay) generates a row vector of Delay zeros. In addition, the same number of extra delay points can be introduced into both the left-ear and right-ear time domain impulse responses, keeping the relative delay unchanged, to meet the delay requirements of different scenes.
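Equation 11 is plain zero-padding at the head of the vector. A one-line NumPy equivalent, with `apply_interaural_delay` as a hypothetical helper name:

```python
import numpy as np

def apply_interaural_delay(right_ir, delay_points):
    """Equation 11: prepend Delay zeros so RightIR lags LeftIR by the
    measured interaural time difference (in sample points)."""
    return np.concatenate([np.zeros(delay_points), right_ir])

right_ir = apply_interaural_delay(np.array([1.0, 0.3]), 3)
```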
And S209, the sharing end equipment sends the left ear time domain pulse vector and the right ear time domain pulse vector to the shared end equipment.
Accordingly, the shared device receives the left ear time domain pulse vector and the right ear time domain pulse vector.
The above describes how the left ear and right ear time domain pulse vectors are obtained for any one speaker (either the speaker located on the sharer's left, abbreviated "left speaker", or the speaker located on the sharer's right, abbreviated "right speaker"). Taking the placement of stereo speakers as an example, the time domain pulse vectors from the left and right speakers to the sharer's left and right ears can be computed in turn with reference to the steps described above, and are denoted IRLL, IRLR, IRRR, and IRRL, respectively. IRLL represents the pulse vector from the left speaker to the left ear, IRLR from the left speaker to the right ear, IRRR from the right speaker to the right ear, and IRRL from the right speaker to the left ear. When the user wants to share the listening effect of the scene at that moment, the system shares the four pulse vectors together with the information of the song the user is listening to. Optionally, the sharer may share only IRLL and IRLR, or only IRRR and IRRL.
Specifically, the sharing end device may send the left ear time domain pulse vector and the right ear time domain pulse vector to the shared end device through wired or wireless connection.
S210, the shared device performs convolution operation on the left ear time domain pulse vector and a left channel of the shared end device to obtain a left channel signal of the shared end device, and performs convolution operation on the right ear time domain pulse vector and a right channel of the shared end device to obtain a right channel signal of the shared end device.
Specifically, if the shared device receives the IRLL and the IRLR, the left ear time domain pulse vector of the left speaker and the left channel of the shared device are convolved to obtain a left channel signal of the shared device, and the right ear time domain pulse vector of the left speaker and the right channel of the shared device are convolved to obtain a right channel signal of the shared device.
And if the shared device receives the IRRL and the IRRR, performing convolution operation on the left ear time domain pulse vector of the right loudspeaker and the left channel of the shared end device to obtain a left channel signal of the shared end device, and performing convolution operation on the right ear time domain pulse vector of the right loudspeaker and the right channel of the shared end device to obtain a right channel signal of the shared end device.
If the shared device receives all four pulse vectors IRLL, IRLR, IRRR, and IRRL, the left channel signal of the shared end device is obtained by convolving IRLL (left speaker, left ear) with the left channel, convolving IRRL (right speaker, left ear) with the right channel, and summing the two results; the right channel signal of the shared end device is obtained by convolving IRRR (right speaker, right ear) with the right channel, convolving IRLR (left speaker, right ear) with the left channel, and summing the two results.
At the shared device, all the information is combined and can be played back through earphones. Assuming that the shared song is stereo music, the left and right channels are denoted xL and xR, respectively. The binaural signals reconstructed at the earphones, denoted xLEar and xREar, can be expressed as:

xLEar = xL ⊗ IRLL + xR ⊗ IRRL

xREar = xR ⊗ IRRR + xL ⊗ IRLR

wherein the symbol ⊗ represents the convolution operation between vectors.
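The binaural mixdown described above is two convolutions per ear followed by a sum. A minimal NumPy sketch (`render_binaural` is a hypothetical helper; the pulse-vector names follow the patent's IRLL/IRLR/IRRR/IRRL):

```python
import numpy as np

def render_binaural(x_l, x_r, irll, irlr, irrr, irrl):
    """Combine the stereo channels with the four speaker-to-ear pulse vectors:
    left ear  = xL * IRLL + xR * IRRL
    right ear = xR * IRRR + xL * IRLR   (* = convolution)"""
    x_l_ear = np.convolve(x_l, irll) + np.convolve(x_r, irrl)
    x_r_ear = np.convolve(x_r, irrr) + np.convolve(x_l, irlr)
    return x_l_ear, x_r_ear

# Identity impulses with no crosstalk should pass the channels through unchanged.
x_l, x_r = np.array([1.0, 2.0]), np.array([3.0, 4.0])
left_out, right_out = render_binaural(x_l, x_r,
                                      irll=np.array([1.0]), irlr=np.array([0.0]),
                                      irrr=np.array([1.0]), irrl=np.array([0.0]))
```

With real measured pulse vectors, the crosstalk terms (IRRL, IRLR) are what carry the room's spatial cues into each ear.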
And S211, the shared device outputs a left channel signal and a right channel signal.
In another space, the shared user can listen to the reconstructed left and right channel signals through earphones, so that the listening effect at the sharing user's site is reproduced to the maximum extent.
According to the method for reproducing the live listening effect provided by the disclosure, the sharing end device corrects the sound signal received by the left ear according to the first delay time of the played sound signal relative to the sound signal received by the left ear, and corrects the sound signal received by the right ear according to the second delay time of the played sound signal relative to the sound signal received by the right ear; obtains a left ear/right ear frequency domain amplitude response according to the played sound signal and the corrected sound signal received by the left/right ear; obtains a left ear/right ear time domain pulse vector according to the left ear/right ear frequency domain amplitude response; and sends the left ear/right ear time domain pulse vector to the shared end equipment. The shared end equipment outputs the left/right sound channel signals according to the left ear/right ear time domain pulse vectors sent by the sharing end device, and the reproduction of the on-site listening effect is reliably realized.
Based on the same concept of the method for reproducing the live listening effect, the present disclosure also provides a device for reproducing the live listening effect.
When part or all of the method for reproducing the live listening effect of the above embodiment is implemented by software or firmware, it can be implemented by the apparatus 1000, 2000 for reproducing the live listening effect provided in fig. 4 and 5.
Fig. 4 is a schematic structural diagram of an apparatus for reproducing a live listening effect according to an embodiment of the present application, where the apparatus may be the sharing-side device. The apparatus 1000 comprises:
a first obtaining unit 11, configured to obtain sound signals received by a left ear and a right ear, respectively, and obtain a first delay time of a played sound signal relative to the sound signal received by the left ear and a second delay time of the played sound signal relative to the sound signal received by the right ear; wherein the sound signal received by the left ear and the sound signal received by the right ear are the played sound signals;
a first correcting unit 12, configured to correct the sound signal received by the left ear according to the first delay time, and correct the sound signal received by the right ear according to the second delay time;
a second obtaining unit 13, configured to obtain a left ear frequency domain amplitude response according to the played sound signal and the modified sound signal received by the left ear, and obtain a right ear frequency domain amplitude response according to the played sound signal and the modified sound signal received by the right ear;
a third obtaining unit 15, configured to obtain a left-ear time-domain pulse vector according to the left-ear frequency-domain amplitude response, and obtain a right-ear time-domain pulse vector according to the right-ear frequency-domain amplitude response;
a sending unit 16, configured to send the left ear time domain pulse vector and the right ear time domain pulse vector to a shared device.
In one possible implementation, the first obtaining unit 11 includes:
a cross-correlation operation unit 111, configured to perform a cross-correlation operation on the sound signal received by the left ear and the played sound signal to obtain a peak position of the sound signal received by the left ear, and perform a cross-correlation operation on the sound signal received by the right ear and the played sound signal to obtain a peak position of the sound signal received by the right ear;
the determining unit 112 is configured to determine a first delay time for the left ear to receive the sound signal according to the peak position of the sound signal received by the left ear and a first preset delay, and determine a second delay time for the right ear to receive the sound signal according to the peak position of the sound signal received by the right ear and the first preset delay.
In yet another possible implementation, the first modification unit 12 is configured to translate the sound signal received by the left ear according to the first delay time, so that the sound signal received by the left ear is aligned with the played sound signal;
the first correcting unit 12 is further configured to translate the sound signal received by the right ear according to the second delay time, so that the sound signal received by the right ear is aligned with the played sound signal.
In yet another possible implementation, the apparatus further includes:
the second correcting unit 14 is configured to correct the left ear frequency domain amplitude response and the right ear frequency domain amplitude response respectively according to the frequency domain amplitude response of the sharing end device itself, so as to obtain a corrected left ear frequency domain amplitude response and a corrected right ear frequency domain amplitude response.
In yet another possible implementation, the third obtaining unit 15 includes:
a time domain transforming unit 151, configured to transform the modified left ear frequency domain amplitude response to a time domain to obtain a cepstrum vector of the left ear frequency domain amplitude response, and transform the modified right ear frequency domain amplitude response to the time domain to obtain a cepstrum vector of the right ear frequency domain amplitude response;
a weighting unit 152, configured to weight the cepstrum vector of the left-ear frequency domain amplitude response to obtain the left-ear time domain pulse vector, and weight the cepstrum vector of the right-ear frequency domain amplitude response to obtain the right-ear time domain pulse vector.
According to the present disclosure, there is provided a live listening effect reproduction apparatus for correcting a sound signal received by a left ear according to a first delay time of a played sound signal with respect to the sound signal received by the left ear, and correcting a sound signal received by a right ear according to a second delay time of the played sound signal with respect to the sound signal received by the right ear; obtaining a left ear/right ear frequency domain amplitude response according to the played sound signal and the modified sound signal received by the left ear/right ear; obtaining a left ear/right ear time domain pulse vector according to the left ear/right ear frequency domain amplitude response; and sending the left ear/right ear time domain pulse vector to the shared end equipment. The shared end equipment outputs the left/right sound channel signals according to the left ear/right ear time domain pulse vectors sent by the sharing end device, and the reproduction of the on-site listening effect is reliably realized.
Fig. 5 is a schematic structural diagram of another apparatus for reproducing a live listening effect according to an embodiment of the present application, which may be the shared-end device. The apparatus 2000 comprises:
a receiving unit 21, configured to receive a left-ear time domain pulse vector and a right-ear time domain pulse vector from a sharing end device;
the associating unit 22 is configured to associate the left ear time domain pulse vector with a left channel of the shared end device, so as to obtain a left channel signal of the shared end device;
the associating unit 22 is further configured to associate the right ear time domain pulse vector with a right channel of the shared end device, so as to obtain a right channel signal of the shared end device;
an output unit 23, configured to output the left channel signal and the right channel signal.
In one possible implementation, the association is a convolution operation.
According to the field listening effect reproduction device provided by the disclosure, the device receives the left ear time domain pulse vector and the right ear time domain pulse vector sent by the sharing end equipment, and outputs the left sound channel signal and the right sound channel signal according to the left ear time domain pulse vector and the right ear time domain pulse vector sent by the sharing end equipment, so that the reproduction of the field listening effect is reliably realized.
Alternatively, the means for reproducing the live listening effect may be embodied as a chip or an integrated circuit.
Alternatively, when part or all of the method for reproducing the live listening effect of the above embodiment is implemented by hardware, it can be implemented by the apparatus 3000 for reproducing the live listening effect provided in fig. 6.
Fig. 6 is a schematic structural diagram of a reproduction apparatus for a live listening effect according to an embodiment of the present disclosure. In one embodiment, the apparatus for reproducing the live listening effect may implement the operations of the sharing end device or the shared end device in the embodiments shown in fig. 1 or fig. 2, respectively. As shown in fig. 6, the apparatus for reproducing the live listening effect may include a processor, a network interface and a memory, and may further include a user interface and at least one communication bus. The communication bus is used to realize connection and communication among these components. The user interface may include a display screen (display) and a keyboard (keyboard); optionally, the user interface may also include a standard wired interface and a standard wireless interface. The network interface may optionally include a standard wired interface or a wireless interface (e.g., a WI-FI interface). The memory may be a high-speed RAM memory or a non-volatile memory (e.g., at least one disk memory). The memory may optionally be at least one storage device located remotely from the processor. As shown in fig. 6, the memory, as a kind of computer-readable storage medium, may include an operating system, a network communication module, a user interface module, and a device control application program.
In the reproduction apparatus of the live listening effect shown in fig. 6, the network interface may provide a network communication function; the user interface is mainly used for providing an input interface for a user; and the processor may be configured to invoke the device control application stored in the memory to implement the description of the method for reproducing the live listening effect in the embodiment corresponding to either fig. 1 or fig. 2, which is not repeated here. In addition, the beneficial effects of the same method are not described in detail again.
It should be understood that the apparatus for reproducing the live listening effect described in the embodiments of the present disclosure may perform the description of the method for reproducing the live listening effect in the corresponding embodiment of any one of fig. 1 or fig. 2, and thus, the description thereof is omitted here. In addition, the beneficial effects of the same method are not described in detail.
Further, here, it is to be noted that: the present disclosure also provides a computer-readable storage medium, and the computer-readable storage medium stores therein a computer program executed by the aforementioned live listening effect reproduction apparatus 1000 or 2000, and the computer program includes program instructions, and when the program instructions are executed by a processor, the description of the method for reproducing the live listening effect in any one of the embodiments of fig. 1 or 2 can be performed, and therefore, the description thereof will not be repeated here. In addition, the beneficial effects of the same method are not described in detail. For technical details not disclosed in embodiments of the computer-readable storage medium to which the present disclosure relates, refer to the description of embodiments of the method of the present disclosure.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present disclosure, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. For example, the division of the unit is only one logical function division, and other division may be implemented in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. The shown or discussed mutual coupling, direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some interfaces, and may be in an electrical, mechanical or other form.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. The procedures or functions according to the embodiments of the disclosure are wholly or partially generated when the computer program instructions are loaded and executed on a computer. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored on or transmitted over a computer-readable storage medium. The computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, fiber optic, Digital Subscriber Line (DSL)), or wirelessly (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device, such as a server, a data center, etc., that includes one or more of the available media. The usable medium may be a read-only memory (ROM), or a Random Access Memory (RAM), or a magnetic medium, such as a floppy disk, a hard disk, a magnetic tape, a magnetic disk, or an optical medium, such as a Digital Versatile Disk (DVD), or a semiconductor medium, such as a Solid State Disk (SSD).

Claims (10)

1. A method for reproducing a live listening effect, which is applied to a sharing-end device, the method comprising:
acquiring sound signals respectively received by a left ear and a right ear, and acquiring a first delay time of a played sound signal relative to the sound signal received by the left ear and a second delay time of the played sound signal relative to the sound signal received by the right ear; wherein the sound signal received by the left ear and the sound signal received by the right ear each originate from the played sound signal;
correcting the sound signal received by the left ear according to the first delay time, and correcting the sound signal received by the right ear according to the second delay time;
obtaining a left ear frequency domain amplitude response according to the played sound signal and the corrected sound signal received by the left ear, and obtaining a right ear frequency domain amplitude response according to the played sound signal and the corrected sound signal received by the right ear;
obtaining a left ear time domain pulse vector according to the left ear frequency domain amplitude response, and obtaining a right ear time domain pulse vector according to the right ear frequency domain amplitude response;
and sending the left ear time domain pulse vector and the right ear time domain pulse vector to a shared-end device.
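The frequency domain amplitude response step in claim 1 leaves the estimator unspecified; one plausible reading is the ratio of magnitude spectra of the delay-corrected received signal and the played signal. A minimal sketch under that assumption (the name `magnitude_response` and the `eps` floor are illustrative, not from the patent):

```python
import numpy as np

def magnitude_response(received, played, eps=1e-12):
    """Estimate an ear's frequency-domain amplitude response as the
    ratio of magnitude spectra of the received and played signals.
    Assumes `received` is already delay-corrected and both signals
    have the same length."""
    num = np.abs(np.fft.fft(received))
    den = np.abs(np.fft.fft(played))
    return num / np.maximum(den, eps)  # eps avoids division by zero
```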
2. The method of claim 1, wherein obtaining a first delay time of the played sound signal relative to the sound signal received by the left ear and a second delay time of the played sound signal relative to the sound signal received by the right ear comprises:
performing a cross correlation operation on the sound signal received by the left ear and the played sound signal to obtain a peak position of the sound signal received by the left ear, and performing a cross correlation operation on the sound signal received by the right ear and the played sound signal to obtain a peak position of the sound signal received by the right ear;
determining a first delay time of the sound signal received by the left ear according to the peak position of the sound signal received by the left ear and a first preset delay, and determining a second delay time of the sound signal received by the right ear according to the peak position of the sound signal received by the right ear and the first preset delay.
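The delay estimation of claim 2 can be sketched with NumPy's cross-correlation: the peak of the cross-correlation gives the lag of the received signal relative to the played signal, and the claim's "first preset delay" is read here as a known playback-chain latency that is subtracted out (an assumption, since the claim does not define how the preset delay enters):

```python
import numpy as np

def estimate_delay(received, played, preset_delay=0):
    """Estimate the delay (in samples) of `received` relative to
    `played` from the peak of their cross-correlation."""
    corr = np.correlate(received, played, mode="full")
    peak = int(np.argmax(corr))     # peak position of the cross-correlation
    lag = peak - (len(played) - 1)  # convert array index to a signed lag
    return lag - preset_delay       # remove the known preset latency
```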
3. The method according to claim 1 or 2, wherein the correcting the sound signal received by the left ear according to the first delay time and correcting the sound signal received by the right ear according to the second delay time comprises:
translating the sound signal received by the left ear according to the first delay time, so that the sound signal received by the left ear is aligned with the played sound signal;
and translating the sound signal received by the right ear according to the second delay time, so that the sound signal received by the right ear is aligned with the played sound signal.
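The translation (alignment) of claim 3 amounts to shifting the received signal by the estimated delay so it lines up with the played signal. A sketch, assuming a positive delay means the received signal lags and that vacated samples are zero-filled (both conventions are assumptions; the claim does not fix them):

```python
import numpy as np

def align(received, delay):
    """Shift `received` by `delay` samples so it aligns with the
    played signal; vacated positions are zero-filled."""
    out = np.zeros_like(received)
    if delay > 0:
        out[:len(out) - delay] = received[delay:]  # received lags: move it earlier
    elif delay < 0:
        out[-delay:] = received[:delay]            # received leads: move it later
    else:
        out[:] = received
    return out
```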
4. The method of claim 1, further comprising:
correcting the left ear frequency domain amplitude response and the right ear frequency domain amplitude response, respectively, according to a frequency domain amplitude response of the sharing-end device, to obtain a corrected left ear frequency domain amplitude response and a corrected right ear frequency domain amplitude response.
5. The method of claim 4, wherein obtaining a left ear time domain pulse vector according to the left ear frequency domain amplitude response and obtaining a right ear time domain pulse vector according to the right ear frequency domain amplitude response comprises:
transforming the corrected left ear frequency domain amplitude response to the time domain to obtain a cepstrum vector of the left ear frequency domain amplitude response, and transforming the corrected right ear frequency domain amplitude response to the time domain to obtain a cepstrum vector of the right ear frequency domain amplitude response;
and weighting the cepstrum vector of the left ear frequency domain amplitude response to obtain the left ear time domain pulse vector, and weighting the cepstrum vector of the right ear frequency domain amplitude response to obtain the right ear time domain pulse vector.
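Claims 4 and 5 do not spell out the cepstral weighting; the standard way to obtain a time-domain impulse with a prescribed magnitude response is the minimum-phase construction, whose folding weights are used here as an assumption about what the patent intends:

```python
import numpy as np

def min_phase_impulse(mag, eps=1e-12):
    """Build a minimum-phase time-domain impulse from a frequency-domain
    amplitude response via the real cepstrum: take the cepstrum of the
    log-magnitude, fold negative quefrencies onto positive ones, and
    transform back."""
    n = len(mag)
    cep = np.fft.ifft(np.log(np.maximum(mag, eps))).real  # real cepstrum
    w = np.zeros(n)
    w[0] = 1.0
    w[1:(n + 1) // 2] = 2.0  # double positive quefrencies
    if n % 2 == 0:
        w[n // 2] = 1.0      # keep the Nyquist bin once
    return np.fft.ifft(np.exp(np.fft.fft(w * cep))).real
```

For a flat amplitude response this returns a unit impulse, and in general the magnitude spectrum of the result reproduces `mag`.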
6. A method for reproducing a live listening effect, applied to a shared-end device, the method comprising:
receiving a left ear time domain pulse vector and a right ear time domain pulse vector from a sharing-end device;
associating the left ear time domain pulse vector with a left channel of the shared-end device to obtain a left channel signal of the shared-end device;
associating the right ear time domain pulse vector with a right channel of the shared-end device to obtain a right channel signal of the shared-end device;
outputting the left channel signal and the right channel signal.
7. The method of claim 6, wherein the associating is a convolution operation.
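With the associating fixed as a convolution (claim 7), the shared-end rendering can be sketched as convolving each channel with the corresponding ear's time-domain pulse vector; truncating to the input length is an assumption, since the claim does not state the output length:

```python
import numpy as np

def render_channel(channel, impulse):
    """Convolve a channel signal with an ear's time-domain pulse
    vector, keeping the first len(channel) samples."""
    return np.convolve(channel, impulse)[:len(channel)]
```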
8. An apparatus for reproducing live listening effects, said apparatus comprising:
a first obtaining unit, configured to obtain sound signals received by a left ear and a right ear, respectively, and obtain a first delay time of a played sound signal relative to the sound signal received by the left ear and a second delay time of the played sound signal relative to the sound signal received by the right ear; wherein the sound signal received by the left ear and the sound signal received by the right ear each originate from the played sound signal;
a first correction unit, configured to correct the sound signal received by the left ear according to the first delay time, and correct the sound signal received by the right ear according to the second delay time;
a second obtaining unit, configured to obtain a left ear frequency domain amplitude response according to the played sound signal and the corrected sound signal received by the left ear, and obtain a right ear frequency domain amplitude response according to the played sound signal and the corrected sound signal received by the right ear;
a third obtaining unit, configured to obtain a left-ear time-domain pulse vector according to the left-ear frequency-domain amplitude response, and obtain a right-ear time-domain pulse vector according to the right-ear frequency-domain amplitude response;
and a sending unit, configured to send the left ear time domain pulse vector and the right ear time domain pulse vector to a shared-end device.
9. The apparatus of claim 8, wherein the first obtaining unit comprises:
a cross-correlation operation unit, configured to perform a cross-correlation operation on the sound signal received by the left ear and the played sound signal to obtain a peak position of the sound signal received by the left ear, and perform a cross-correlation operation on the sound signal received by the right ear and the played sound signal to obtain a peak position of the sound signal received by the right ear;
a determining unit, configured to determine a first delay time of the sound signal received by the left ear according to the peak position of the sound signal received by the left ear and a first preset delay, and determine a second delay time of the sound signal received by the right ear according to the peak position of the sound signal received by the right ear and the first preset delay.
10. An apparatus for reproducing live listening effects, said apparatus comprising:
a receiving unit, configured to receive a left ear time domain pulse vector and a right ear time domain pulse vector from a sharing-end device;
an associating unit, configured to associate the left ear time domain pulse vector with a left channel of the shared-end device to obtain a left channel signal of the shared-end device;
the associating unit being further configured to associate the right ear time domain pulse vector with a right channel of the shared-end device to obtain a right channel signal of the shared-end device;
an output unit for outputting the left channel signal and the right channel signal.
CN202110104317.XA 2021-01-26 2021-01-26 Method and device for reproducing on-site listening effect Active CN112954579B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110104317.XA CN112954579B (en) 2021-01-26 2021-01-26 Method and device for reproducing on-site listening effect

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110104317.XA CN112954579B (en) 2021-01-26 2021-01-26 Method and device for reproducing on-site listening effect

Publications (2)

Publication Number Publication Date
CN112954579A true CN112954579A (en) 2021-06-11
CN112954579B CN112954579B (en) 2022-11-18

Family

ID=76237092

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110104317.XA Active CN112954579B (en) 2021-01-26 2021-01-26 Method and device for reproducing on-site listening effect

Country Status (1)

Country Link
CN (1) CN112954579B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130315422A1 (en) * 2012-05-24 2013-11-28 Canon Kabushiki Kaisha Sound reproduction apparatus and sound reproduction method
CN106797519A (en) * 2014-10-02 2017-05-31 索诺瓦公司 The method that hearing auxiliary is provided between users in self-organizing network and correspondence system
CN104853283A (en) * 2015-04-24 2015-08-19 华为技术有限公司 Audio signal processing method and apparatus
WO2021004362A1 (en) * 2019-07-10 2021-01-14 阿里巴巴集团控股有限公司 Audio data processing method and apparatus, and electronic device
CN112287129A (en) * 2019-07-10 2021-01-29 阿里巴巴集团控股有限公司 Audio data processing method and device and electronic equipment

Also Published As

Publication number Publication date
CN112954579B (en) 2022-11-18

Similar Documents

Publication Publication Date Title
CN107852563B (en) Binaural audio reproduction
KR101333031B1 (en) Method of and device for generating and processing parameters representing HRTFs
US8699742B2 (en) Sound system and a method for providing sound
WO2017185663A1 (en) Method and device for increasing reverberation
JP5533248B2 (en) Audio signal processing apparatus and audio signal processing method
JP4780119B2 (en) Head-related transfer function measurement method, head-related transfer function convolution method, and head-related transfer function convolution device
Valimaki et al. Assisted listening using a headset: Enhancing audio perception in real, augmented, and virtual environments
US10652686B2 (en) Method of improving localization of surround sound
CN104604254A (en) Audio processing device, method, and program
CN108605193A (en) Audio output device, method of outputting acoustic sound, program and audio system
JP2002209300A (en) Sound image localization device, conference unit using the same, portable telephone set, sound reproducer, sound recorder, information terminal equipment, game machine and system for communication and broadcasting
WO2018193163A1 (en) Enhancing loudspeaker playback using a spatial extent processed audio signal
CN112954579B (en) Method and device for reproducing on-site listening effect
JPH05168097A (en) Method for using out-head sound image localization headphone stereo receiver
CN108605197B (en) Filter generation device, filter generation method, and sound image localization processing method
US11388540B2 (en) Method for acoustically rendering the size of a sound source
JP5163685B2 (en) Head-related transfer function measurement method, head-related transfer function convolution method, and head-related transfer function convolution device
KR101111734B1 (en) Sound reproduction method and apparatus distinguishing multiple sound sources
JP7319687B2 (en) 3D sound processing device, 3D sound processing method and 3D sound processing program
CN211860528U (en) Headset and audio processing system
WO2023210699A1 (en) Sound generation device, sound reproduction device, sound generation method, and sound signal processing program
KR100494288B1 (en) A apparatus and method of multi-channel virtual audio
JP2023164284A (en) Sound generation apparatus, sound reproducing apparatus, sound generation method, and sound signal processing program
CN115938376A (en) Processing apparatus and processing method
CN116367050A (en) Method for processing audio signal, storage medium, electronic device, and audio device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant