CN117641223A - Method and device for realizing stereo surround sound effect, electronic equipment and storage medium - Google Patents

Method and device for realizing stereo surround sound effect, electronic equipment and storage medium

Info

Publication number
CN117641223A
Authority
CN
China
Prior art keywords
channel
audio
stereo
target
processed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410009799.4A
Other languages
Chinese (zh)
Inventor
戚成杰
陈小波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan Happly Sunshine Interactive Entertainment Media Co Ltd
Original Assignee
Hunan Happly Sunshine Interactive Entertainment Media Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan Happly Sunshine Interactive Entertainment Media Co Ltd
Priority to CN202410009799.4A
Publication of CN117641223A
Legal status: Pending

Landscapes

  • Stereophonic System (AREA)

Abstract

The application discloses a method and a device for realizing stereo surround sound effect, electronic equipment and a storage medium, wherein the method comprises the following steps: decoding the stereo audio to be processed to obtain PCM data of the stereo audio to be processed; filtering PCM data of the stereo audio to be processed to obtain ultralow frequency signals; performing sound source separation on PCM data of the stereo audio to be processed to obtain a plurality of target sound source data; respectively placing each target sound source data of the stereo audio to be processed on each target stereo channel of the target surround channel, and combining the ultralow frequency signals of the stereo audio to be processed to form a virtual target surround channel; the rotation speed and the direction of each target stereo channel are controlled through a self-phase algorithm respectively; performing down-mixing rendering on the virtual target sound channel by adopting a data set of the target sound effect database to obtain the current restored stereo audio; and carrying out dynamic loudness balance processing and loudness limiting processing on the current restored stereo audio, and outputting the current restored stereo audio.

Description

Method and device for realizing stereo surround sound effect, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of audio processing technologies, and in particular, to a method and apparatus for implementing stereo surround sound, an electronic device, and a storage medium.
Background
The 360-degree surround sound effect is a surround sound effect built on a three-dimensional (stereo) sound effect. Specifically, starting from the original stereo characteristics, an automatic phase technique makes the sound appear to rotate through 360 degrees from left to right.
The current way of realizing the 360-degree surround sound effect is to acquire stereo audio data and then process the stereo audio data as a whole with an automatic phase algorithm. That is, the amplitude difference between the left and right channels of the whole stereo audio data is varied so as to produce a sound pressure difference between the two ears, which simulates rotation of the sound and yields the 360-degree surround sound effect.
However, stereo data is usually a mix of multiple sound sources such as vocals and background sound, so directly processing the whole channel in this way can leave the information in the different channels unbalanced. This causes noise and sibilance, and consequently a poor listening experience.
Disclosure of Invention
Based on the above defects of the prior art, the present application provides a method and a device for realizing a stereo surround sound effect, electronic equipment and a storage medium, so as to solve the problem that the sound effect produced by the prior art contains noise and sibilance and therefore sounds poor.
In order to achieve the above object, the present application provides the following technical solutions:
A first aspect of the present application provides a method for realizing a stereo surround sound effect, which comprises the following steps:
decoding the stereo audio to be processed according to the sampling rate of the stereo audio to be processed to obtain PCM data of the stereo audio to be processed;
filtering the PCM data of the stereo audio to be processed to obtain an ultralow frequency signal of the stereo audio to be processed;
performing sound source separation on the PCM data of the stereo audio to be processed to obtain a plurality of target sound source data of the stereo audio to be processed;
respectively placing the target sound source data of the stereo audio to be processed on each target stereo channel of a target surround channel, and combining the ultralow frequency signals of the stereo audio to be processed to form a virtual target surround channel;
according to the initial phase angle and trigonometric function corresponding to each target stereo channel, controlling the rotation speed and direction of each target stereo channel through a self-phase algorithm;
respectively carrying out down-mixing rendering on the virtual target sound channels by adopting a data set of a target sound effect database based on the directions of all the sound channels of the virtual target surrounding sound channels to obtain current restored stereo audio;
performing dynamic loudness balancing processing on the current restored stereo audio to obtain balanced stereo audio;
and outputting the balanced stereo audio after the loudness limiting processing.
Optionally, in the method for implementing stereo surround sound, the filtering processing is performed on PCM data of the stereo audio to be processed to obtain an ultralow frequency signal of the stereo audio to be processed, including:
creating a high-pass filter and a low-pass filter;
intercepting, by using the low-pass filter, the audio signal below a first target frequency in the PCM data of the stereo audio to be processed;
and cutting off the audio signal lower than the second target frequency in the intercepted audio signals by using the high-pass filter to obtain the ultralow frequency signal of the stereo audio to be processed.
Optionally, in the above method for implementing stereo surround sound, the target sound source data of the stereo audio to be processed includes singing, background sound, bass and drum sound, and the placing the target sound source data of the stereo audio to be processed on the target stereo channels of the target surround channels respectively, and combining the ultra-low frequency signals of the stereo audio to be processed to form a virtual target surround channel includes:
placing the singing in a center channel, placing the drum sound in a front left channel, placing the bass in a front right channel, placing the background sound in a left surround channel and a right surround channel respectively, and taking the ultralow frequency signal of the stereo audio to be processed as a bass channel to form the virtual target surround channel; wherein the center channel is located directly in front; the front left channel is located in the direction parallel to the left ear; the front right channel is located in the direction parallel to the right ear; the left surround channel is located at the left rear of the left ear; the right surround channel is located at the right rear of the right ear.
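As a non-limiting illustration, the placement described above can be sketched as a simple mapping from separated sound sources to the channels of a virtual 5.1 layout. The function name, the dictionary keys, and the choice to feed the same background stem to both surround channels are assumptions made for this sketch and are not specified by the application.

```python
def build_virtual_surround(sources, lfe):
    """Map separated sound sources onto a virtual 5.1 layout.

    sources: dict with the (hypothetical) keys "singing", "drums",
             "bass" and "background", each holding that source's audio.
    lfe:     the ultralow frequency signal used as the bass channel.
    """
    required = ("singing", "drums", "bass", "background")
    missing = [name for name in required if name not in sources]
    if missing:
        raise KeyError(f"missing sound sources: {missing}")
    return {
        "center": sources["singing"],            # directly in front
        "front_left": sources["drums"],          # parallel to the left ear
        "front_right": sources["bass"],          # parallel to the right ear
        "left_surround": sources["background"],  # behind the left ear
        "right_surround": sources["background"], # behind the right ear (assumed same stem)
        "lfe": lfe,                              # bass channel
    }
```

Any container type can stand in for the audio data here; the sketch only fixes which source goes to which channel.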
Optionally, in the above method for implementing stereo surround sound, the controlling, by a self-phase algorithm, a rotation speed and a direction of each target stereo channel according to an initial phase angle and a trigonometric function corresponding to each target stereo channel includes:
according to an initial phase of 45 degrees and a period of a first time length, controlling the left channel of the center channel by using a sine function and controlling the right channel of the center channel by using a cosine function;
according to an initial phase of 0 degrees and a period of a second time length, controlling the left channel of the front left channel by using a sine function and controlling the right channel of the front left channel by using a cosine function; wherein the second time length is half of the first time length;
according to an initial phase of 90 degrees and a period of the second time length, controlling the left channel of the front right channel by using a cosine function and controlling the right channel of the front right channel by using a sine function.
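As a non-limiting illustration, the sine and cosine control described above can be sketched as time-varying gains applied to the left and right channels of each target stereo channel. The function names, the 8-second first time length, and the use of the signed sine and cosine values directly as gains are assumptions made for this sketch; the application does not fix these details.

```python
import math


def rotation_gains(t, period, initial_phase_deg, swap=False):
    """Return (left_gain, right_gain) at time t (seconds).

    By default the left gain follows a sine and the right gain a cosine;
    swap=True exchanges the two, as described for the front right channel.
    """
    angle = math.radians(initial_phase_deg) + 2.0 * math.pi * t / period
    s, c = math.sin(angle), math.cos(angle)
    return (c, s) if swap else (s, c)


FIRST_PERIOD = 8.0  # hypothetical first time length, in seconds

CHANNEL_ROTATION = {
    "center": dict(period=FIRST_PERIOD, initial_phase_deg=45.0, swap=False),
    "front_left": dict(period=FIRST_PERIOD / 2, initial_phase_deg=0.0, swap=False),
    "front_right": dict(period=FIRST_PERIOD / 2, initial_phase_deg=90.0, swap=True),
}


def apply_rotation(left, right, sample_rate, period, initial_phase_deg, swap=False):
    """Scale each left/right sample pair by the gains for its time instant."""
    out_left, out_right = [], []
    for i, (l, r) in enumerate(zip(left, right)):
        gl, gr = rotation_gains(i / sample_rate, period, initial_phase_deg, swap)
        out_left.append(l * gl)
        out_right.append(r * gr)
    return out_left, out_right
```

Because the two gains are a sine and a cosine of the same angle, the sum of their squares is always 1, so the total power of a channel stays constant while it rotates. Halving the period for the front channels makes them rotate twice as fast as the center channel.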
Optionally, in the above method for implementing stereo surround sound, the performing downmixing rendering on the virtual target channel by using a data set of a target sound database based on directions of the channels of the virtual target surround channel respectively to obtain current restored stereo audio includes:
respectively carrying out loudness calculation on the current left channel audio data and the current right channel audio data of each target stereo channel based on the current direction of each target stereo channel to obtain left channel loudness audio data and right channel loudness audio data of each target stereo channel;
windowing each channel of the virtual target surround channel;
acquiring a plurality of data sets from the target sound effect database; wherein, except for the first data set, each of the data sets corresponds to an angle range among all directions;
summing the convolution value of the initial left channel audio data of the center channel with the first data set, the convolution value of the left channel loudness audio data of each target stereo channel with the data set currently corresponding to that target stereo channel, and the left channel audio data of the ultralow frequency signal of the stereo audio to be processed, so as to obtain the left channel restored audio of the current restored stereo audio; wherein the data set currently corresponding to a target stereo channel refers to the data set corresponding to the angle range in which the current direction of the target stereo channel is located;
and summing the convolution value of the initial right channel audio data of the center channel with the first data set, the convolution value of the right channel loudness audio data of each target stereo channel with the data set currently corresponding to that target stereo channel, and the right channel audio data of the ultralow frequency signal of the stereo audio to be processed, so as to obtain the right channel restored audio of the current restored stereo audio.
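As a non-limiting illustration, the summation described above can be sketched for one output channel as follows. The sketch assumes that each data set is a one-dimensional impulse response and that the data sets are stored as (start angle, end angle, impulse response) tuples; neither the data format nor the function names come from the application, and the loudness calculation and windowing steps are omitted.

```python
def convolve(signal, kernel):
    """Direct-form convolution, truncated to the length of the input."""
    out = [0.0] * len(signal)
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(kernel):
            if n - k < 0:
                break
            acc += h * signal[n - k]
        out[n] = acc
    return out


def select_dataset(datasets, angle_deg):
    """Pick the impulse response whose angle range contains angle_deg."""
    angle = angle_deg % 360.0
    for start, end, impulse in datasets:
        if start <= angle < end:
            return impulse
    raise ValueError(f"no data set covers {angle} degrees")


def downmix_one_side(center, first_dataset, rotating, datasets, lfe):
    """Sum the center, rotating-channel and LFE contributions for one side.

    rotating: list of (samples, current_angle_deg) pairs, one per
              rotating target stereo channel.
    """
    mixed = convolve(center, first_dataset)
    for samples, angle in rotating:
        rendered = convolve(samples, select_dataset(datasets, angle))
        mixed = [m + r for m, r in zip(mixed, rendered)]
    return [m + b for m, b in zip(mixed, lfe)]
```

The same function would be called once with the left channel data and once with the right channel data to obtain the two restored channels.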
Optionally, in the method for implementing stereo surround sound effect, the dynamically loudness balancing processing is performed on the current restored stereo audio to obtain balanced stereo audio, including:
calculating half of the sum of the right channel restored audio and the left channel restored audio of the current restored stereo audio to obtain a current mid-set audio signal, and calculating half of the difference between the right channel restored audio and the left channel restored audio of the current restored stereo audio to obtain a current side-chain audio signal;
if the ratio of the average amplitude of the current mid-set audio signal to the average amplitude of the current side-chain audio signal is larger than a first threshold, raising the level of the current side-chain audio signal by a preset decibel value;
if the ratio of the average amplitude of the current mid-set audio signal to the average amplitude of the current side-chain audio signal is smaller than a second threshold, and neither the average amplitude of the current mid-set audio signal nor the average amplitude of the current side-chain audio signal is smaller than a preset minimum value, raising the level of the current mid-set audio signal by a preset decibel value; wherein the preset minimum value is close to zero;
if the average amplitude of the current mid-set audio signal or the average amplitude of the current side-chain audio signal is smaller than the preset minimum value, reducing the current mid-set audio signal by half of the power spectrum and raising the current side-chain audio signal by half of the power spectrum;
and adding the final current side-chain audio signal to the final current mid-set audio signal to obtain right channel audio data of the balanced stereo audio, and subtracting the final current side-chain audio signal from the final current mid-set audio signal to obtain left channel audio data of the balanced stereo audio.
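As a non-limiting illustration, the balancing steps above can be sketched as follows. The threshold values, the 1.5 dB boost, the minimum value and the function names are hypothetical. Reading "half of the power spectrum" as a change of about 3 dB (a factor of the square root of 2 in amplitude) is only one possible interpretation and is an assumption of this sketch.

```python
import math


def mean_abs(samples):
    """Average absolute amplitude of a signal (0.0 for an empty signal)."""
    return sum(abs(v) for v in samples) / len(samples) if samples else 0.0


def db_to_gain(db):
    """Convert a level change in decibels to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


def balance_mid_side(left, right, first_threshold=4.0, second_threshold=0.25,
                     boost_db=1.5, minimum=1e-6):
    """Return (left, right) after mid/side loudness balancing."""
    mid = [(r + l) / 2.0 for l, r in zip(left, right)]
    side = [(r - l) / 2.0 for l, r in zip(left, right)]
    mid_level, side_level = mean_abs(mid), mean_abs(side)

    if mid_level < minimum or side_level < minimum:
        # Assumed reading: about -3 dB on the mid signal, +3 dB on the side signal.
        mid = [m * math.sqrt(0.5) for m in mid]
        side = [s * math.sqrt(2.0) for s in side]
    else:
        ratio = mid_level / side_level
        if ratio > first_threshold:
            side = [s * db_to_gain(boost_db) for s in side]
        elif ratio < second_threshold:
            mid = [m * db_to_gain(boost_db) for m in mid]

    out_right = [m + s for m, s in zip(mid, side)]
    out_left = [m - s for m, s in zip(mid, side)]
    return out_left, out_right
```

The near-zero case is checked first so that the ratio is never computed with a vanishing denominator. When no branch changes the signals, adding and subtracting the mid and side signals reproduces the original left and right channels exactly.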
Optionally, in the method for implementing stereo surround sound, the outputting the balanced stereo audio after the loudness limiting processing includes:
detecting an audio signal which is not in a loudness range in the balanced stereo audio;
setting the detected audio signal greater than the upper limit of the loudness range as the upper limit of the loudness range, and setting the detected audio signal less than the lower limit of the loudness range as the lower limit of the loudness range;
outputting the processed balanced stereo audio.
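As a non-limiting illustration, the limiting described above can be sketched as clamping each sample to a range. Treating the loudness range as a per-sample amplitude range, as well as the function name, are assumptions made for this sketch.

```python
def limit_loudness(samples, lower, upper):
    """Clamp every sample to the closed interval [lower, upper]."""
    if lower > upper:
        raise ValueError("lower limit must not exceed upper limit")
    return [min(max(s, lower), upper) for s in samples]
```

For example, with a range of -1.0 to 1.0, a sample of 1.2 is set to 1.0, a sample of -1.5 is set to -1.0, and samples inside the range are left unchanged.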
A second aspect of the present application provides a device for realizing a stereo surround sound effect, including:
The decoding unit is used for decoding the stereo audio to be processed according to the sampling rate of the stereo audio to be processed to obtain PCM data of the stereo audio to be processed;
the filtering unit is used for filtering the PCM data of the stereo audio to be processed to obtain an ultralow frequency signal of the stereo audio to be processed;
the separation unit is used for performing sound source separation on the PCM data of the stereo audio to be processed to obtain a plurality of target sound source data of the stereo audio to be processed;
the building unit is used for respectively placing the target sound source data of the stereo audio to be processed on each target stereo channel of the target surround channel and combining the ultralow frequency signals of the stereo audio to be processed to form a virtual target surround channel;
the control unit is used for controlling the rotation speed and the direction of each target stereo channel through a self-phase algorithm according to the initial phase angle and the trigonometric function corresponding to each target stereo channel;
the downmix unit is used for performing downmix rendering on the virtual target sound channels by adopting a data set of a target sound effect database based on the directions of all sound channels of the virtual target surround sound channels respectively to obtain current restored stereo audio;
The balance processing unit is used for carrying out dynamic loudness balance processing on the current restored stereo audio to obtain balanced stereo audio;
and the loudness control unit is used for outputting the balanced stereo audio after the loudness limiting processing.
Optionally, in the above apparatus for realizing stereo surround sound, the filtering unit includes:
a creation unit for creating a high-pass filter and a low-pass filter;
a low-pass filter unit for intercepting an audio signal in a first target frequency in the PCM data of the stereo audio to be processed by using the low-pass filter;
and the high-pass filter unit is used for cutting off the audio signal lower than the second target frequency in the intercepted audio signal by using the high-pass filter to obtain the ultralow frequency signal of the stereo audio to be processed.
Optionally, in the above apparatus for realizing a stereo surround sound effect, the target sound source data of the stereo audio to be processed includes a singing, a background sound, a bass, and a drumbeat, and the building unit includes:
the sound source placing unit is used for placing the singing in a center channel, placing the drum sound in a front left channel, placing the bass in a front right channel, placing the background sound in a left surround channel and a right surround channel respectively, and taking the ultralow frequency signal of the stereo audio to be processed as a bass channel to form the virtual target surround channel; wherein the center channel is located directly in front; the front left channel is located in the direction parallel to the left ear; the front right channel is located in the direction parallel to the right ear; the left surround channel is located at the left rear of the left ear; the right surround channel is located at the right rear of the right ear.
Optionally, in the above apparatus for realizing a stereo surround sound effect, the control unit includes:
a first control unit, configured to control a left channel of the center channel with a sine function and control a right channel of the center channel with a cosine function according to an initial phase of 45 degrees and a period of a first time length;
a second control unit for controlling the left channel of the front left channel by using a sine function and controlling the right channel of the front left channel by using a cosine function according to an initial phase of 0 degrees and a period of a second time length; wherein the second time length is half of the first time length;
and the third control unit is used for controlling the left channel of the front right channel by using a cosine function and controlling the right channel of the front right channel by using a sine function according to the initial phase of 90 degrees and the period of the second time length.
Optionally, in the above apparatus for realizing stereo surround sound, the downmix unit includes:
the loudness calculation unit is used for calculating the loudness of the current left channel audio data and the current right channel audio data of each target stereo channel based on the current direction of each target stereo channel respectively to obtain the left channel loudness audio data and the right channel loudness audio data of each target stereo channel;
A windowing processing unit, configured to perform windowing processing on each channel of the virtual target surround channel;
a data acquisition unit for acquiring a plurality of data sets from the target sound effect database; wherein each of the data sets corresponds to an angular range in all directions except the first data set;
a left channel down-mixing unit, configured to sum the convolution value of the initial left channel audio data of the center channel with the first data set, the convolution value of the left channel loudness audio data of each target stereo channel with the data set currently corresponding to that target stereo channel, and the left channel audio data of the ultralow frequency signal of the stereo audio to be processed, to obtain the left channel restored audio of the current restored stereo audio; wherein the data set currently corresponding to a target stereo channel refers to the data set corresponding to the angle range in which the current direction of the target stereo channel is located;
and the right channel down-mixing unit is used for summing the convolution value of the initial right channel audio data of the center channel with the first data set, the convolution value of the right channel loudness audio data of each target stereo channel with the data set currently corresponding to that target stereo channel, and the right channel audio data of the ultralow frequency signal of the stereo audio to be processed, so as to obtain the right channel restored audio of the current restored stereo audio.
Optionally, in the above apparatus for realizing stereo surround sound, the balance processing unit includes:
the first calculation unit is used for calculating half of the sum of the right channel restored audio and the left channel restored audio of the current restored stereo audio to obtain a current mid-set audio signal, and calculating half of the difference between the right channel restored audio and the left channel restored audio of the current restored stereo audio to obtain a current side-chain audio signal;
the first lifting unit is used for raising the level of the current side-chain audio signal by a preset decibel value when the ratio of the average amplitude of the current mid-set audio signal to the average amplitude of the current side-chain audio signal is larger than a first threshold;
the second lifting unit is used for raising the level of the current mid-set audio signal by a preset decibel value when the ratio of the average amplitude of the current mid-set audio signal to the average amplitude of the current side-chain audio signal is smaller than a second threshold and neither the average amplitude of the current mid-set audio signal nor the average amplitude of the current side-chain audio signal is smaller than a preset minimum value; wherein the preset minimum value is close to zero;
the adjusting unit is used for reducing the current mid-set audio signal by half of the power spectrum and raising the current side-chain audio signal by half of the power spectrum when the average amplitude of the current mid-set audio signal or the average amplitude of the current side-chain audio signal is smaller than the preset minimum value;
and the second calculation unit is used for adding the final current side-chain audio signal to the final current mid-set audio signal to obtain right channel audio data of the balanced stereo audio, and subtracting the final current side-chain audio signal from the final current mid-set audio signal to obtain left channel audio data of the balanced stereo audio.
Optionally, in the aforementioned stereo surround sound effect implementation device, the loudness control unit includes:
the detection unit is used for detecting the audio signals which are not in the loudness range in the balanced stereo audio;
a loudness processing unit configured to set the detected audio signal greater than the upper limit value of the loudness range as the upper limit value of the loudness range, and set the detected audio signal less than the lower limit value of the loudness range as the lower limit value of the loudness range;
and the output unit is used for outputting the processed balanced stereo audio.
A third aspect of the present application provides an electronic device, comprising:
a memory and a processor;
wherein the memory is used for storing programs;
the processor is configured to execute the program, and when the program is executed, it implements the method for realizing the stereo surround sound effect according to any one of the foregoing.
A fourth aspect of the present application provides a computer storage medium storing a computer program which, when executed, implements the method for realizing the stereo surround sound effect according to any one of the above.
The embodiment of the application provides a method for realizing a stereo surround sound effect. The stereo audio to be processed is decoded according to its sampling rate to obtain PCM data of the stereo audio to be processed. The PCM data is then filtered to obtain an ultralow frequency signal of the stereo audio to be processed, and sound source separation is performed on the PCM data to obtain a plurality of target sound source data, so that the stereo audio to be processed is decomposed into data of 6 sound sources. Next, the target sound source data are respectively placed on the target stereo channels of the target surround channel and combined with the ultralow frequency signal to form a virtual target surround channel. The rotation speed and direction of each target stereo channel are then controlled through a self-phase algorithm according to the initial phase angle and trigonometric function corresponding to that channel, and the virtual target channel is down-mixed and rendered with a data set of a target sound effect database based on the direction of each channel of the virtual target surround channel to obtain the current restored stereo audio. In this way the phase of the sound source data of each channel is controlled, the corresponding loudness is controlled, and the stereo channels are then restored.
Finally, dynamic loudness balancing is performed on the current restored stereo audio to obtain balanced stereo audio, and the balanced stereo audio is output after loudness limiting, which further safeguards the listening experience. Because the whole channel is no longer processed as a unit but is separated into a plurality of sound sources that are processed on a simulated virtual target surround channel, the problems of unbalanced information between channels, noise and sibilance are avoided, and the listening experience is effectively improved.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present application, and that other drawings may be obtained according to the provided drawings without inventive effort to a person skilled in the art.
Fig. 1 is a flowchart of a method for implementing stereo surround sound according to an embodiment of the present application;
Fig. 2 is a flowchart of a method for filtering stereo audio to be processed according to an embodiment of the present application;
Fig. 3 is a flowchart of a method for performing downmix rendering on a virtual target channel according to an embodiment of the present application;
Fig. 4 is a schematic architecture diagram of a device for realizing a stereo surround sound effect according to an embodiment of the present application;
Fig. 5 is a schematic architecture diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
In this application, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The embodiment of the application provides a method for realizing stereo surround sound effect, as shown in fig. 1, comprising the following steps:
S101, decoding the stereo audio to be processed according to the sampling rate of the stereo audio to be processed to obtain PCM data of the stereo audio to be processed.
Wherein the pulse code modulation (Pulse Code Modulation, PCM) data is a bare stream of uncompressed audio sample data. It is standard digital audio data that is converted from analog signals by sampling, quantization, and encoding.
Because the stereo audio to be processed is operated on in the subsequent steps, it must first be decoded. Optionally, FFmpeg may be used to decode the stereo audio to be processed. To preserve the sampling rate of the stereo audio to be processed, decoding is performed at the audio's own sampling rate.
It should be noted that the audio to be processed is stereo, so the PCM data obtained by decoding contains PCM data of both a left channel and a right channel.
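As a non-limiting illustration, decoding at the audio's own sampling rate can be sketched by calling the ffprobe and ffmpeg command-line tools and splitting the interleaved output into left and right channels. The function names and the choice of signed 16-bit little-endian PCM are assumptions made for this sketch, and the two tools are assumed to be installed and on the search path.

```python
import array
import subprocess
import sys


def probe_sample_rate(path):
    """Ask ffprobe for the sampling rate of the first audio stream."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "a:0",
         "-show_entries", "stream=sample_rate",
         "-of", "default=noprint_wrappers=1:nokey=1", path],
        check=True, capture_output=True, text=True,
    ).stdout
    return int(out.split()[0])


def ffmpeg_decode_command(path, sample_rate):
    """Build an ffmpeg command that writes raw stereo s16le PCM to stdout."""
    return ["ffmpeg", "-v", "error", "-i", path,
            "-f", "s16le", "-acodec", "pcm_s16le",
            "-ac", "2", "-ar", str(sample_rate), "pipe:1"]


def split_stereo(pcm_bytes):
    """Split interleaved 16-bit little-endian PCM into (left, right) lists."""
    samples = array.array("h")
    samples.frombytes(pcm_bytes)
    if sys.byteorder == "big":
        samples.byteswap()  # the data is little-endian regardless of host
    return list(samples[0::2]), list(samples[1::2])


def decode_to_pcm(path):
    """Decode a file at its own sampling rate; returns (rate, left, right)."""
    rate = probe_sample_rate(path)
    raw = subprocess.run(ffmpeg_decode_command(path, rate),
                         check=True, capture_output=True).stdout
    left, right = split_stereo(raw)
    return rate, left, right
```

Passing the probed rate back to ffmpeg with `-ar` keeps the decoded PCM at the source sampling rate, and `-ac 2` keeps the two-channel layout that the later steps rely on.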
S102, filtering the PCM data of the stereo audio to be processed to obtain an ultralow frequency signal of the stereo audio to be processed.
It should be noted that the embodiment of the present application constructs the stereo audio to be processed into a virtual target surround channel. Because a low-frequency channel is part of the virtual target surround channel, the ultralow frequency signal of the stereo audio to be processed needs to be obtained. The embodiment mainly takes 5.1 channels as the example target surround channel, but the target surround channel is not limited to 5.1 channels and may be another surround layout.
Specifically, the 5.1 channels include a center channel, a front left channel, a front right channel, left and right surround channels, and a subwoofer channel, the so-called 0.1 channel. The first five channels are stereo channels in which ordinary audio is placed, while the 0.1 channel is the subwoofer and requires bass audio. Therefore, in the embodiment of the application, the PCM data of the stereo audio to be processed is filtered to obtain the ultralow frequency signal of the stereo audio to be processed, that is, its low-frequency effects (LFE) audio signal.
Optionally, the PCM data of the stereo audio to be processed may be filtered by a filter.
Optionally, in another embodiment of the present application, a specific implementation of step S102, as shown in fig. 2, includes the following steps:
S201, creating a high-pass filter and a low-pass filter.
It should be noted that, the filter formula of the IIR filter for audio is as follows:
where b[0], b[1], b[2], a[0], a[1] and a[2] are the parameters of the second-order IIR filter; y[n] is the data of a sampling point in the original audio PCM, with 2 ≤ n ≤ len(y) − 2; and len(y) is the total number of audio sampling points in a single channel.
Specifically, the design of the high-pass filter is as follows:
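$$
\begin{aligned}
w_0 &= 2\pi f_{low} / f_s \\
\alpha &= 0.5 \sin(w_0)\, f_{low} / Q \\
b[0] &= (1 + \cos(w_0)) / 2 \\
b[1] &= -1 - \cos(w_0) \\
b[2] &= (1 + \cos(w_0)) / 2 \\
a[0] &= 1 + \alpha \\
a[1] &= -2 \cos(w_0) \\
a[2] &= 1 - \alpha
\end{aligned}
$$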
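The design of the low-pass filter is as follows:
$$
\begin{aligned}
w_0 &= 2\pi f_{high} / f_s \\
\alpha &= 0.5 \sin(w_0)\, f_{high} / Q \\
b[0] &= (1 - \cos(w_0)) / 2 \\
b[1] &= -1 - \cos(w_0) \\
b[2] &= (1 - \cos(w_0)) / 2 \\
a[0] &= 1 + \alpha \\
a[1] &= -2 \cos(w_0) \\
a[2] &= 1 - \alpha
\end{aligned}
$$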
wherein f_high is the first target frequency, typically 120 Hz; f_low is the second target frequency, typically 30 Hz; Q is 1 and represents the frequency width over which the audio center frequency acts; and fs is the sampling frequency.
Therefore, in this example, the bass signal obtained covers 30 Hz to 120 Hz. Other frequencies are of course possible, and the first target frequency and the second target frequency may be set as required.
S202, using the low-pass filter to retain the audio signal within the first target frequency in the PCM data of the stereo audio to be processed.
S203, using the high-pass filter to cut off, from the retained audio signal, the audio signal below the second target frequency, to obtain the ultralow frequency signal of the stereo audio to be processed.
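Steps S201 to S203 can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the implementation of the application: the function names are hypothetical, alpha is computed exactly as printed in the filter design, the low-pass numerator uses b[1] = 1-cos(w_0) (the usual low-pass form, assumed here), and the filter is applied with the standard second-order difference equation, since the filter formula itself is not reproduced in the text.

```python
import math
import numpy as np

def highpass_coeffs(fc, fs, q=1.0):
    """High-pass coefficients following the design listed in the text."""
    w0 = 2.0 * math.pi * fc / fs
    alpha = 0.5 * math.sin(w0) * fc / q  # alpha as printed in the text
    c = math.cos(w0)
    b = [(1 + c) / 2, -(1 + c), (1 + c) / 2]
    a = [1 + alpha, -2 * c, 1 - alpha]
    return b, a

def lowpass_coeffs(fc, fs, q=1.0):
    """Low-pass coefficients; b[1] = 1 - cos(w0) is an assumption (usual form)."""
    w0 = 2.0 * math.pi * fc / fs
    alpha = 0.5 * math.sin(w0) * fc / q  # alpha as printed in the text
    c = math.cos(w0)
    b = [(1 - c) / 2, 1 - c, (1 - c) / 2]
    a = [1 + alpha, -2 * c, 1 - alpha]
    return b, a

def biquad(x, b, a):
    """Standard second-order IIR difference equation (assumed form):
    a0*out[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*out[n-1] - a2*out[n-2]
    """
    x = np.asarray(x, dtype=np.float64)
    out = np.zeros_like(x)
    x1 = x2 = y1 = y2 = 0.0
    for n, xn in enumerate(x):
        yn = (b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2) / a[0]
        out[n] = yn
        x2, x1 = x1, xn
        y2, y1 = y1, yn
    return out

def extract_lfe(pcm, fs, f_high=120.0, f_low=30.0, q=1.0):
    """S202 then S203: low-pass at f_high, then high-pass at f_low."""
    band = biquad(pcm, *lowpass_coeffs(f_high, fs, q))
    return biquad(band, *highpass_coeffs(f_low, fs, q))
```

Cascading the two filters in this order mirrors the order of S202 and S203; since both filters are linear, the opposite order would give the same result up to rounding.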
S103, performing sound source separation on the PCM data of the stereo audio to be processed to obtain a plurality of target sound source data of the stereo audio to be processed.
It should be noted that, in the embodiment of the present application, the stereo audio to be processed is not processed as a whole. Instead, its sound sources are separated, and the audio data of each sound source is placed in a different channel to construct virtual 5.1 channels, so that each channel, and thus the audio data of each sound source, can be processed separately. For example, the PCM data of the stereo audio to be processed may be separated into four sound source data: singing, background sound, bass, and drumbeat.
Optionally, the PCM data of the stereo audio to be processed may be subjected to sound source separation using an AI sound source separation technique.
S104, respectively placing the target sound source data of the stereo audio to be processed on each target stereo channel of the target surround channels, and combining the ultralow frequency signals of the stereo audio to be processed to form a virtual target surround channel.
It should be noted that, since a plurality of sound source data are obtained and each of them needs to be processed separately, each sound source data is placed on its own channel of a target surround channel containing a plurality of target stereo channels, and the ultralow frequency signal of the stereo audio to be processed is also placed as a sound source channel, thereby forming a virtual target surround channel.
Optionally, in the embodiment of the present application, the target surround channel is 5.1 channels, and the target stereo channels are the 5 channels of the 5.1 channels other than the bass channel. The ultralow frequency signal of the stereo audio to be processed must be on the bass channel, so a target surround channel can be formed simply by assigning sound source data to the remaining target stereo channels. In this embodiment, therefore, the target sound source data of the stereo audio to be processed are respectively placed on the five target stereo channels of the target surround channel to form the virtual target surround channel corresponding to the stereo audio to be processed, specifically denoted as: singing y_c, drumbeat y_fl, bass y_fr, background sound y_sl and y_sr, and ultralow frequency signal y_lfe.
It should be noted that the stereo audio to be processed does not itself have the six channels of the target surround channel, nor are there speakers corresponding to each channel for playback, and the result must later be restored to stereo audio. A virtual 5.1 channel is therefore constructed to facilitate subsequent processing.
S105, controlling the rotation speed and the direction of each target stereo channel through a self-phase algorithm according to the initial phase angle and the trigonometric function corresponding to each target stereo channel.
It should be noted that, in the embodiment of the present application, the stereo audio to be processed has been separated into sound sources that are placed on their corresponding channels, so self-phase control can be applied to each sound source data individually rather than to the stereo audio as a whole. Each target stereo channel is therefore controlled independently by its own trigonometric function, and a corresponding initial phase angle is also set so that the loudness of the left and right ears is balanced from the beginning.
S106, respectively carrying out down-mixing rendering on the virtual target channels by adopting a data set of the target sound effect database based on the directions of all channels of the virtual target surround channels to obtain the current restored stereo audio.
The target sound effect database is a designated sound effect database. For example, sound effect databases such as HRTF, MARL, multiple Distances, etc. may be employed. In the embodiment of the present application, the HRTF database is mainly used for description, but is not limited to only using the database for performing the downmix rendering.
Since the constructed target surround channel is virtual, it must eventually be restored to a stereo audio containing only a left channel and a right channel for playback, and thus downmix rendering needs to be performed on the respective channels.
In order to simulate the distance of each sound source and the position transformation of each sound source, in the embodiment of the application, the direction of each channel of the target surround channel needs to be considered when down-mixing and rendering each channel of the virtual target channel. The corresponding data sets in the target sound effect database are adopted according to the directions of the sound channels, so that the audio effects in different directions can be obtained.
S107, dynamic loudness balancing processing is carried out on the current restored stereo audio to obtain balanced stereo audio.
In order to effectively smooth the loudness of the singing sound source and the loudness of the environmental sound, and to ensure that the currently important audio is highlighted, loudness balancing also needs to be performed on the restored stereo audio data.
Specifically, the loudness balancing may be performed by mid/side processing.
S108, outputting the balanced stereo audio after loudness limiting processing.
Finally, in order to keep every signal within the loudness range and prevent signals with excessively high or low loudness from affecting the listening effect, the audio is output only after loudness limiting processing.
Optionally, in another embodiment of the present application, a specific implementation of step S108 includes:
detecting an audio signal which is not in the loudness range in the balanced stereo audio, setting the detected audio signal which is larger than the upper limit value of the loudness range as the upper limit value of the loudness range, setting the detected audio signal which is smaller than the lower limit value of the loudness range as the lower limit value of the loudness range, and finally outputting the processed balanced stereo audio.
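The loudness limiting described above can be sketched as a simple clamp. This is a minimal sketch under an assumption: the loudness range is treated as a range [lower, upper] on the absolute sample amplitude, and the sign of each sample is preserved; the function name and this interpretation are hypothetical and are not stated in the application.

```python
import numpy as np

def limit_loudness(x, lower, upper):
    """Clamp the magnitude of each sample into [lower, upper], keeping its sign.

    Assumption: "loudness" is approximated by absolute sample amplitude.
    Samples that are exactly zero stay zero because np.sign(0) is 0.
    """
    x = np.asarray(x, dtype=np.float64)
    magnitude = np.clip(np.abs(x), lower, upper)
    return np.sign(x) * magnitude
```

For example, with lower = 0.01 and upper = 1.0, a sample of 2.0 becomes 1.0 and a sample of 0.001 becomes 0.01, while samples already inside the range are unchanged.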
Optionally, in another embodiment of the present application, the target audio source data of the stereo audio to be processed includes singing, background sound, bass, and drumbeats. Accordingly, in the embodiment of the present application, a specific implementation manner of step S104 includes:
Placing the singing on the center channel, the drumbeat on the front left channel, the bass on the front right channel, and the background sound on the left surround channel and the right surround channel respectively, and using the ultralow frequency signal of the stereo audio to be processed as the bass channel, to form a virtual target surround channel.
Optionally, to prevent the background sound from being too loud, the background sound may be attenuated by a preset number of decibels before being placed on the left surround channel and the right surround channel.
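The channel assignment of step S104 can be sketched as a simple mapping from separated sound sources to channel names. This is an illustrative sketch: the function name, the dictionary keys ("singing", "drum", "bass", "background") and the channel labels are hypothetical, and the attenuation is applied as a linear gain of 10^(-dB/20).

```python
import numpy as np

def build_virtual_51(stems, lfe, background_atten_db=0.0):
    """Place separated sources on the channels of a virtual 5.1 layout.

    stems: dict with hypothetical keys "singing", "drum", "bass", "background".
    lfe: the ultralow frequency signal obtained by filtering.
    background_atten_db: preset attenuation applied to the background sound.
    """
    gain = 10.0 ** (-background_atten_db / 20.0)
    background = np.asarray(stems["background"], dtype=np.float64) * gain
    return {
        "c": np.asarray(stems["singing"], dtype=np.float64),   # center: singing
        "fl": np.asarray(stems["drum"], dtype=np.float64),     # front left: drumbeat
        "fr": np.asarray(stems["bass"], dtype=np.float64),     # front right: bass
        "sl": background,                                      # left surround
        "sr": background.copy(),                               # right surround
        "lfe": np.asarray(lfe, dtype=np.float64),              # bass channel
    }
```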
It should be noted that, the real 5.1 channel is supported by its corresponding speakers, so the direction positions of the center channel, the front left channel, the front right channel, the left surround channel, and the right surround channel are: (0 °, 30), (0 ° ), (0 °,60 °), (0 °,110 °), (0 °,220 °).
In the embodiment of the present application, the constructed 5.1 channels are virtual, have no corresponding speaker support, and need to be downmixed to stereo later. Therefore, in order to set the initial phase information and to ensure a basic loudness balance between the left and right ears at the initial moment, the center channel is located directly in front, i.e., (0, 90 °). The front left channel is located in the left ear parallel direction, i.e., (0 ° ). The front right channel is located in the right ear parallel direction, i.e., (0, -180). The front left and front right sound sources are kept on a horizontal line, so that at the initial moment each ear hears only the sound of its corresponding sound source. The left surround channel is located at the left rear of the left ear, specifically (-30 °, -60 °). The right surround channel is located at the left rear of the right ear, specifically (-150 °, -120 °).
Accordingly, in the implementation of the present application, a specific embodiment of step S105 includes the following steps:
The left channel of the center channel is controlled by a sine function and the right channel of the center channel is controlled by a cosine function according to an initial phase of 45 degrees and a period of a first time length.
The left channel of the front left channel is controlled with a sine function and the right channel of the front left channel is controlled with a cosine function according to an initial phase of 0 degrees and a period of a second time length. Wherein the second time length is half of the first time length.
The left channel of the front right channel is controlled with a cosine function and the right channel of the front right channel is controlled with a sine function according to an initial phase of 90 degrees and a period of the second time length.
It should be noted that each channel in the 5.1 channels is a stereo channel, that is, each channel includes a left channel and a right channel. In the embodiment of the present application, only the center channel, the front left channel, and the front right channel are phase-shifted, because the center channel, the front left channel, and the front right channel are audio sources that mainly produce surround sound effects.
Specifically, the first time length may be 10 seconds. In order to ensure that the singing volume in the left and right ears is the same at the initial moment, and to subsequently produce an effect of moving from right to left and then from left to right, the initial phase of the center channel is set to 45 degrees, the left channel of the center channel is controlled by a sine function, and the right channel of the center channel is controlled by a cosine function.
For the front left channel, the period is set to 5 seconds. In order that the drumbeat is heard only in the left ear at the initial moment and then produces an effect of moving from left to right and then from right to left, its initial phase is set to 0, the left channel of the front left channel is controlled by a cosine function, and the right channel of the front left channel is controlled by a sine function.
In order that the bass is heard only in the right ear at the initial moment, the initial phase of the front right channel is set to 90 degrees, the left channel of the front right channel is controlled by a cosine function, and the right channel of the front right channel is controlled by a sine function.
From the above, the human voice and the bass first move from right to left, while the drumbeat first moves from left to right, so when the automatic phase rotation starts, the right ear hears the human voice plus the bass and the left ear hears only the drumbeat. This is designed because the drumbeat is high in loudness while the bass is low in loudness: the bass is perceivable to the human ear but does not mask the human voice, and the overall loudness of the two ears does not differ much. On the other hand, through binaural rendering, what the left and right ears hear is actually a combination of the human voice, bass, drumbeat, and background sound, but the loudness proportion of each changes with time. Because the bass and the drumbeat have the same period, they maintain relatively balanced sound source positions in space. Within two periods of the drumbeat and the bass, the singing completes only one period, so its loudness is superimposed in the same direction with the bass and with the drumbeat once each.
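The trigonometric control of step S105 can be sketched as time-varying left and right gains. This is a minimal sketch under assumptions: the gain argument is taken as 2π·t/period plus the initial phase (the exact argument is not given in the text), the function assignments follow the three control steps listed above, and all names are hypothetical.

```python
import math

def pan_gains(t, period, phase_deg, left_fn, right_fn):
    """Return (left_gain, right_gain) at time t in seconds.

    Assumption: the angle advances by 2*pi per period, offset by the
    initial phase. Gains may become negative, which flips polarity.
    """
    theta = 2.0 * math.pi * t / period + math.radians(phase_deg)
    return left_fn(theta), right_fn(theta)

FIRST_PERIOD = 10.0               # first time length, e.g. 10 seconds
SECOND_PERIOD = FIRST_PERIOD / 2  # second time length is half of the first

def center_gains(t):
    # Center channel: initial phase 45 degrees, left = sine, right = cosine.
    return pan_gains(t, FIRST_PERIOD, 45.0, math.sin, math.cos)

def front_left_gains(t):
    # Front left channel: initial phase 0 degrees, left = sine, right = cosine.
    return pan_gains(t, SECOND_PERIOD, 0.0, math.sin, math.cos)

def front_right_gains(t):
    # Front right channel: initial phase 90 degrees, left = cosine, right = sine.
    return pan_gains(t, SECOND_PERIOD, 90.0, math.cos, math.sin)
```

At t = 0 the center channel gains are equal in both ears, which corresponds to the balanced initial singing volume that the 45-degree initial phase is chosen for.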
Accordingly, in the embodiment of the present application, a specific implementation manner of step S106, as shown in fig. 3, includes the following steps:
S301, respectively carrying out loudness calculation on current left channel audio data and current right channel audio data of each target stereo channel based on the current direction of each target stereo channel to obtain left channel loudness audio data and right channel loudness audio data of each target stereo channel.
In the existing rendering and downmixing process, each audio data to be downmixed is directly utilized to convolve with a data set, and then accumulated.
However, in the embodiment of the present application, a change of loudness with phase is added, so the loudness audio data needs to be calculated first.
It should also be noted that, because the left and right surround channels are fixed, the calculation needs to be performed only once for these two channels. The directions of the singing, the drumbeat, and the bass placed on the center channel, the front left channel, and the front right channel, however, change continuously. As the phases of these three sound sources change, the loudness heard by the left and right ears also changes continuously, so the loudness audio data of the three sound sources needs to be calculated accordingly.
Optionally, taking the drumbeat, the singing, and the bass as three fixed points, the initial loudness heard by the left and right ears for the three points may be set to (3 db,0 db), (3 db ), (0 db,3 db), respectively. The change in the loudness audio data of the drumbeat of the front left channel is calculated as follows:
The loudness audio data of the singing of the center channel is calculated as follows:
The loudness audio data of the bass of the front right channel is calculated as follows:
wherein the parameter db is equal to 3 db, and T is the period of the drumbeat and bass loudness; y_i_pan_r is the current right channel audio data of target stereo channel i; y_i_pan_l is the current left channel audio data of target stereo channel i; y_i_pan_db_r is the right channel loudness audio data of target stereo channel i; y_i_pan_db_l is the left channel loudness audio data of target stereo channel i; and the target stereo channel i includes the front left channel fl, the front right channel fr, and the center channel c.
S302, windowing is carried out on each channel of the virtual target surrounding channels.
In order to ensure that the number of sampling points of each channel remains consistent, windowing is required. During this processing, the amount of supplemented data may be unified across channels, and the supplementing position may be unified as the end of the original signal.
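One way to keep the number of sampling points consistent is to pad every channel at its end up to a common length. The sketch below assumes that the supplemented data is zeros and that the target length is the longest channel; both choices, and the function name, are assumptions rather than details given in the application.

```python
import numpy as np

def pad_to_common_length(channels):
    """Pad each channel at its end so that all channels have the same length.

    Assumption: padding uses zeros and the target is the longest channel.
    """
    target = max(len(x) for x in channels.values())
    return {
        name: np.pad(np.asarray(x, dtype=np.float64), (0, target - len(x)))
        for name, x in channels.items()
    }
```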
S303, acquiring a plurality of data sets from the target sound effect database.
Each data set other than the first data set corresponds to one angle range of the omnidirectional space. That is, in order to adapt the downmix rendering effect to the azimuth of a channel, the data set corresponding to an angle range is selected for processing a channel located within that angle range. The first data set is used exclusively for processing the singing, to ensure that clear singing is continuously present.
Optionally, when an HRIR database is used, the HRIR data may be read from the full library of the public MIT dataset.
Optionally, in the embodiment of the present application, the omnidirectional space is divided into 45-degree ranges, so 9 data sets are selected, specifically: elev-10, elev0, elev10, elev20, elev-20, elev30, elev-30, elev40, and elev-40. Here, elev0 is the first data set, and each of the other data sets corresponds to an angle range of 45 degrees: 0° to 45° corresponds to elev10, 45° to 90° corresponds to elev20, 90° to 135° corresponds to elev30, 135° to 180° corresponds to elev40, and so on, giving the angle range corresponding to each data set.
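The mapping from a channel direction to a data set name can be sketched as a lookup over 45-degree ranges. This is a sketch under assumptions: negative angles are assumed to mirror the positive ranges (0° to -45° gives elev-10, and so on down to elev-40), boundary angles are assigned to the range that starts at them, and the function name is hypothetical.

```python
def dataset_for_angle(angle_deg):
    """Return the data set name for an angle in [-180, 180] degrees.

    Assumption: negative angles mirror the positive ranges, and a boundary
    angle such as 45 belongs to the range that starts at it.
    """
    if not -180.0 <= angle_deg <= 180.0:
        raise ValueError("angle must lie in [-180, 180] degrees")
    index = min(int(abs(angle_deg) // 45) + 1, 4)  # 1..4 for the four ranges
    sign = "-" if angle_deg < 0 else ""
    return f"elev{sign}{index * 10}"
```

The first data set, elev0, is not returned by this lookup because it is reserved for the singing rather than tied to an angle range.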
The relationship of the data set angles can be expressed specifically as:
S304, summing the convolution value of the initial left channel audio data of the center channel with the first data set, the convolution value of the left channel loudness audio data of each target stereo channel with its currently corresponding data set, and the left channel audio data of the ultralow frequency signal of the stereo audio to be processed, to obtain the left channel restored audio of the current restored stereo audio.
Wherein, the data set corresponding to the current direction of the target stereo channel refers to the data set corresponding to the angle range of the current direction of the target stereo channel, and the directions of the surrounding left channel and the surrounding right channel are fixed, so the data sets corresponding to the angle range of the current direction are fixed, namely elev-10 and elev-40 respectively. And the convolution value of the loudness audio data of the target stereo channel and the current corresponding data set is specifically obtained by selecting one audio from the corresponding data set and performing time domain convolution with the loudness audio data of the target stereo channel.
In order to maintain the singing loudness, the initial left channel audio data of the center channel is convolved with the first data set, and then the rotary-rendered singing is projected straight ahead.
Specifically, a specific downmix formula of the left channel restored audio of the current restored stereo audio may be as follows:
wherein cos(π*t/T+π/4) is the projection coefficient of the singing onto the straight-ahead direction at different angles; y_c_l is the initial left channel audio data of the center channel; y_lfe_l is the left channel audio data of the ultralow frequency signal of the stereo audio to be processed; and y_sl_pan_db_l and y_sr_pan_db_l are the left channel loudness audio data of the surround left channel and the surround right channel, respectively.
Therefore, in the embodiment of the present application, the automatic phase rotation is matched with the azimuth-dependent loudness of the sound effect database during playback, so that the azimuth and loudness of the audio content heard by the left and right ears are adjusted over time.
S305, summing the convolution value of the initial right channel audio data of the center channel with the first data set, the convolution value of the right channel loudness audio data of each target stereo channel with its currently corresponding data set, and the right channel audio data of the ultralow frequency signal of the stereo audio to be processed, to obtain the right channel restored audio of the current restored stereo audio.
Similarly, a specific downmix formula for the right channel restored audio of the current restored stereo audio may be as follows:
wherein y_c_r is the initial right channel audio data of the center channel; y_lfe_r is the right channel audio data of the ultralow frequency signal of the stereo audio to be processed; and y_sl_pan_db_r and y_sr_pan_db_r are the right channel loudness audio data of the surround left channel and the surround right channel, respectively.
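The summation in S304 and S305 can be sketched for one ear as a sum of time-domain convolutions plus the ultralow frequency signal. This is a simplified sketch, not the downmix formula of the application: the time-varying projection coefficient applied to the singing is omitted because the formula itself is not reproduced here, and the function name, argument names, and the HRIR arrays are hypothetical.

```python
import numpy as np

def downmix_one_ear(center_initial, first_hrir, loudness_channels, hrirs, lfe):
    """Sum the convolved channels and the LFE signal for a single ear.

    center_initial: initial center-channel audio for this ear.
    first_hrir: impulse response taken from the first data set (assumed array).
    loudness_channels: dict of channel name -> loudness audio data for this ear.
    hrirs: dict of channel name -> impulse response from its current data set.
    lfe: ultralow frequency audio for this ear, added without convolution.
    """
    terms = [np.convolve(center_initial, first_hrir)]
    for name, data in loudness_channels.items():
        terms.append(np.convolve(data, hrirs[name]))
    terms.append(np.asarray(lfe, dtype=np.float64))
    # Convolution lengthens each term, so accumulate into the longest length.
    out = np.zeros(max(len(t) for t in terms))
    for t in terms:
        out[: len(t)] += t
    return out
```

Calling the function once with left-ear data and once with right-ear data yields the two channels of the restored stereo audio.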
Accordingly, in the embodiment of the present application, a specific implementation manner of step S107 includes:
Calculating half of the sum of the right channel restored audio and the left channel restored audio of the current restored stereo audio to obtain a current mid audio signal, and calculating half of the difference between the right channel restored audio and the left channel restored audio of the current restored stereo audio to obtain a current side-chain audio signal.
Specifically, the calculation formulas of the mid audio signal and the side-chain audio signal are respectively:
y_mid = 0.5*(y_r + y_l)
y_side = 0.5*(y_r - y_l)
In conventional balancing, the sum of the current mid audio signal and the current side-chain audio signal, and the difference between the two, are used directly as the final output. In the embodiment of the present application, however, the final output is determined after further adjusting the two signals according to their average amplitudes.
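As a short check of why the sum and the difference recover the two channels, substituting the definitions of y_mid and y_side gives the following identities (a worked step added for illustration, using only the two formulas above):

```latex
\begin{aligned}
y_{mid} + y_{side} &= 0.5\,(y_r + y_l) + 0.5\,(y_r - y_l) = y_r \\
y_{mid} - y_{side} &= 0.5\,(y_r + y_l) - 0.5\,(y_r - y_l) = y_l
\end{aligned}
```

So, without any adjustment, the sum reproduces the right channel and the difference reproduces the left channel.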
Specifically, if the ratio of the average amplitude of the current mid audio signal to the average amplitude of the side-chain audio signal is greater than a first threshold, the level of the current side-chain audio signal is raised by a preset decibel.
If the ratio of the average amplitude of the current mid audio signal to the average amplitude of the side-chain audio signal is smaller than a second threshold, and neither the average amplitude of the mid audio signal nor the average amplitude of the side-chain audio signal is smaller than a preset minimum value, the level of the current mid audio signal is raised by the preset decibel.
The preset minimum value is close to zero; in the embodiment of the application, the preset minimum value is set to 10^(3/20).
If the average amplitude of the mid audio signal or the average amplitude of the side-chain audio signal is smaller than the preset minimum value, the current mid audio signal is reduced by half of the power spectrum, and the current side-chain audio signal is raised by half of the power spectrum.
The final current side-chain audio signal is added to the final current mid audio signal to obtain the right channel audio data of the balanced stereo audio, and the final current side-chain audio signal is subtracted from the final current mid audio signal to obtain the left channel audio data of the balanced stereo audio.
Optionally, in this embodiment of the present application, the first threshold may be set to 4, the second threshold to 2, and the preset decibel to 3 dB. When the average amplitude of the current mid audio signal is more than 4 times that of the side-chain audio signal, the level of the current side-chain audio signal is raised by 3 dB and the current mid audio signal is not adjusted, so the balanced stereo audio can be expressed as:
y_l′ = y_mid - y_side*10^(3/20)
y_r′ = y_mid + y_side*10^(3/20)
When the average amplitude of the current mid audio signal is less than 2 times that of the side-chain audio signal, the level of the current mid audio signal is raised by 3 dB and the current side-chain audio signal is kept unchanged, so the balanced stereo audio can be expressed as:
y_l′ = y_mid*10^(3/20) - y_side
y_r′ = y_mid*10^(3/20) + y_side
If the average amplitude of the current mid audio signal or the average amplitude of the side-chain audio signal is smaller than the preset minimum value, the current mid audio signal is reduced by half of the power spectrum and the current side-chain audio signal is raised by half of the power spectrum, and the balanced stereo audio can then be expressed as:
If the average amplitudes of the current mid audio signal and the side-chain audio signal meet none of the above conditions, the current mid audio signal and the current side-chain audio signal are directly added and subtracted to obtain the balanced stereo audio.
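The threshold logic of step S107 can be sketched as follows. This is a simplified sketch under stated assumptions: the average amplitude is taken as the mean absolute sample value, the left and right outputs follow the definitions of y_mid and y_side (right = mid + side, left = mid - side), and the branch for amplitudes below the preset minimum is left unadjusted because its formula is not reproduced in the text. The function name and the default min_amp are hypothetical.

```python
import numpy as np

def balance_mid_side(y_l, y_r, first_threshold=4.0, second_threshold=2.0,
                     boost_db=3.0, min_amp=1e-6):
    """Return (left, right) after mid/side loudness balancing.

    Assumption: when either average amplitude is below min_amp, no gain is
    applied in this sketch (the original adjustment is not reproduced here).
    """
    y_l = np.asarray(y_l, dtype=np.float64)
    y_r = np.asarray(y_r, dtype=np.float64)
    mid = 0.5 * (y_r + y_l)
    side = 0.5 * (y_r - y_l)
    gain = 10.0 ** (boost_db / 20.0)  # 3 dB corresponds to 10^(3/20)
    mid_avg = float(np.mean(np.abs(mid)))
    side_avg = float(np.mean(np.abs(side)))
    if mid_avg >= min_amp and side_avg >= min_amp:
        ratio = mid_avg / side_avg
        if ratio > first_threshold:
            side = side * gain   # mid dominates: raise the side-chain signal
        elif ratio < second_threshold:
            mid = mid * gain     # side is relatively strong: raise the mid signal
    return mid - side, mid + side
```

Guarding both comparisons with min_amp also avoids a division by zero when the side-chain signal is silent; this is a simplification relative to the order of conditions in the description.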
The embodiment of the application provides a method for realizing stereo surround sound effect, which decodes stereo audio to be processed according to the sampling rate of the stereo audio to be processed to obtain PCM data of the stereo audio to be processed. And then filtering the PCM data of the stereo audio to be processed to obtain an ultralow frequency signal of the stereo audio to be processed, and performing sound source separation on the PCM data of the stereo audio to be processed to obtain a plurality of target sound source data of the stereo audio to be processed, so that the stereo audio to be processed is decomposed into data of 6 sound sources. And then, respectively placing the target sound source data of the stereo audio to be processed on each target stereo channel of the target surround channels, and combining the ultralow frequency signals of the stereo audio to be processed to form a virtual target surround channel. And then respectively controlling the rotation speed and the direction of each target stereo channel through a self-phase algorithm according to the initial phase angle and the trigonometric function corresponding to each target stereo channel, respectively carrying out down-mixing rendering on the virtual target channel by adopting a data set of a target sound effect database based on the direction of each channel of the virtual target surround channel to obtain the current restored stereo audio, thereby realizing phase control on the sound source data of each channel, controlling the corresponding loudness, and then restoring the stereo channel. 
Finally, dynamic loudness balancing is performed on the current restored stereo audio to obtain balanced stereo audio, which is output after loudness limiting processing, further safeguarding the listening effect. The channels are no longer processed as a whole; instead, the audio is separated into a plurality of sound sources and processed on a simulated virtual target surround channel, which avoids the problems of unbalanced information between channels, noise, and sibilance, and thus effectively improves the listening effect.
Another embodiment of the present application provides a stereo surround sound effect implementation device, as shown in fig. 4, including:
The decoding unit 401 is configured to decode the stereo audio to be processed according to the sampling rate of the stereo audio to be processed, so as to obtain PCM data of the stereo audio to be processed.
The filtering unit 402 is configured to perform filtering processing on the PCM data of the stereo audio to be processed, so as to obtain an ultralow frequency signal of the stereo audio to be processed.
The separation unit 403 is configured to perform sound source separation on the PCM data of the stereo audio to be processed, so as to obtain a plurality of target sound source data of the stereo audio to be processed.
The building unit 404 is configured to place each target sound source data of the stereo audio to be processed onto each target stereo channel of the target surround channel, and combine the ultralow frequency signal of the stereo audio to be processed to form a virtual target surround channel.
The control unit 405 is configured to control the rotation speed and direction of each target stereo channel through a self-phase algorithm according to the initial phase angle and trigonometric function corresponding to each target stereo channel.
The downmix unit 406 is configured to perform downmix rendering on the virtual target channels by using the data sets of the target sound effect database based on the directions of the channels of the virtual target surround channel, to obtain the current restored stereo audio.
The balance processing unit 407 is configured to perform dynamic loudness balancing on the current restored stereo audio to obtain balanced stereo audio.
The loudness control unit 408 is configured to output the balanced stereo audio after loudness limiting processing.
Optionally, in the stereo surround sound implementation apparatus provided in another embodiment of the present application, the filtering unit includes:
A creation unit, configured to create a high-pass filter and a low-pass filter.
A low-pass filter unit, configured to use the low-pass filter to retain the audio signal within the first target frequency in the PCM data of the stereo audio to be processed.
A high-pass filter unit, configured to use the high-pass filter to cut off, from the retained audio signal, the audio signal below the second target frequency, to obtain the ultralow frequency signal of the stereo audio to be processed.
Optionally, in the stereo surround sound implementation apparatus provided in another embodiment of the present application, each target sound source data of stereo audio to be processed includes a singing, a background sound, a bass, and a drumbeat, and the building unit includes:
A sound source placing unit, configured to place the singing on the center channel, the drumbeat on the front left channel, the bass on the front right channel, and the background sound on the left surround channel and the right surround channel respectively, and to use the ultralow frequency signal of the stereo audio to be processed as the bass channel, to form a virtual target surround channel. The center channel is located directly in front. The front left channel is located in the left ear parallel direction. The front right channel is located in the right ear parallel direction. The left surround channel is located at the left rear of the left ear. The right surround channel is located at the left rear of the right ear.
Optionally, in the stereo surround sound implementation device provided in another embodiment of the present application, the control unit includes:
A first control unit, configured to control the left channel of the center channel with a sine function and the right channel of the center channel with a cosine function according to an initial phase of 45 degrees and a period of the first time length.
A second control unit, configured to control the left channel of the front left channel with a sine function and the right channel of the front left channel with a cosine function according to an initial phase of 0 degrees and a period of the second time length. The second time length is half of the first time length.
A third control unit, configured to control the left channel of the front right channel with a cosine function and the right channel of the front right channel with a sine function according to an initial phase of 90 degrees and a period of the second time length.
Optionally, in the stereo surround sound implementation apparatus provided in another embodiment of the present application, the downmix unit includes:
and the loudness calculation unit is used for calculating the loudness of the current left channel audio data and the current right channel audio data of each target stereo channel based on the current direction of each target stereo channel respectively to obtain the left channel loudness audio data and the right channel loudness audio data of each target stereo channel.
And the windowing processing unit is used for windowing each channel of the virtual target surround channel.
And the data acquisition unit is used for acquiring a plurality of data sets from the target sound effect database. Wherein each of the data sets, except the first data set, corresponds to one angular range among all directions.
And the left channel down-mixing unit is used for summing the convolution value of the initial left channel audio data of the middle channel and the first data set, the convolution value of the left channel loudness audio data of each target stereo channel and the current corresponding data set, and the left channel audio data of the ultralow frequency signal of the stereo audio to be processed, so as to obtain the left channel restored audio of the current restored stereo audio. The data set corresponding to the current direction of the target stereo channel refers to the data set corresponding to the angle range where the current direction of the target stereo channel is located.
And the right channel down-mixing unit is used for summing the convolution value of the initial right channel audio data of the middle channel and the first data set, the convolution value of the right channel loudness audio data of each target stereo channel and the current corresponding data set, and the right channel audio data of the ultralow frequency signal of the stereo audio to be processed, so as to obtain the right channel restored audio of the current restored stereo audio.
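The down-mixing for one output side, as described by the units above, reduces to "convolve each channel with the data set selected by its current direction, then sum". The sketch below illustrates that for a single side under stated assumptions: each data set is treated as a short impulse response stored with a half-open angle range in degrees, the windowing step is omitted, and all function names are hypothetical.

```python
def convolve(signal, kernel):
    """Full linear convolution of two sample sequences."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def pick_dataset(datasets, azimuth_deg):
    """Select the data set whose angle range [start, end) contains the direction."""
    angle = azimuth_deg % 360.0
    for start, end, kernel in datasets:
        if start <= angle < end:
            return kernel
    raise ValueError("no data set covers %.1f degrees" % angle)


def add_into(acc, samples):
    """Add samples into acc in place, growing acc when samples is longer."""
    if len(samples) > len(acc):
        acc.extend([0.0] * (len(samples) - len(acc)))
    for i, value in enumerate(samples):
        acc[i] += value


def downmix_one_side(center, first_dataset, rotating, datasets, lfe):
    """Sum the center, the rotating stereo channels and the LFE for one output side.

    rotating is a list of (samples, current_azimuth_deg) pairs holding the
    loudness-adjusted audio of each target stereo channel for this side.
    """
    out = []
    add_into(out, convolve(center, first_dataset))
    for samples, azimuth_deg in rotating:
        add_into(out, convolve(samples, pick_dataset(datasets, azimuth_deg)))
    add_into(out, lfe)
    return out
```

Calling the function once with the left-side inputs and once with the right-side inputs would yield the two restored channels. A practical implementation would more likely use FFT-based block convolution than the quadratic loop shown here.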
Optionally, in the stereo surround sound implementation device provided in another embodiment of the present application, the balance processing unit includes:
the first calculating unit is used for calculating half of the sum of the right channel restored audio and the left channel restored audio of the current restored stereo audio to obtain a current middle-set audio signal, and calculating half of the difference between the right channel restored audio and the left channel restored audio of the current restored stereo audio to obtain a current side-chain audio signal.
And the first lifting unit is used for lifting the level of the current side-chain audio signal by a preset decibel when the ratio of the average amplitude of the current middle-set audio signal to the average amplitude of the side-chain audio signal is larger than a first threshold value.
And the second lifting unit is used for lifting the level of the current middle-set audio signal by a preset decibel when the ratio of the average amplitude of the current middle-set audio signal to the average amplitude of the side-chain audio signal is smaller than a second threshold value and the average amplitude of the middle-set audio signal and the average amplitude of the side-chain audio signal are not smaller than a preset minimum value. Wherein the preset minimum value is close to zero.
And the adjusting unit is used for reducing the current mid-set audio signal by half of the power spectrum and improving the current side-chain audio signal by half of the power spectrum when the average amplitude of the mid-set audio signal or the average amplitude of the side-chain audio signal is smaller than a preset minimum value.
And the second calculation unit is used for adding the final current middle-set audio signal with the final current side-chain audio signal to obtain right channel audio data of the balanced stereo audio, and subtracting the final current side-chain audio signal from the final current middle-set audio signal to obtain left channel audio data of the balanced stereo audio.
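The balance processing above is a mid/side computation: half the sum and half the difference of the two channels are compared by average amplitude, one of them is boosted, and the channels are rebuilt as sum and difference. The following sketch is one possible reading. The thresholds, the 3 dB boost and the minimum value are hypothetical defaults, and the "half of the power spectrum" adjustment is approximated here by -3 dB on the mid signal and +3 dB on the side signal, which is an assumption because the wording in the text is ambiguous.

```python
def db_to_gain(db):
    """Convert a level change in decibels to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


def mean_abs(samples):
    """Average absolute amplitude of a sample sequence."""
    return sum(abs(v) for v in samples) / len(samples) if samples else 0.0


def balance(left, right, first_threshold=4.0, second_threshold=0.25,
            boost_db=3.0, minimum=1e-9):
    """Return (left, right) after mid/side balancing. All defaults are hypothetical."""
    mid = [(r + l) / 2.0 for l, r in zip(left, right)]
    side = [(r - l) / 2.0 for l, r in zip(left, right)]
    mid_avg, side_avg = mean_abs(mid), mean_abs(side)

    mid_gain = side_gain = 1.0
    if mid_avg < minimum or side_avg < minimum:
        # Assumed reading of "half of the power spectrum": -3 dB mid, +3 dB side.
        mid_gain, side_gain = db_to_gain(-3.0), db_to_gain(3.0)
    elif mid_avg / side_avg > first_threshold:
        side_gain = db_to_gain(boost_db)   # image too narrow: lift the side signal
    elif mid_avg / side_avg < second_threshold:
        mid_gain = db_to_gain(boost_db)    # image too wide: lift the mid signal

    mid = [v * mid_gain for v in mid]
    side = [v * side_gain for v in side]
    out_right = [m + s for m, s in zip(mid, side)]  # mid + side gives the right channel
    out_left = [m - s for m, s in zip(mid, side)]   # mid - side gives the left channel
    return out_left, out_right
```

When neither threshold is crossed, the function returns the input unchanged, because (R+L)/2 + (R-L)/2 = R and (R+L)/2 - (R-L)/2 = L.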
Optionally, in the stereo surround sound implementation device provided in another embodiment of the present application, the loudness control unit includes:
and the detection unit is used for detecting the audio signals which are not in the loudness range in the balanced stereo audio.
And the loudness processing unit is used for setting the detected audio signal which is larger than the upper limit value of the loudness range as the upper limit value of the loudness range and setting the detected audio signal which is smaller than the lower limit value of the loudness range as the lower limit value of the loudness range.
And the output unit is used for outputting the processed balanced stereo audio.
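The loudness limiting described by the detection and loudness processing units amounts to clamping values to a range. A minimal sketch follows, assuming the loudness range is expressed as lower and upper bounds on individual sample values; the text does not say whether the check is applied per sample or per frame, and `limit_loudness` is a hypothetical name.

```python
def limit_loudness(samples, lower, upper):
    """Clamp every value outside [lower, upper] to the nearest bound."""
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    return [min(max(value, lower), upper) for value in samples]
```

For example, `limit_loudness([-1.5, -0.2, 0.3, 1.2], -1.0, 1.0)` returns `[-1.0, -0.2, 0.3, 1.0]`: the two out-of-range values are set to the bounds and the others pass through unchanged.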
It should be noted that, in the specific working process of each unit provided in the foregoing embodiment of the present application, reference may be correspondingly made to the implementation process of the corresponding step in the foregoing method embodiment, which is not repeated herein.
Another embodiment of the present application provides an electronic device, as shown in fig. 5, including:
A memory 501 and a processor 502.
Wherein the memory is used for storing programs.
The processor 502 is configured to execute the program stored in the memory 501. When executed, the program implements the method for realizing the stereo surround sound effect provided in any one of the foregoing embodiments.
Another embodiment of the present application provides a computer storage medium storing a computer program, where the computer program is executed to implement a method for implementing a stereo surround sound effect provided in any one of the foregoing embodiments.
Computer storage media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. Computer-readable media, as defined herein, do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
Those skilled in the art will further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative elements and steps are described above generally in terms of functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. The method for realizing the stereo surround sound effect is characterized by comprising the following steps:
decoding the stereo audio to be processed according to the sampling rate of the stereo audio to be processed to obtain PCM data of the stereo audio to be processed;
filtering the PCM data of the stereo audio to be processed to obtain an ultralow frequency signal of the stereo audio to be processed;
performing sound source separation on the PCM data of the stereo audio to be processed to obtain a plurality of target sound source data of the stereo audio to be processed;
respectively placing the target sound source data of the stereo audio to be processed on each target stereo channel of a target surround channel, and combining the ultralow frequency signals of the stereo audio to be processed to form a virtual target surround channel;
according to the initial phase angle and trigonometric function corresponding to each target stereo channel, controlling the rotation speed and direction of each target stereo channel through a self-phase algorithm;
respectively carrying out down-mixing rendering on the virtual target sound channels by adopting a data set of a target sound effect database based on the directions of all the sound channels of the virtual target surrounding sound channels to obtain current restored stereo audio;
Performing dynamic loudness balancing processing on the current restored stereo audio to obtain balanced stereo audio;
and outputting the balanced stereo audio after the loudness limiting processing.
2. The method according to claim 1, wherein the filtering the PCM data of the stereo audio to be processed to obtain the ultra-low frequency signal of the stereo audio to be processed includes:
creating a high-pass filter and a low-pass filter;
intercepting an audio signal in a first target frequency in the PCM data of the stereo audio to be processed by using the low-pass filter;
and cutting off the audio signal lower than the second target frequency in the intercepted audio signals by using the high-pass filter to obtain the ultralow frequency signal of the stereo audio to be processed.
3. The method according to claim 1, wherein the target audio source data of the stereo audio to be processed includes singing, background sound, bass, and drumming, and the placing the target audio source data of the stereo audio to be processed onto the target stereo channels of the target surround channels, respectively, and forming the virtual target surround channels in combination with the ultra-low frequency signals of the stereo audio to be processed includes:
Placing the singing in a middle channel, placing the drum sound in a front left channel, placing the bass in a front right channel, placing the background sound in a left surround channel and a right surround channel respectively, and taking an ultralow frequency signal of the stereo audio to be processed as a bass channel to form the virtual target surround channel; wherein the middle channel is located right in front; the front left channel is located in the left ear parallel direction; the front right channel is located in the right ear parallel direction; the left surround channel is located at the left rear of the left ear; the right surround channel is located at the right rear of the right ear.
4. The method according to claim 2, wherein the controlling the rotation speed and direction of each of the target stereo channels by the self-phase algorithm according to the initial phase angle and trigonometric function corresponding to each of the target stereo channels, respectively, comprises:
according to the initial phase of 45 degrees and the period of the first time length, controlling the left channel of the middle channel by utilizing a sine function and controlling the right channel of the middle channel by utilizing a cosine function;
according to the initial phase of 0 degrees and the period of the second time length, controlling the left channel of the front left channel by utilizing a sine function and controlling the right channel of the front left channel by utilizing a cosine function; wherein the second time length is half of the first time length;
According to an initial phase of 90 degrees and a period of the second time length, controlling the left channel of the front right channel by utilizing a cosine function and controlling the right channel of the front right channel by utilizing a sine function.
5. A method according to claim 3, wherein the downmixing rendering of the virtual target channels with the data set of the target sound effect database based on the direction of each channel of the virtual target surround channels, respectively, to obtain the current restored stereo audio, comprises:
respectively carrying out loudness calculation on the current left channel audio data and the current right channel audio data of each target stereo channel based on the current direction of each target stereo channel to obtain left channel loudness audio data and right channel loudness audio data of each target stereo channel;
windowing each channel of the virtual target surround channel;
acquiring a plurality of data sets from the target sound effect database; wherein each of the data sets corresponds to an angular range in all directions except the first data set;
summing the convolution value of the initial left channel audio data of the middle channel and the first data set, the convolution value of the left channel loudness audio data of each target stereo channel and the data set currently corresponding to the convolution value, and the left channel audio data of the ultralow frequency signal of the stereo audio to be processed, so as to obtain the left channel restored audio of the current restored stereo audio; wherein the dataset corresponding to the current direction of the target stereo channel refers to the dataset corresponding to the angle range in which the current direction of the target stereo channel is located;
And summing the convolution value of the initial right channel audio data of the middle channel and the first data set, the convolution value of the right channel loudness audio data of each target stereo channel and the data set currently corresponding to the convolution value, and the right channel audio data of the ultralow frequency signal of the stereo audio to be processed, so as to obtain the right channel restored audio of the current restored stereo audio.
6. The method of claim 5, wherein the dynamically loudness balancing the current restored stereo audio to obtain balanced stereo audio comprises:
calculating half of the sum of the right channel restored audio and the left channel restored audio of the current restored stereo audio to obtain a current middle-set audio signal, and calculating half of the difference between the right channel restored audio and the left channel restored audio of the current restored stereo audio to obtain a current side-chain audio signal;
if the ratio of the average amplitude of the current mid-set audio signal to the average amplitude of the side-chain audio signal is larger than a first threshold, the level of the current side-chain audio signal is increased by a preset decibel;
if the ratio of the average amplitude of the current mid-set audio signal to the average amplitude of the side-chain audio signal is smaller than a second threshold, and the average amplitude of the mid-set audio signal and the average amplitude of the side-chain audio signal are not smaller than a preset minimum value, the level of the current mid-set audio signal is raised by a preset decibel; wherein the preset minimum value is close to zero;
If the average amplitude of the mid-set audio signal or the average amplitude of the side-chain audio signal is smaller than a preset minimum value, reducing the current mid-set audio signal by half of the power spectrum, and lifting the current side-chain audio signal by half of the power spectrum;
and adding the final current side-chain audio signal to the final current middle-set audio signal to obtain right channel audio data of the balanced stereo audio, and subtracting the final current side-chain audio signal from the final current middle-set audio signal to obtain left channel audio data of the balanced stereo audio.
7. The method of claim 1, wherein the outputting the balanced stereo audio after the loudness limiting processing comprises:
detecting an audio signal which is not in a loudness range in the balanced stereo audio;
setting the detected audio signal greater than the upper limit of the loudness range as the upper limit of the loudness range, and setting the detected audio signal less than the lower limit of the loudness range as the lower limit of the loudness range;
outputting the processed balanced stereo audio.
8. A stereo surround sound effect implementation apparatus, comprising:
The decoding unit is used for decoding the stereo audio to be processed according to the sampling rate of the stereo audio to be processed to obtain PCM data of the stereo audio to be processed;
the filtering unit is used for filtering the PCM data of the stereo audio to be processed to obtain an ultralow frequency signal of the stereo audio to be processed;
the separation unit is used for performing sound source separation on the PCM data of the stereo audio to be processed to obtain a plurality of target sound source data of the stereo audio to be processed;
the building unit is used for respectively placing the target sound source data of the stereo audio to be processed on each target stereo channel of the target surround channel and combining the ultralow frequency signals of the stereo audio to be processed to form a virtual target surround channel;
the control unit is used for controlling the rotation speed and the direction of each target stereo channel through a self-phase algorithm according to the initial phase angle and the trigonometric function corresponding to each target stereo channel;
the downmix unit is used for performing downmix rendering on the virtual target sound channels by adopting a data set of a target sound effect database based on the directions of all sound channels of the virtual target surround sound channels respectively to obtain current restored stereo audio;
The balance processing unit is used for carrying out dynamic loudness balance processing on the current restored stereo audio to obtain balanced stereo audio;
and the loudness control unit is used for outputting the balanced stereo audio after the loudness limiting processing.
9. An electronic device, comprising:
a memory and a processor;
wherein the memory is used for storing programs;
the processor is configured to execute the program, and when the program is executed, the program is specifically configured to implement the stereo surround sound implementation method according to any one of claims 1 to 7.
10. A computer storage medium storing a computer program which, when executed, is adapted to implement the stereo surround sound effect implementation method according to any one of claims 1 to 7.
Priority Applications (1)

Application Number: CN202410009799.4A
Priority Date: 2024-01-03
Filing Date: 2024-01-03
Title: Method and device for realizing stereo surround sound effect, electronic equipment and storage medium
Status: Pending

Publications (1)

Publication Number: CN117641223A (en)
Publication Date: 2024-03-01

Family

ID: 90016534

Country Status (1)

Country: CN
Link: CN117641223A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination