CN117336649A - Audio processing method and system and electronic equipment - Google Patents


Publication number
CN117336649A
Authority
CN
China
Prior art keywords: tuning, scene, target, adjustment value, parameters
Legal status: Pending (an assumption, not a legal conclusion)
Application number
CN202311377828.4A
Other languages
Chinese (zh)
Inventor
陈联武
李旭阳
孙学京
Current Assignee: Ruisheng Kaitai Acoustic Technology Shanghai Co ltd (the listed assignee may be inaccurate)
Original Assignee: Ruisheng Kaitai Acoustic Technology Shanghai Co ltd
Application filed by Ruisheng Kaitai Acoustic Technology Shanghai Co ltd
Priority to CN202311377828.4A
Publication of CN117336649A


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R 3/00 Circuits for transducers, loudspeakers or microphones
    • H04R 2430/00 Signal processing covered by H04R, not provided for in its groups

Abstract

The embodiments of the disclosure relate to the technical field of audio reverberation and provide an audio processing method, an audio processing system, and an electronic device. The method comprises the following steps: determining target tuning parameters of a target tuning scene based on tuning parameters corresponding to preset tuning scenes, the tuning parameters being obtained from adjustment values mapped from the space sizes of the corresponding preset tuning scenes; and performing reverberation tuning processing on an input audio signal based on the target tuning parameters to obtain a final reverberation audio signal of the target tuning scene. Because the target tuning scene to be simulated can be regulated based on its corresponding space size, the reverberation effects of target tuning scenes with different space sizes can be simulated, meeting richer and more flexible requirements for simulating virtual-scene reverberation effects.

Description

Audio processing method and system and electronic equipment
Technical Field
The present disclosure relates to the technical field of audio reverberation, and in particular to an audio processing method and system and an electronic device.
Background
When sound propagates in different acoustic scenes, different reverberation effects are produced because the spaces differ in size and in the materials of their reflecting surfaces. A listener perceives the acoustic scene mainly through the details of this reverberation, judging the size of the surrounding space, for example a recording studio, a concert hall, or a gymnasium.
Sound that reaches the human ear from the sound source along the shortest path is called direct sound. In addition, sound forms a reverberant signal through multiple reflections in the space. The reverberant signal is further divided into early reflections and late reverberation: early reflections are sounds that have undergone only one or two reflections, while late reverberation is the collection of sounds that have undergone many reflections.
In sound effect design, the prior art typically virtualizes the listening experience of different acoustic scenes by superimposing a specific reverberation signal on the original audio.
Common reverberation generation methods include convolution reverberation and artificial reverberation. In convolution reverberation, the room impulse response (RIR) of an actual scene is measured, and the target audio is convolved with the RIR when the sound effect is generated, reproducing the corresponding reverberation effect. Convolution reverberation can produce a realistic reverberation effect, but its complexity is high. Artificial reverberation instead simulates reverberation with a model covering early reflections, late reverberation, time delay, frequency attenuation characteristics, and so on, to approximate the reverberation effect of a target scene. Artificial reverberation is flexible and low in complexity.
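The convolution reverberation method described above amounts to convolving the dry signal with the measured RIR. A minimal sketch in Python; the signal and RIR below are toy values for illustration, not measurements of a real room:

```python
import numpy as np

def convolve_reverb(dry: np.ndarray, rir: np.ndarray) -> np.ndarray:
    """Apply a measured room impulse response (RIR) to a dry signal by
    direct convolution; the result has len(dry) + len(rir) - 1 samples."""
    return np.convolve(dry, rir, mode="full")

# Toy check: a unit impulse played through a two-tap "room" reproduces the RIR.
dry = np.array([1.0, 0.0, 0.0])
rir = np.array([0.8, 0.3])  # direct path plus one weak reflection
wet = convolve_reverb(dry, rir)
```

In practice an RIR is thousands of samples long and FFT-based convolution (e.g. `scipy.signal.fftconvolve`) is normally used; even so, this cost is what makes the method comparatively heavy, as the text notes.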
In such a solution, an input microphone signal and a music signal are first processed by a preprocessing module to obtain the input signal required for artificial reverberation; this input signal is processed by a reverberation generation algorithm to obtain a multi-channel artificial reverberation signal; finally, the multi-channel signal passes through post-processing modules such as time delay, gain control, and dry-wet mixing to obtain the final virtual-scene audio output.
However, such prior-art solutions involve numerous algorithm modules and parameters; in practical applications each target virtual scene must be tuned independently, which requires substantial expert experience and cannot meet flexible and diverse scene requirements.
Disclosure of Invention
The present disclosure aims to solve at least one of the problems in the prior art, and provides an audio processing method, an audio processing system and an electronic device.
In one aspect of the present disclosure, there is provided an audio processing method including:
determining target tuning parameters of a target tuning scene based on tuning parameters corresponding to a preset tuning scene; the tuning parameters are obtained based on adjustment values obtained by mapping the space sizes of the corresponding preset tuning scenes;
and carrying out reverberation tuning processing on the input audio signal based on the target tuning parameter to obtain a final reverberation audio signal of the target tuning scene.
Optionally, the determining the target tuning parameter of the target tuning scene based on the tuning parameter corresponding to the preset tuning scene includes:
selecting a first candidate tuning scene and a second candidate tuning scene from preset tuning scenes based on the target tuning scene; wherein the adjustment value of the first candidate tuning scene is not greater than the adjustment value of the target tuning scene, and the adjustment value of the second candidate tuning scene is not less than the adjustment value of the target tuning scene;
and performing interpolation processing based on the adjustment value of the first candidate tuning scene and the tuning parameter corresponding to the adjustment value of the first candidate tuning scene and the adjustment value of the second candidate tuning scene and the tuning parameter corresponding to the adjustment value of the second candidate tuning scene to obtain the target tuning parameter of the target tuning scene.
Optionally, the first candidate tuning scene is a tuning scene with the largest adjustment value in the preset tuning scenes with the adjustment value not greater than the adjustment value of the target tuning scene;
the second candidate tuning scene is a tuning scene with the minimum adjustment value in the preset tuning scenes with the adjustment value not smaller than the adjustment value of the target tuning scene.
Optionally, the interpolating processing is performed based on the adjustment value of the first candidate tuning scene and the tuning parameter corresponding to the adjustment value of the first candidate tuning scene and the adjustment value of the second candidate tuning scene and the tuning parameter corresponding to the adjustment value of the second candidate tuning scene, so as to obtain the target tuning parameter of the target tuning scene, including:
determining target tuning parameters of the target tuning scene according to the following formula (1):
P(Vt) = (Vt - Vi) / (Vj - Vi) * (P(Vj) - P(Vi)) + P(Vi)    (1)
wherein P(Vt) represents the target tuning parameter corresponding to the adjustment value Vt of the target tuning scene t, P(Vi) represents the tuning parameter corresponding to the adjustment value Vi of the first candidate tuning scene i, and P(Vj) represents the tuning parameter corresponding to the adjustment value Vj of the second candidate tuning scene j.
Optionally, the determining the target tuning parameter of the target tuning scene based on the tuning parameter corresponding to the preset tuning scene includes:
acquiring tuning parameter estimation values of key acoustic parameters through corresponding room impulse responses based on the preset tuning scene;
based on the tuning parameter estimation value, obtaining an intermediate tuning parameter of the target tuning scene;
and obtaining the target tuning parameters of the target tuning scene based on the intermediate tuning parameters and a pre-trained tuning parameter prediction model.
Optionally, the obtaining the intermediate tuning parameter of the target tuning scene based on the tuning parameter estimation value includes:
selecting a first intermediate tuning scene and a second intermediate tuning scene from preset tuning scenes based on the target tuning scene; wherein the adjustment value of the first intermediate tuning scene is not greater than the adjustment value of the target tuning scene, and the adjustment value of the second intermediate tuning scene is not less than the adjustment value of the target tuning scene;
and carrying out interpolation processing based on the adjustment value of the first intermediate tuning scene and the corresponding tuning parameter estimation value thereof and the adjustment value of the second intermediate tuning scene and the corresponding tuning parameter estimation value thereof to obtain the intermediate tuning parameter of the target tuning scene.
Optionally, the tuning parameter prediction model is obtained through training according to the following steps:
generating training data based on tuning parameters corresponding to the preset tuning scene and the tuning parameter estimation values corresponding to the tuning parameters;
and training the tuning parameter prediction model by taking the tuning parameter estimated value in the training data as input and the tuning parameter in the training data as output to obtain the pre-trained tuning parameter prediction model.
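As a rough sketch of this training step, the example below fits a linear least-squares model mapping tuning-parameter estimates (input) to tuning parameters (output). The model class and all numbers are assumptions for illustration, since the claim fixes neither:

```python
import numpy as np

# Hypothetical training data: one row per preset tuning scene.
# X holds tuning-parameter estimates derived from each scene's RIR,
# Y holds the corresponding (expert-set) tuning parameters.
X = np.array([[0.2, 0.1],
              [0.5, 0.4],
              [0.9, 0.2]])
Y = X + 0.05  # placeholder targets: estimates offset by a constant

# Fit Y ~ [X, 1] @ W by least squares as a stand-in for the
# "tuning parameter prediction model".
Xb = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
W, *_ = np.linalg.lstsq(Xb, Y, rcond=None)

def predict_tuning(estimates: np.ndarray) -> np.ndarray:
    """Predict tuning parameters from intermediate tuning-parameter estimates."""
    return np.hstack([estimates, 1.0]) @ W
```

A real implementation could substitute any regressor here; the claim only requires that estimates are the input and tuning parameters the output.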
Optionally, the performing reverberation tuning processing on the input audio signal based on the target tuning parameter to obtain a final reverberation audio signal of the target tuning scene includes:
preprocessing the input audio signal based on the preprocessing parameters in the target tuning parameters to obtain a reverberation input signal;
performing reverberation processing on the reverberation input signal based on the reverberation parameter in the target tuning parameter to generate an initial reverberation audio signal of the target tuning scene;
based on the weighted mixing parameters in the target tuning parameters, carrying out weighted mixing processing on the dry sound signals in the input audio signals and the initial reverberation audio signals according to a preset proportion to obtain intermediate reverberation audio signals of the target tuning scene;
and performing system tuning processing on the intermediate reverberation audio signal based on the system tuning parameters in the target tuning parameters to obtain a final reverberation audio signal of the target tuning scene.
In another aspect of the present disclosure, there is provided an audio processing system including:
the parameter control module is used for determining target tuning parameters of a target tuning scene based on tuning parameters corresponding to a preset tuning scene; the tuning parameters are obtained based on adjustment values obtained by mapping the space sizes of the corresponding preset tuning scenes;
and the audio generation module is used for carrying out reverberation tuning processing on the input audio signal based on the target tuning parameter to obtain a final reverberation audio signal of the target tuning scene.
Optionally, the parameter control module is configured to determine, based on tuning parameters corresponding to a preset tuning scene, target tuning parameters of a target tuning scene, including:
the parameter control module is used for:
selecting a first candidate tuning scene and a second candidate tuning scene from preset tuning scenes based on the target tuning scene; wherein the adjustment value of the first candidate tuning scene is not greater than the adjustment value of the target tuning scene, and the adjustment value of the second candidate tuning scene is not less than the adjustment value of the target tuning scene;
and performing interpolation processing based on the adjustment value of the first candidate tuning scene and the tuning parameter corresponding to the adjustment value of the first candidate tuning scene and the adjustment value of the second candidate tuning scene and the tuning parameter corresponding to the adjustment value of the second candidate tuning scene to obtain the target tuning parameter of the target tuning scene.
Optionally, the first candidate tuning scene is a tuning scene with the largest adjustment value in the preset tuning scenes with the adjustment value not greater than the adjustment value of the target tuning scene;
the second candidate tuning scene is a tuning scene with the minimum adjustment value in the preset tuning scenes with the adjustment value not smaller than the adjustment value of the target tuning scene.
Optionally, the parameter control module is configured to perform interpolation processing based on the adjustment value of the first candidate tuning scene and the tuning parameter corresponding to the adjustment value of the first candidate tuning scene, and the adjustment value of the second candidate tuning scene and the tuning parameter corresponding to the adjustment value of the second candidate tuning scene, so as to obtain the target tuning parameter of the target tuning scene, and includes:
the parameter control module is used for:
determining target tuning parameters of the target tuning scene according to the following formula (1):
P(Vt) = (Vt - Vi) / (Vj - Vi) * (P(Vj) - P(Vi)) + P(Vi)    (1)
wherein P(Vt) represents the target tuning parameter corresponding to the adjustment value Vt of the target tuning scene t, P(Vi) represents the tuning parameter corresponding to the adjustment value Vi of the first candidate tuning scene i, and P(Vj) represents the tuning parameter corresponding to the adjustment value Vj of the second candidate tuning scene j.
Optionally, the parameter control module is configured to determine, based on tuning parameters corresponding to a preset tuning scene, target tuning parameters of a target tuning scene, including:
the parameter control module is used for:
acquiring tuning parameter estimation values of key acoustic parameters through corresponding room impulse responses based on the preset tuning scene;
based on the tuning parameter estimation value, obtaining an intermediate tuning parameter of the target tuning scene;
and obtaining the target tuning parameters of the target tuning scene based on the intermediate tuning parameters and a pre-trained tuning parameter prediction model.
Optionally, the parameter control module is configured to obtain an intermediate tuning parameter of the target tuning scene based on the tuning parameter estimation value, and includes:
the parameter control module is used for:
selecting a first intermediate tuning scene and a second intermediate tuning scene from preset tuning scenes based on the target tuning scene; wherein the adjustment value of the first intermediate tuning scene is not greater than the adjustment value of the target tuning scene, and the adjustment value of the second intermediate tuning scene is not less than the adjustment value of the target tuning scene;
and carrying out interpolation processing based on the adjustment value of the first intermediate tuning scene and the corresponding tuning parameter estimation value thereof and the adjustment value of the second intermediate tuning scene and the corresponding tuning parameter estimation value thereof to obtain the intermediate tuning parameter of the target tuning scene.
Optionally, the audio processing system further comprises a training module;
the training module is used for training and obtaining the tuning parameter prediction model according to the following steps:
generating training data based on tuning parameters corresponding to the preset tuning scene and the tuning parameter estimation values corresponding to the tuning parameters;
and training the tuning parameter prediction model by taking the tuning parameter estimated value in the training data as input and the tuning parameter in the training data as output to obtain the pre-trained tuning parameter prediction model.
Optionally, the audio generating module includes:
the preprocessing unit is used for preprocessing the input audio signal based on the preprocessing parameters in the target tuning parameters to obtain a reverberation input signal;
the reverberation generating unit is used for carrying out reverberation processing on the reverberation input signal based on the reverberation parameter in the target tuning parameter to generate an initial reverberation audio signal of the target tuning scene;
the mixing unit is used for carrying out weighted mixing processing on the dry sound signal in the input audio signal and the initial reverberation audio signal according to a preset proportion based on the weighted mixing parameters in the target tuning parameters to obtain an intermediate reverberation audio signal of the target tuning scene;
and the system tuning unit is used for performing system tuning processing on the intermediate reverberation audio signal based on the system tuning parameters in the target tuning parameters to obtain a final reverberation audio signal of the target tuning scene.
In another aspect of the present disclosure, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor;
wherein the memory stores instructions executable by the at least one processor, to enable the at least one processor to perform the audio processing method described above.
Compared with the prior art, the embodiments of the disclosure determine the target tuning parameters of the target tuning scene from tuning parameters corresponding to adjustment values mapped from the space sizes of preset tuning scenes, and perform reverberation tuning processing on the input audio signal with these target tuning parameters to obtain the final reverberation audio signal of the target tuning scene. The target tuning scene to be simulated can thus be regulated based on its corresponding space size, so that the reverberation effects of target tuning scenes with different space sizes can be simulated, meeting richer and more flexible requirements for simulating virtual-scene reverberation effects.
Drawings
One or more embodiments are illustrated by way of example, and not limitation, in the figures of the accompanying drawings, in which like references indicate similar elements; the figures are not drawn to scale unless otherwise stated.
FIG. 1 is a flow chart of an audio processing method according to an embodiment of the present disclosure;
fig. 2 is a schematic structural diagram of an audio processing system according to another embodiment of the present disclosure;
fig. 3 is a schematic structural diagram of an electronic device according to another embodiment of the present disclosure.
Detailed Description
To make the objects, technical solutions, and advantages of the embodiments of the present disclosure clearer, the embodiments are described in detail below with reference to the accompanying drawings. Those of ordinary skill in the art will understand that numerous technical details are set forth in the various embodiments to provide a better understanding of the present disclosure; the technical solutions claimed herein can nevertheless be implemented without these details, and with various changes and modifications, on the basis of the following embodiments. The division into embodiments below is for convenience of description and should not be construed as limiting specific implementations of the disclosure; the embodiments may be combined and cross-referenced where no contradiction arises.
One embodiment of the present disclosure relates to an audio processing method, the flow of which is shown in fig. 1, including:
step S110, determining target tuning parameters of a target tuning scene based on tuning parameters corresponding to a preset tuning scene; the tuning parameters are obtained based on adjustment values obtained by mapping the space sizes of the corresponding preset tuning scenes.
Specifically, the preset tuning scene refers to a plurality of preset virtual acoustic scenes corresponding to different space sizes, such as a recording studio scene, a living room scene, a concert hall scene, a football field scene and the like.
For the preset tuning scenes, different audio processing operations involve different types of audio processing parameters. For example, preprocessing operations such as dereverberation (Dereverb), EQ (equalization) tuning, and delay (Delay) control involve parameters including, but not limited to, the degree of dereverberation, the key frequency points and gain values of the EQ curve, and the delay time. Reverberation operations such as early reflection generation, late reverberation generation, and decorrelation involve parameters including, but not limited to, the delay times of early reflections and late reverberation, reverberation density, echo intensity, room absorption coefficients, filter frequency points and gain values, the delay time of a delay line, and the degree of decorrelation. Operations such as mixing and system tuning involve mixing parameters such as the dry-wet ratio, and system tuning parameters such as per-channel gain and delay control. Assuming that a preset tuning scene involves N audio processing parameters in total, the corresponding parameter values may be denoted as P = {p1, p2, ..., pN}, where p1, p2, ..., pN are the parameter values of the N audio processing parameters.
It should be noted that the tuning parameters corresponding to a preset tuning scene are keyed by the numerical value mapped from the space size of each virtual acoustic scene, that is, by its adjustment value. For example, assuming the preset tuning scenes comprise K virtual acoustic scenes, and any virtual acoustic scene k (k = 1, 2, ..., K) maps its space size to an adjustment value Vk, the tuning parameters corresponding to the virtual acoustic scene k may be denoted as P(Vk). The specific adjustment value Vk may be set based on expert experience; for example, a corresponding adjustment value may be assigned to each virtual acoustic scene according to the size of its space, with larger spaces receiving larger adjustment values. For instance, the value interval of the adjustment values may be set to [0, 1]; if the preset tuning scenes comprise, in order of space size, a studio scene (k=1), a living room scene (k=2), a concert hall scene (k=3), and a football field scene (k=4), the corresponding adjustment values may be V1 = 0, V2 = 0.3, V3 = 0.7, and V4 = 1.
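The example above can be written down as a lookup table; a sketch in Python, where the parameter vectors are illustrative placeholders rather than real tunings:

```python
# Adjustment values in [0, 1], larger spaces mapping to larger values,
# following the studio/living-room/concert-hall/football-field example.
ADJUSTMENT_VALUES = {
    "studio": 0.0,          # k = 1, smallest space
    "living_room": 0.3,     # k = 2
    "concert_hall": 0.7,    # k = 3
    "football_field": 1.0,  # k = 4, largest space
}

# Tuning parameters P(Vk), keyed by adjustment value. Each vector stands
# in for the N audio processing parameters {p1, ..., pN}; values are made up.
TUNING_PARAMS = {
    0.0: [0.10, 0.20],
    0.3: [0.30, 0.40],
    0.7: [0.60, 0.70],
    1.0: [0.90, 1.00],
}
```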
The target tuning scene may be specified by the user based on a preset tuning scene. For example, when the preset tuning scene includes a virtual acoustic scene such as a studio scene, a living room scene, a concert hall scene, a soccer field scene, etc., the target tuning scene may be any one of the virtual acoustic scenes such as a studio scene, a living room scene, a concert hall scene, a soccer field scene, etc.
The target tuning parameters of the target tuning scene are the audio processing parameters that need to be adjusted to simulate the reverberation effect of the target tuning scene on the basis of the input audio signal, including, but not limited to, one or more of the audio processing parameters involved in the preprocessing, reverberation, mixing, and system tuning operations. On this basis, once the tuning parameters P(Vk) determined from the space sizes of the preset tuning scenes are obtained, step S110 may determine the target tuning parameters of the target tuning scene according to a preset relationship, linear or nonlinear, between P(Vk) and the target tuning parameters, so as to simulate the reverberation effects of different target tuning scenes according to space size.
And step S120, performing reverberation tuning processing on the input audio signal based on the target tuning parameters to obtain a final reverberation audio signal of the target tuning scene.
Specifically, after the target tuning parameters of the target tuning scene are obtained, step S120 may perform corresponding audio processing operation on the input audio signal according to one or more of the audio processing parameters related to the target tuning parameters, such as the pre-processing parameters, the reverberation processing parameters, the mixing and system tuning processing parameters, so as to obtain a final reverberation audio signal simulating the reverberation effect in the target tuning scene.
For example, when performing the reverberation tuning processing on the input audio signal based on the target tuning parameters, step S120 may include: preprocessing the input audio signal based on the preprocessing parameters among the target tuning parameters to obtain a reverberation input signal; performing reverberation processing on the reverberation input signal based on the reverberation parameters among the target tuning parameters to generate an initial reverberation audio signal of the target tuning scene; performing weighted mixing of the dry signal in the input audio signal and the initial reverberation audio signal in a predetermined ratio, based on the weighted mixing parameters among the target tuning parameters, to obtain an intermediate reverberation audio signal of the target tuning scene; and performing system tuning processing on the intermediate reverberation audio signal based on the system tuning parameters among the target tuning parameters to obtain the final reverberation audio signal of the target tuning scene.
Specifically, the weighted mixing parameters here include, but are not limited to, the dry-wet ratio. When performing the weighted mixing of the dry signal and the initial reverberation audio signal in a predetermined ratio, the input audio signal may first be subjected to dry-wet separation, and the separated dry signal and the initial reverberation audio signal may then be mixed with weights in a predetermined ratio, for example 1:1, 1.1:0.8, or 0.7:1.2, to obtain the intermediate reverberation audio signal.
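The weighted dry-wet mixing step might look like the following sketch; the gain pair and signal values are illustrative, and the dry-wet separation is assumed to have been done already:

```python
import numpy as np

def mix_dry_wet(dry: np.ndarray, wet: np.ndarray,
                dry_gain: float, wet_gain: float) -> np.ndarray:
    """Weighted mix of the separated dry signal and the initial reverberation
    signal; gain pairs such as 1:1, 1.1:0.8, or 0.7:1.2 set the ratio."""
    return dry_gain * dry + wet_gain * wet

dry = np.array([1.0, 0.5])   # separated dry signal (toy values)
wet = np.array([0.2, 0.4])   # initial reverberation signal (toy values)
mid = mix_dry_wet(dry, wet, 1.1, 0.8)  # a 1.1:0.8 dry-wet ratio
```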
It should be noted that the audio processing method provided in this embodiment is suitable not only for simulating the reverberation effects of different types of virtual acoustic scenes with different space sizes, such as a studio scene, a living room scene, a concert hall scene, or a football field scene, but also for simulating the reverberation effects of the same type of virtual acoustic scene at different space sizes, such as concert hall scenes of different sizes.
Compared with the prior art, the audio processing method provided by this embodiment determines the target tuning parameters of the target tuning scene from tuning parameters corresponding to adjustment values mapped from the space sizes of preset tuning scenes, and performs reverberation tuning processing on the input audio signal with these target tuning parameters to obtain the final reverberation audio signal of the target tuning scene. Any audio processing parameter of the target tuning scene to be simulated can thus be regulated based on the corresponding space size, so that the reverberation effects of target tuning scenes with different space sizes can be simulated, meeting richer and more flexible requirements for simulating virtual-scene reverberation effects.
Illustratively, step S110 includes: selecting a first candidate tuning scene and a second candidate tuning scene from the preset tuning scenes based on the target tuning scene, where the adjustment value of the first candidate tuning scene is not greater than the adjustment value of the target tuning scene and the adjustment value of the second candidate tuning scene is not less than the adjustment value of the target tuning scene; and performing interpolation based on the adjustment value of the first candidate tuning scene and its corresponding tuning parameters, and the adjustment value of the second candidate tuning scene and its corresponding tuning parameters, to obtain the target tuning parameters of the target tuning scene.
Specifically, denote the adjustment value mapped from the space size of the target tuning scene t as Vt, that of the first candidate tuning scene i as Vi, and that of the second candidate tuning scene j as Vj, so that Vi <= Vt <= Vj. Accordingly, the tuning parameters of the first candidate tuning scene i may be denoted as P(Vi), and those of the second candidate tuning scene j as P(Vj). On this basis, the target tuning parameters P(Vt) corresponding to the adjustment value Vt of the target tuning scene t can be obtained by interpolating the tuning parameters based on Vi, Vj, P(Vi), and P(Vj).
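Selecting the two candidate scenes amounts to taking, among the preset adjustment values, the largest value not above Vt and the smallest value not below Vt. A minimal sketch, assuming Vt falls within the range spanned by the presets:

```python
def select_candidates(preset_values, vt):
    """Return (Vi, Vj): the largest preset adjustment value <= vt and the
    smallest preset adjustment value >= vt. When vt equals a preset value,
    both candidates coincide with that preset scene."""
    vi = max(v for v in preset_values if v <= vt)
    vj = min(v for v in preset_values if v >= vt)
    return vi, vj

presets = [0.0, 0.3, 0.7, 1.0]  # hypothetical adjustment values
```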
The present embodiment does not limit the specific manner of interpolation. For example, the interpolation may include, but is not limited to, one or more of linear interpolation, nearest-neighbor interpolation, polynomial interpolation, spline interpolation, and least-squares interpolation. Any one of these methods may be applied to all adjustment values of the first and second candidate tuning scenes and their corresponding tuning parameters; alternatively, linear interpolation may be applied to some adjustment values of the first or second candidate tuning scene and their corresponding tuning parameters, while one or more of the other methods are applied to the remaining adjustment values and their corresponding tuning parameters.
Illustratively, performing interpolation based on the adjustment value of the first candidate tuning scene and its corresponding tuning parameters, and the adjustment value of the second candidate tuning scene and its corresponding tuning parameters, to obtain the target tuning parameters of the target tuning scene includes: determining the target tuning parameters of the target tuning scene according to the following formula (1):
P(Vt) = (Vt - Vi) / (Vj - Vi) * (P(Vj) - P(Vi)) + P(Vi)    (1)
wherein P(Vt) represents the target tuning parameter corresponding to the adjustment value Vt of the target tuning scene t, P(Vi) represents the tuning parameter corresponding to the adjustment value Vi of the first candidate tuning scene i, and P(Vj) represents the tuning parameter corresponding to the adjustment value Vj of the second candidate tuning scene j.
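Formula (1) can be sketched in Python as follows; the function name and the use of NumPy arrays for vector-valued tuning parameters are illustrative assumptions, not part of the patent.

```python
import numpy as np

def interpolate_tuning_params(v_t, v_i, v_j, p_i, p_j):
    """Linearly interpolate tuning parameters per formula (1).

    v_i <= v_t <= v_j are adjustment values mapped from spatial size;
    p_i, p_j are the tuning-parameter vectors of the two candidate scenes.
    """
    p_i = np.asarray(p_i, dtype=float)
    p_j = np.asarray(p_j, dtype=float)
    if v_i == v_j:                       # both candidates coincide with the target
        return p_i.copy()
    w = (v_t - v_i) / (v_j - v_i)        # interpolation weight in [0, 1]
    return w * (p_j - p_i) + p_i

# Example: a target adjustment value halfway between the two candidate scenes.
p_t = interpolate_tuning_params(0.5, 0.0, 1.0, [1.0, 2.0], [3.0, 6.0])
```

Because the formula is applied element-wise, a whole vector of audio processing parameters can be interpolated in one call.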
In this way, the target tuning parameters of the target tuning scene are obtained by parameter interpolation from the spatial-size-mapped adjustment values of the first and second candidate tuning scenes and their corresponding tuning parameters. The final reverberant audio signal obtained by reverberation tuning of the input audio signal with these target tuning parameters is therefore closer to the real reverberant audio signal of the target tuning scene; the approach is simple and effective, and further improves the simulated reverberation effect of the target tuning scene.
For example, to bring the final reverberant audio signal of the target tuning scene closer to the real reverberant audio signal of that scene, the first candidate tuning scene may be set to the virtual acoustic scene with the largest adjustment value among the preset tuning scenes whose adjustment values are not greater than that of the target tuning scene (i.e., tuning scene i), and the second candidate tuning scene may be set to the virtual acoustic scene with the smallest adjustment value among the preset tuning scenes whose adjustment values are not less than that of the target tuning scene (i.e., tuning scene j), so as to further improve the simulated reverberation effect of the target tuning scene.
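Selecting the two bracketing candidate scenes amounts to a search in the presets ordered by adjustment value. The sketch below assumes the presets are stored as a sorted list of (adjustment value, scene) pairs; that data layout is an illustrative assumption.

```python
import bisect

def select_candidate_scenes(presets, v_t):
    """Pick the bracketing preset scenes around target adjustment value v_t.

    `presets` is a list of (adjustment_value, scene) pairs sorted by
    adjustment value; returns (first_candidate, second_candidate), where the
    first has the largest value <= v_t and the second the smallest >= v_t.
    """
    values = [v for v, _ in presets]
    j = bisect.bisect_left(values, v_t)          # first index with value >= v_t
    if j < len(values) and values[j] == v_t:     # exact match: both candidates coincide
        return presets[j], presets[j]
    i = max(j - 1, 0)                            # largest value <= v_t (clamped)
    j = min(j, len(presets) - 1)                 # smallest value >= v_t (clamped)
    return presets[i], presets[j]

presets = [(0.1, "studio"), (0.4, "living room"), (0.7, "concert hall"), (1.0, "stadium")]
lo, hi = select_candidate_scenes(presets, 0.5)   # brackets the target between two presets
```

When v_t exactly matches a preset adjustment value, both candidates coincide and the interpolation of formula (1) degenerates to that preset's parameters.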
It should be noted that, besides the above interpolation method, a neural network model may be used to determine the target tuning parameters of the target tuning scene. The neural network model can capture the complex nonlinear coupling among the different audio processing parameters contained in the target tuning parameters, further improving the simulated reverberation effect of the target tuning scene.
For example, when a neural network model is used to determine the target tuning parameters of the target tuning scene, step S110 may include: obtaining tuning parameter estimates of key acoustic parameters from the room impulse responses corresponding to the preset tuning scenes; obtaining intermediate tuning parameters of the target tuning scene based on those tuning parameter estimates; and obtaining the target tuning parameters of the target tuning scene based on the intermediate tuning parameters and a pre-trained tuning parameter prediction model.
Specifically, the key acoustic parameters here are acoustic parameters that affect spatial perception, and may include, but are not limited to, the total reverberation duration, the reverberation durations of different frequency bands, the delays of early and late reverberation, the late reverberation density, and the inter-channel correlation coefficients. Different virtual acoustic scenes among the preset tuning scenes yield different tuning parameter estimates of these key acoustic parameters.
When determining the tuning parameter estimates of the key acoustic parameters from the preset tuning scenes, a unit impulse signal can be fed into each virtual acoustic scene; with different audio processing parameter values P, the corresponding room impulse response RIR (Room Impulse Response) is obtained, and the estimate Q of each key acoustic parameter is then derived from the RIR by a measurement algorithm.
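The patent does not fix the measurement algorithm. A common choice for estimating the total reverberation duration (RT60) from an RIR is Schroeder backward integration of the squared impulse response; the sketch below applies it to a synthetic exponentially decaying RIR. The sampling rate, decay constant, and dB fitting range are illustrative assumptions.

```python
import numpy as np

def estimate_rt60(rir, fs):
    """Estimate the total reverberation time (RT60) from a room impulse
    response via Schroeder backward integration of the squared RIR."""
    energy = np.asarray(rir, dtype=float) ** 2
    edc = np.cumsum(energy[::-1])[::-1]          # energy decay curve
    edc_db = 10.0 * np.log10(edc / edc[0])       # normalised decay in dB
    # Fit the -5 dB .. -35 dB portion and extrapolate the slope to -60 dB.
    idx = np.where((edc_db <= -5.0) & (edc_db >= -35.0))[0]
    t = idx / fs
    slope, _ = np.polyfit(t, edc_db[idx], 1)     # decay rate in dB per second (negative)
    return -60.0 / slope

# Synthetic RIR with a known exponential decay: the amplitude drops 60 dB
# over 0.5 s, so the estimate should recover RT60 ~ 0.5 s.
fs = 8000
t = np.arange(int(fs * 1.0)) / fs
rir = np.exp(-3.0 * np.log(10) * t / 0.5)
rt60 = estimate_rt60(rir, fs)
```

Band-limited reverberation durations can be estimated the same way after filtering the RIR into the frequency band of interest.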
The intermediate tuning parameters here are tuning parameters determined from the adjustment values obtained by mapping the spatial sizes of the virtual acoustic scenes in the preset tuning scenes, such as a recording studio scene, a living room scene, a concert hall scene, or a soccer field scene. For example, if a virtual acoustic scene k in the preset tuning scenes has an adjustment value Vk obtained from the spatial-size mapping, its intermediate tuning parameter may be denoted Q(Vk).
Illustratively, obtaining the intermediate tuning parameters of the target tuning scene based on the tuning parameter estimation values includes: selecting a first intermediate tuning scene and a second intermediate tuning scene from preset tuning scenes based on the target tuning scene; wherein the adjustment value of the first intermediate tuning scene is not greater than the adjustment value of the target tuning scene, and the adjustment value of the second intermediate tuning scene is not less than the adjustment value of the target tuning scene. And carrying out interpolation processing based on the adjustment value of the first intermediate tuning scene and the corresponding tuning parameter estimation value thereof and the adjustment value of the second intermediate tuning scene and the corresponding tuning parameter estimation value thereof to obtain the intermediate tuning parameter of the target tuning scene.
Specifically, denote the adjustment value that the target tuning scene t obtains from the spatial-size mapping as Vt, that of the first intermediate tuning scene x as Vx, and that of the second intermediate tuning scene y as Vy, so that Vx ≤ Vt ≤ Vy. Accordingly, the tuning parameter estimate of the first intermediate tuning scene x may be denoted Q(Vx), and that of the second intermediate tuning scene y may be denoted Q(Vy). On this basis, the intermediate tuning parameter Q(Vt) corresponding to the adjustment value Vt of the target tuning scene t can be obtained by interpolating from Vx, Vy, Q(Vx), and Q(Vy).
Referring to the above formula (1), when the intermediate tuning parameter of the target tuning scene t is determined by interpolation, the intermediate tuning parameter Q (Vt) when the target tuning scene t corresponds to the adjustment value Vt may be expressed as the following formula (2):
Q(Vt) = (Vt - Vx) / (Vy - Vx) * (Q(Vy) - Q(Vx)) + Q(Vx)    (2)
It should be noted that, to bring the final reverberant audio signal of the target tuning scene closer to the real reverberant audio signal of that scene, the first intermediate tuning scene may be set to the virtual acoustic scene with the largest adjustment value among the preset tuning scenes whose adjustment values are not greater than that of the target tuning scene (i.e., tuning scene x), and the second intermediate tuning scene may be set to the virtual acoustic scene with the smallest adjustment value among the preset tuning scenes whose adjustment values are not less than that of the target tuning scene (i.e., tuning scene y), so as to further improve the simulated reverberation effect of the target tuning scene.
Illustratively, the tuning parameter prediction model is trained according to the following steps: and generating training data based on tuning parameters corresponding to the preset tuning scene and tuning parameter estimation values corresponding to the preset tuning scene. And taking the tuning parameter estimated value in the training data as input, taking the tuning parameter in the training data as output, and training the tuning parameter prediction model to obtain a pre-trained tuning parameter prediction model.
Specifically, when training the tuning parameter prediction model, the tuning parameter P(Vk) of a virtual acoustic scene k in the preset tuning scenes and its corresponding tuning parameter estimate Q(Vk) form a data pair {P, Q}. Training data are generated from a large number of such data pairs across virtual acoustic scenes; Q in each pair serves as the input of the tuning parameter prediction model and P as its output, the prediction model built on a neural network is trained accordingly, and the learned mapping between P and Q may be expressed as P = F(Q).
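The training loop can be sketched as follows. The patent leaves the concrete network type open, so everything here is an illustrative assumption: synthetic {P, Q} pairs, a tiny one-hidden-layer network, and full-batch gradient descent written directly in NumPy rather than a deep-learning framework.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training data: Q holds estimated key acoustic parameters
# (e.g. reverberation times, delays); P holds the tuning parameters.
Q = rng.uniform(0.0, 1.0, size=(256, 3))
P = np.stack([Q[:, 0] + 0.5 * Q[:, 1], Q[:, 2] ** 2], axis=1)  # synthetic P = f(Q)

# One-hidden-layer MLP F: Q -> P, trained with mean-squared error.
W1 = rng.normal(0.0, 0.5, (3, 16)); b1 = np.zeros(16)
W2 = rng.normal(0.0, 0.5, (16, 2)); b2 = np.zeros(2)
lr = 0.05

def forward(q):
    h = np.tanh(q @ W1 + b1)
    return h, h @ W2 + b2

_, pred0 = forward(Q)
loss0 = np.mean((pred0 - P) ** 2)                # loss before training

for _ in range(3000):
    h, pred = forward(Q)
    grad = 2.0 * (pred - P) / len(Q)             # dLoss/dPred
    gW2 = h.T @ grad; gb2 = grad.sum(0)
    gh = grad @ W2.T * (1.0 - h ** 2)            # back-propagate through tanh
    gW1 = Q.T @ gh; gb1 = gh.sum(0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

_, pred = forward(Q)
loss = np.mean((pred - P) ** 2)                  # loss after training
```

After training, `forward(Q_t)` plays the role of F(·): feeding in the intermediate tuning parameter Q(Vt) yields the predicted target tuning parameter P(Vt).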
It should be noted that, in this embodiment, the specific type of the neural network model used to construct the tuning parameter prediction model is not limited, and those skilled in the art may select according to actual needs.
After the pre-trained tuning parameter prediction model F(·) is obtained, the target tuning parameter P(Vt) corresponding to the adjustment value Vt of the target tuning scene t can be computed from the intermediate tuning parameter Q(Vt) via the mapping P = F(Q), that is, P(Vt) = F(Q(Vt)).
Another embodiment of the present disclosure relates to an audio processing system, as shown in fig. 2, comprising:
the parameter control module 210 is configured to determine a target tuning parameter of the target tuning scene based on a tuning parameter corresponding to the preset tuning scene; the tuning parameters are obtained based on adjustment values obtained by mapping the space sizes of the corresponding preset tuning scenes;
the audio generating module 220 is configured to perform reverberation tuning processing on the input audio signal based on the target tuning parameter, so as to obtain a final reverberation audio signal of the target tuning scene.
Compared with the prior art, in the audio processing system provided by the embodiments of the present disclosure, the parameter control module determines the target tuning parameters of the target tuning scene from tuning parameters associated with adjustment values mapped from the spatial sizes of preset tuning scenes, and the audio generation module performs reverberation tuning on the input audio signal with those target tuning parameters to obtain the final reverberant audio signal of the target tuning scene. Any audio processing parameter of the target tuning scene to be simulated can thus be adjusted and controlled according to the corresponding spatial size, so that the reverberation effects of target tuning scenes with different spatial sizes can be simulated, meeting richer and more flexible demands for simulating the reverberation effects of virtual scenes.
Illustratively, the parameter control module 210 is configured to determine, based on tuning parameters corresponding to a preset tuning scene, target tuning parameters of a target tuning scene, including:
the parameter control module 210 is configured to: selecting a first candidate tuning scene and a second candidate tuning scene from preset tuning scenes based on the target tuning scenes; wherein the adjustment value of the first candidate tuning scene is not greater than the adjustment value of the target tuning scene, and the adjustment value of the second candidate tuning scene is not less than the adjustment value of the target tuning scene; and performing interpolation processing based on the adjustment value of the first candidate tuning scene and the tuning parameter corresponding to the adjustment value, and the adjustment value of the second candidate tuning scene and the tuning parameter corresponding to the adjustment value, so as to obtain the target tuning parameter of the target tuning scene.
The first candidate tuning scene is a tuning scene with the largest adjustment value in preset tuning scenes with the adjustment value not larger than the adjustment value of the target tuning scene. The second candidate tuning scene is the tuning scene with the minimum adjustment value in the preset tuning scenes with the adjustment value not smaller than the adjustment value of the target tuning scene.
Illustratively, the parameter control module 210 is configured to perform interpolation processing based on the adjustment value of the first candidate tuning scene and the tuning parameter corresponding thereto, and the adjustment value of the second candidate tuning scene and the tuning parameter corresponding thereto, to obtain the target tuning parameter of the target tuning scene, including:
The parameter control module 210 is configured to determine a target tuning parameter of the target tuning scene according to the following formula (1):
P(Vt) = (Vt - Vi) / (Vj - Vi) * (P(Vj) - P(Vi)) + P(Vi)    (1)
wherein P(Vt) represents the target tuning parameter corresponding to the adjustment value Vt of the target tuning scene t, P(Vi) represents the tuning parameter corresponding to the adjustment value Vi of the first candidate tuning scene i, and P(Vj) represents the tuning parameter corresponding to the adjustment value Vj of the second candidate tuning scene j.
Illustratively, the parameter control module 210 is configured to determine, based on tuning parameters corresponding to a preset tuning scene, target tuning parameters of a target tuning scene, including:
the parameter control module 210 is configured to obtain tuning parameter estimation values of key acoustic parameters through corresponding room impulse responses based on a preset tuning scene; based on the tuning parameter estimation value, obtaining an intermediate tuning parameter of the target tuning scene; and obtaining the target tuning parameters of the target tuning scene based on the intermediate tuning parameters and the pre-trained tuning parameter prediction model.
Illustratively, the parameter control module 210 is configured to obtain an intermediate tuning parameter of the target tuning scene based on the tuning parameter estimation value, including:
the parameter control module 210 is configured to select a first intermediate tuning scene and a second intermediate tuning scene from preset tuning scenes based on the target tuning scene; wherein the adjustment value of the first intermediate tuning scene is not greater than the adjustment value of the target tuning scene, and the adjustment value of the second intermediate tuning scene is not less than the adjustment value of the target tuning scene; and carrying out interpolation processing based on the adjustment value of the first intermediate tuning scene and the corresponding tuning parameter estimation value thereof and the adjustment value of the second intermediate tuning scene and the corresponding tuning parameter estimation value thereof to obtain the intermediate tuning parameter of the target tuning scene.
The audio processing system also includes a training module. The training module is used for training to obtain a tuning parameter prediction model according to the following steps: generating training data based on tuning parameters corresponding to a preset tuning scene and tuning parameter estimation values corresponding to the preset tuning scene; and taking the tuning parameter estimated value in the training data as input, taking the tuning parameter in the training data as output, and training the tuning parameter prediction model to obtain a pre-trained tuning parameter prediction model.
Illustratively, as shown in fig. 2, the audio generation module 220 includes a pre-processing unit 221, a reverberation generation unit 222, a mixing unit 223, and a system tuning unit 224.
The preprocessing unit 221 is configured to preprocess the input audio signal based on the preprocessing parameters among the target tuning parameters to obtain a reverberation input signal. The preprocessing includes, but is not limited to, dereverberation, equalization adjustment, and delay control.
The reverberation generating unit 222 is configured to perform reverberation processing on the reverberation input signal based on the reverberation parameters among the target tuning parameters to generate an initial reverberant audio signal of the target tuning scene. The reverberation processing includes, but is not limited to, operations such as early reflection generation, late reverberation generation, and decorrelation.
The mixing unit 223 is configured to perform weighted mixing of the dry sound signal in the input audio signal and the initial reverberant audio signal in a preset ratio, based on the weighted mixing parameters among the target tuning parameters, to obtain an intermediate reverberant audio signal of the target tuning scene.
The system tuning unit 224 is configured to perform system tuning processing on the intermediate reverberation audio signal based on the system tuning parameters in the target tuning parameters, so as to obtain a final reverberation audio signal of the target tuning scene.
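The four-unit chain above can be sketched as follows, with deliberately simplified stand-ins: plain gains for the preprocessing and system tuning units, and a single feedback comb filter (a classic reverberator building block) in place of the full reverberation generator. All parameter names and values are illustrative, not taken from the patent.

```python
import numpy as np

def preprocess(x, pre_gain):
    """Preprocessing stand-in (the patent also names dereverberation,
    equalization and delay control): here just an input gain."""
    return pre_gain * x

def generate_reverb(x, delay, feedback):
    """Minimal late-reverberation stand-in: one feedback comb filter."""
    y = np.zeros_like(x)
    buf = np.zeros(delay)                        # circular delay line
    for n in range(len(x)):
        out = x[n] + feedback * buf[n % delay]
        buf[n % delay] = out
        y[n] = out
    return y

def mix(dry, wet, wet_ratio):
    """Weighted dry/wet mixing, as performed by the mixing unit."""
    return (1.0 - wet_ratio) * dry + wet_ratio * wet

def system_tune(x, out_gain):
    """System tuning stand-in: an overall output gain."""
    return out_gain * x

# Chain the four units with illustrative target tuning parameters.
params = {"pre_gain": 1.0, "delay": 120, "feedback": 0.6,
          "wet_ratio": 0.3, "out_gain": 0.9}
dry = np.zeros(1024); dry[0] = 1.0               # unit impulse as the input audio
pre = preprocess(dry, params["pre_gain"])
wet = generate_reverb(pre, params["delay"], params["feedback"])
out = system_tune(mix(dry, wet, params["wet_ratio"]), params["out_gain"])
```

Feeding a unit impulse through the chain exposes the comb filter's echo train: the output carries decaying copies of the impulse every 120 samples, scaled by the wet ratio and output gain.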
The specific implementation method of the audio processing system provided in the embodiments of the present disclosure may be described with reference to the audio processing method provided in the embodiments of the present disclosure, which is not described herein again.
Another embodiment of the present disclosure relates to an electronic device, as shown in fig. 3, comprising:
at least one processor 301; the method comprises the steps of,
a memory 302 communicatively coupled to the at least one processor 301; wherein,
the memory 302 stores instructions executable by the at least one processor 301, the instructions being executable by the at least one processor 301 to enable the at least one processor 301 to perform the audio processing method described in the above embodiments.
Where the memory and the processor are connected by a bus, the bus may comprise any number of interconnected buses and bridges, the buses connecting the various circuits of the one or more processors and the memory together. The bus may also connect various other circuits such as peripherals, voltage regulators, and power management circuits, which are well known in the art, and therefore, will not be described any further herein. The bus interface provides an interface between the bus and the transceiver. The transceiver may be one element or may be a plurality of elements, such as a plurality of receivers and transmitters, providing a means for communicating with various other apparatus over a transmission medium. The data processed by the processor is transmitted over the wireless medium via the antenna, which further receives the data and transmits the data to the processor.
The processor is responsible for managing the bus and general processing and may also provide various functions including timing, peripheral interfaces, voltage regulation, power management, and other control functions. And memory may be used to store data used by the processor in performing operations.
It will be understood by those of ordinary skill in the art that the foregoing embodiments are specific embodiments for carrying out the present disclosure, and that various changes in form and details may be made therein without departing from the spirit and scope of the present disclosure.

Claims (10)

1. An audio processing method, characterized in that the audio processing method comprises:
determining target tuning parameters of a target tuning scene based on tuning parameters corresponding to a preset tuning scene; the tuning parameters are obtained based on adjustment values obtained by mapping the space sizes of the corresponding preset tuning scenes;
and carrying out reverberation tuning processing on the input audio signal based on the target tuning parameter to obtain a final reverberation audio signal of the target tuning scene.
2. The audio processing method according to claim 1, wherein the determining the target tuning parameters of the target tuning scene based on the tuning parameters corresponding to the preset tuning scene includes:
Selecting a first candidate tuning scene and a second candidate tuning scene from preset tuning scenes based on the target tuning scene; wherein the adjustment value of the first candidate tuning scene is not greater than the adjustment value of the target tuning scene, and the adjustment value of the second candidate tuning scene is not less than the adjustment value of the target tuning scene;
and performing interpolation processing based on the adjustment value of the first candidate tuning scene and the tuning parameter corresponding to the adjustment value of the first candidate tuning scene and the adjustment value of the second candidate tuning scene and the tuning parameter corresponding to the adjustment value of the second candidate tuning scene to obtain the target tuning parameter of the target tuning scene.
3. The audio processing method according to claim 2, wherein,
the first candidate tuning scene is a tuning scene with the largest adjustment value in the preset tuning scenes with the adjustment value not larger than the adjustment value of the target tuning scene;
the second candidate tuning scene is a tuning scene with the minimum adjustment value in the preset tuning scenes with the adjustment value not smaller than the adjustment value of the target tuning scene.
4. The audio processing method according to claim 2, wherein the interpolating process is performed based on the adjustment value of the first candidate tuning scene and the tuning parameter corresponding thereto, and the adjustment value of the second candidate tuning scene and the tuning parameter corresponding thereto, to obtain the target tuning parameter of the target tuning scene, comprising:
Determining target tuning parameters of the target tuning scene according to the following formula (1):
P(Vt) = (Vt - Vi) / (Vj - Vi) * (P(Vj) - P(Vi)) + P(Vi)    (1)
wherein P(Vt) represents the target tuning parameter corresponding to the adjustment value Vt of the target tuning scene t, P(Vi) represents the tuning parameter corresponding to the adjustment value Vi of the first candidate tuning scene i, and P(Vj) represents the tuning parameter corresponding to the adjustment value Vj of the second candidate tuning scene j.
5. The audio processing method according to claim 1, wherein the determining the target tuning parameters of the target tuning scene based on the tuning parameters corresponding to the preset tuning scene includes:
acquiring tuning parameter estimation values of key acoustic parameters through corresponding room impulse responses based on the preset tuning scene;
based on the tuning parameter estimation value, obtaining an intermediate tuning parameter of the target tuning scene;
and obtaining the target tuning parameters of the target tuning scene based on the intermediate tuning parameters and a pre-trained tuning parameter prediction model.
6. The audio processing method according to claim 5, wherein the obtaining the intermediate tuning parameters of the target tuning scene based on the tuning parameter estimation values includes:
selecting a first intermediate tuning scene and a second intermediate tuning scene from preset tuning scenes based on the target tuning scene; wherein the adjustment value of the first intermediate tuning scene is not greater than the adjustment value of the target tuning scene, and the adjustment value of the second intermediate tuning scene is not less than the adjustment value of the target tuning scene;
And carrying out interpolation processing based on the adjustment value of the first intermediate tuning scene and the corresponding tuning parameter estimation value thereof and the adjustment value of the second intermediate tuning scene and the corresponding tuning parameter estimation value thereof to obtain the intermediate tuning parameter of the target tuning scene.
7. The audio processing method according to claim 5, wherein the tuning parameter prediction model is trained according to the following steps:
generating training data based on tuning parameters corresponding to the preset tuning scene and the tuning parameter estimation values corresponding to the tuning parameters;
and training the tuning parameter prediction model by taking the tuning parameter estimated value in the training data as input and the tuning parameter in the training data as output to obtain the pre-trained tuning parameter prediction model.
8. The audio processing method according to any one of claims 1 to 7, wherein the performing reverberation tuning processing on the input audio signal based on the target tuning parameter to obtain a final reverberation audio signal of the target tuning scene includes:
preprocessing the input audio signal based on the preprocessing parameters in the target tuning parameters to obtain a reverberation input signal;
Performing reverberation processing on the reverberation input signal based on the reverberation parameter in the target tuning parameter to generate an initial reverberation audio signal of the target tuning scene;
based on the weighted mixing parameters in the target tuning parameters, carrying out weighted mixing processing on the dry sound signals in the input audio signals and the initial reverberation audio signals according to a preset proportion to obtain intermediate reverberation audio signals of the target tuning scene;
and performing system tuning processing on the intermediate reverberation audio signal based on the system tuning parameters in the target tuning parameters to obtain a final reverberation audio signal of the target tuning scene.
9. An audio processing system, the audio processing system comprising:
the parameter control module is used for determining target tuning parameters of a target tuning scene based on tuning parameters corresponding to a preset tuning scene; the tuning parameters are obtained based on adjustment values obtained by mapping the space sizes of the corresponding preset tuning scenes;
and the audio generation module is used for carrying out reverberation tuning processing on the input audio signal based on the target tuning parameter to obtain a final reverberation audio signal of the target tuning scene.
10. An electronic device, comprising:
at least one processor; the method comprises the steps of,
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the audio processing method of any one of claims 1 to 8.
CN202311377828.4A 2023-10-23 2023-10-23 Audio processing method and system and electronic equipment Pending CN117336649A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311377828.4A CN117336649A (en) 2023-10-23 2023-10-23 Audio processing method and system and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311377828.4A CN117336649A (en) 2023-10-23 2023-10-23 Audio processing method and system and electronic equipment

Publications (1)

Publication Number Publication Date
CN117336649A true CN117336649A (en) 2024-01-02

Family

ID=89277098

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311377828.4A Pending CN117336649A (en) 2023-10-23 2023-10-23 Audio processing method and system and electronic equipment

Country Status (1)

Country Link
CN (1) CN117336649A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination