CN113316077A - Three-dimensional vivid generation system for voice sound source space sound effect

Info

Publication number
CN113316077A
Authority
CN
China
Prior art keywords
sound source
sound
reverberation
room
dimensional
Prior art date
Legal status
Withdrawn
Application number
CN202110715479.7A
Other languages
Chinese (zh)
Inventor
高小翎
刘勇
Current Assignee
Individual
Original Assignee
Individual
Priority date
Filing date
Publication date
Application filed by Individual
Priority to CN202110715479.7A
Publication of CN113316077A
Withdrawn

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04SSTEREOPHONIC SYSTEMS 
    • H04S7/00Indicating arrangements; Control arrangements, e.g. balance control
    • H04S7/30Control circuits for electronic adaptation of the sound field
    • H04S7/305Electronic adaptation of stereophonic audio signals to reverberation of the listening space
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04SSTEREOPHONIC SYSTEMS 
    • H04S7/00Indicating arrangements; Control arrangements, e.g. balance control
    • H04S7/30Control circuits for electronic adaptation of the sound field
    • H04S7/302Electronic adaptation of stereophonic sound system to listener position or orientation
    • H04S7/303Tracking of listener position or orientation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04SSTEREOPHONIC SYSTEMS 
    • H04S2400/00Details of stereophonic systems covered by H04S but not provided for in its groups
    • H04S2400/01Multi-channel, i.e. more than two input channels, sound reproduction with two speakers wherein the multi-channel information is substantially preserved

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Stereophonic System (AREA)

Abstract

The method targets consistency between the subjective auditory perception of a voice sound source in a three-dimensional video, as reproduced in a reconstructed three-dimensional sound field, and the subjective visual perception obtained in the reconstructed 3D video scene. Under known sound source spatial information, sound source signals and reconstruction environment, an artificial three-dimensional reverberation model that accounts for air attenuation is constructed on the basis of the perceptual factors influencing human-ear distance perception, and the sound source signal is corrected and controlled so that sound source distance information is added to it. Inverse filters that remove room reverberation at the listening position are then obtained to cancel the influence of the reverberant environment on the signals played by the loudspeakers, ensuring that the loudspeaker input signals are consistent with the signals received by the listener, which completes the recovery of the sound source distance. Gain factors are then distributed over the loudspeaker-set signals to recover the sound source direction and thus the full three-dimensional spatial orientation information of the sound source. The method achieves a vivid restoration of the three-dimensional spatial information of a voice sound source.

Description

Three-dimensional vivid generation system for voice sound source space sound effect
Technical Field
The invention relates to a sound source space sound effect three-dimensional generation system, in particular to a voice sound source space sound effect three-dimensional vivid generation system, and belongs to the technical field of space sound effect three-dimensional generation.
Background
The great box-office success of 3D movies has brought the film industry into the 3D multimedia era. In recent years more and more 3D movies have been produced, and a number of international classics have been re-released in 3D, presenting audiences with a different visual experience. The mature development of 3D video technology gives audiences a very pronounced visual sense of three-dimensionality and a better sense of visual presence, and a large number of devices supporting 3D visual effects have appeared on the market. However, the audio systems of current 3D video products on the market still follow stereo or surround sound technology and do not provide the viewer with a consistent visual and auditory experience. Therefore, three-dimensional audio generation technology based on 3D video objects has become an important research and development direction, hotspot and application focus in the multimedia technology field.
Since its inception, digital audio technology has gone through the stages of monophonic (single-channel) systems, two-channel stereo systems, multi-channel systems and so on, classified by the sound image characteristics of the audio signal; the sound image information contained in the signal has gradually expanded from a fixed single point in space to any point in three-dimensional space, so that a listener can perceive the sound image position changing over time throughout the whole three-dimensional space. Three-dimensional audio is defined as follows: in a reconstructed sound field, the central point O of the listening area, i.e. the position of the centre of the listener's head, is taken as the origin and called the listening position, and a three-dimensional coordinate system XYZ is constructed there. The nature of a sound image is then defined by the degrees of freedom with which the virtual sound image can move in space: if the virtual sound image is fixed at a single point, it is called 0D audio; if it moves along an arc between two loudspeakers, it is called 1D audio; if it moves on a spherical surface surrounded by the loudspeakers, it is called 2D audio; and if it moves within the whole three-dimensional space centred on the listener's head, it is called three-dimensional audio.
According to the definition of the three-dimensional audio, compared with the traditional stereo system or surround sound system, the three-dimensional audio technology can provide better spatial orientation sense and immersion sense for listeners, and the conventional sound accompanying system of the 3D video obviously cannot express complete 3D audio-visual feeling, so that unprecedented opportunities are created for the development of the three-dimensional audio technology, the three-dimensional audio technology also becomes an important development direction in the technical field of audio and video, and the three-dimensional audio technology based on the 3D video is developed into popular research and development content in the audio field.
Audio technology has undergone revolutionary changes in playback devices, storage means, audio content and application range over the hundred-plus years since its creation. According to how the audio signal describes the sound image position, the sound field information contained in the signal has gradually expanded from a single point to the full three-dimensional space. The description of a virtual sound image by three-dimensional audio comprises direction information (x, the horizontal angle, and y, the elevation angle) and distance information z. The sound image reconstructed by traditional stereo and surround sound only has degrees of freedom in the horizontal direction and in height and does not conform to the definition of three-dimensional audio, so a 3D multimedia system suffers from inconsistent audio and video perception during reconstruction, cannot provide the audience with true auditory immersion and envelopment, and an immersive experience is difficult to achieve. Most research on three-dimensional audio technology in the prior art focuses on recovering the direction of the sound source; the little research on distance recovery is limited to the free sound field, where intensity cues can only provide the relative distance of the sound source and cannot give the listener accurate absolute distance information. Furthermore, research on distance perception in a reverberant environment has been limited to collecting signals at listening positions in the reconstructed sound field with a microphone array and analysing the relationship between the sound source distance and the factors that influence it; so far, no general model for restoring the sound source distance has been constructed that can control the signals played by the loudspeakers according to the sound source distance to be restored. Therefore, how to obtain virtual sound image localization parameters and distance parameters consistent with the original sound source by controlling the loudspeaker signals, and thereby construct a virtual sound image consistent with the spatial information of the target object sound source in the 3D video, has become an urgent problem. The invention mainly develops a three-dimensional audio module that directly generates a 3D voice spatial sound effect from a target sound source signal and its spatial orientation information in the reconstructed sound field of a known 3D video scene.
Currently, three-dimensional audio technologies fall into two categories: physical sound field reconstruction and perception-based sound scene reconstruction. Physical sound field reconstruction rebuilds the original sound field with a large, specifically placed loudspeaker array. The Ambisonics technique decomposes the original sound source with spherical harmonic functions and then accurately reconstructs the original sound field using secondary sources uniformly distributed on an equidistant spherical surface; its advantage is that the encoding and decoding processes do not affect each other, the loudspeaker positions need not be known at the encoding end, and the playback signals can be calculated at the decoding end according to the actual loudspeaker placement. However, the effective listening area and sound field restoration of low-order Ambisonics are very limited; and although increasing the order improves localization accuracy through the directional information contained in the spherical harmonics, the data volume of the system increases sharply and strict requirements are imposed on loudspeaker arrangement, with the loudspeaker set required to be uniformly distributed on a sphere. WFS technology can also accurately reconstruct the original sound field and allows the listener to move freely within the listening area, but its accurate reconstruction relies on a huge number of loudspeakers, places high demands on the experimental environment and equipment, and makes the total cost of the system very expensive. Perception-based sound scene reconstruction reconstructs the direction and distance of a sound source according to the auditory characteristics perceived by the human ear, and is mainly represented by the vector base amplitude panning technique VBAP and the head-related transfer function HRTF. VBAP is currently applied in stereo and surround sound systems; the sound image it reconstructs can move on the plane formed by several loudspeakers, i.e. it can express the two degrees of freedom of horizontal angle and elevation angle, but according to the definition of three-dimensional audio the sound image obtained by VBAP lacks the degree of freedom in distance, so a change in the distance of a 3D video target object cannot produce a consistent change in the distance of the three-dimensional audio sound image, which seriously harms 3D audio-visual perception. HRTF technology can reconstruct a virtual sound image more accurately by extracting data from an HRTF library and can realize perceptual reconstruction of three-dimensional audio with only a pair of headphones, but the HRTF is a physical quantity closely related to individual anatomy; existing HRTF libraries cannot be applied universally to individuals and are prone to the in-head localization effect.
The sound source space sound effect three-dimensional generation in the prior art has defects and shortcomings, and the difficulties and the problems to be solved by the invention are mainly focused on the following aspects:
firstly, the sound image reconstructed by the traditional stereo and surround sound only has the freedom degrees in the horizontal direction and the height, and does not conform to the definition of three-dimensional audio, so that the problem of inconsistent audio and video feeling exists in the 3D multimedia system during reconstruction, real immersion and enclosure feeling on hearing cannot be provided for audiences, and the 3D multimedia system is difficult to have an immersive feeling; the research on distance perception in a reverberation environment is limited to the acquisition of listening positions in a reconstructed sound field through a microphone array and the analysis of the listening positions to obtain the relationship between the sound source distance and factors influencing the sound source distance, so far, a general restored sound source distance model is not constructed, signals played by a loudspeaker can be controlled according to the sound source distance to be restored, virtual sound image positioning parameters and distance parameters consistent with an original sound source cannot be obtained through the signals of the loudspeaker, and a virtual sound image consistent with target object sound source space information in a 3D video cannot be constructed;
second, when a speaker is used as a playback device, although reconstruction techniques based on a physical sound field have a precise recovery effect and a large recovery area, they require a large number of speakers and strict limitations on their placement, which is difficult to implement in practical applications. In addition, although the VBAP technology in the prior art can reconstruct the direction information of the original sound source relatively accurately, it lacks the perception of sound image distance, and the listener's perception of distance strongly affects the overall experience of watching 3D video. Therefore, the invention aims to make the subjective auditory perception of the voice sound source of a 3D video target object, obtained in the reconstructed three-dimensional sound field, consistent with the subjective visual perception obtained in the reconstructed 3D video scene. Under known spatial information, sound source signals and reconstruction environment conditions of the three-dimensional audio and video target sound source, the generation of three-dimensional voice spatial sound effect is realized through three technical frameworks: a multi-channel multi-dimensional reverberation removal algorithm, an artificial reverberation model controlling the direct-to-reverberant ratio, and an improved VBAP (vector base amplitude panning) technology. This satisfies the user's consistent visual and auditory experience and helps promote the development of industries such as 3D film and television;
thirdly, the rapid development of the 3D film and television industry and the initiation of the three-dimensional audio standardization process by the motion picture experts group have brought about a great deal of attention in the field of three-dimensional audio technology, and the real and complete spatial experience of the audience on the audiovisual events is seriously impaired due to the lack of the three-dimensional spatial information expression of the sound source by the two-dimensional audio system developed based on the conventional stereo or surround sound. In the prior art, Ambisonics and WFS technologies both require a huge number of speakers and have strict limitations on arrangement among the speakers, sound images formed by a sound field reconstruction technology based on perception are only located on a spherical surface where the speakers are located, listeners cannot perceive sound source distance information outside the spherical surface, HRTFs mainly adopt earphones for playback and are closely related to individual differences of people, and the limitations are large;
fourthly, if the reverberation effect in a specific room is to be simulated more accurately and comprehensively, the spatial sound effect reverberation model still has the following defects: firstly, when a spatial sound effect reverberation model simulates reverberation, the delay parameter and the attenuation coefficient of a 19-order FIR filter are still parameters when a simulated Boston symphony hall is adopted, and no relevant parameter is set for a specific occasion; secondly, feedback gain coefficients of the comb filter are calculated according to the indoor average reverberation time, however, as the energy absorption of the air to the sound is influenced by the signal frequency, the energy attenuation degrees of the sounds in different frequency sections in different environments are different, namely, the corresponding reverberation times are different, especially for high-frequency signals, although the model can change the reverberation time of the high-frequency part of the sound by adding a low-pass filter on a feedback branch, the reverberation time is only roughly controlled, and the reverberation time of the signals with different frequencies cannot be accurately reflected.
Fifthly, the prior art lacks of restoring the sound image distance in a non-free field, and does not provide an effective method for effectively controlling the energy ratio of direct sound and reverberation sound; the appropriate reverberation of the prior art can make the sound heard by a listener clearer and brighter, but if the reverberation is extremely large, bad feeling can be brought to the listener; although the current VBAP technology can reconstruct the direction information of the original sound source, the method lacks the perception of the sound image distance, and the perception of the distance by the listener seriously affects the overall perception of the listener watching the 3D video.
Disclosure of Invention
Aiming at the defects of the prior art, the invention first proposes a model for restoring the sound source distance in a reverberant environment. The model combines removal of room reverberation with an artificial three-dimensional reverberation model that considers air attenuation; by improving the spatial sound effect reverberation model within it, the improved reverberation model based on the frequency-dependent inconsistency of air attenuation can more accurately simulate the reverberation effect in a specific room, so that the signal DRR can be controlled more accurately and comprehensively to recover the sound source distance. Combined further with the VBAP technology, the invention realizes a vivid restoration of the three-dimensional spatial orientation of the sound source, so that the subjective auditory perception of the voice sound source of the 3D video target object in the reconstructed three-dimensional sound field is consistent with the subjective visual perception obtained in the reconstructed 3D video scene.
In order to achieve the technical effects, the technical scheme adopted by the invention is as follows:
the three-dimensional lifelike generation system of voice sound source spatial sound effect comprises: an artificial three-dimensional reverberation model considering air attenuation, a method for removing room reverberation at the listening position, and a three-dimensional voice spatial sound effect lifelike generation system, wherein the method for removing room reverberation at the listening position specifically comprises a copy-based room impulse response model and a multi-channel multi-dimensional room reverberation removal method;
under the conditions of known sound source spatial information, sound source signals and reconstruction environment, the invention improves the VBAP technology recovery direction, restores the sound source distance in the environment with reverberation, provides a model for reconstructing the sound source distance in the reverberation environment and provides a sound effect three-dimensional vivid generation system under the reverberation condition;
firstly, a sound source sound image distance recovery model is provided, the recovery of sound source three-dimensional spatial information is realized by combining with a VBAP technology sound source direction, based on the factors influencing the perception of the human ear to the sound source distance, the DRR containing signal intensity information can provide the absolute distance of a sound source, the direct sound-to-reverberation energy ratio DRR is adopted as the main factor of the distance recovery, the reverberation of a reconstructed room is simulated by designing an artificial three-dimensional reverberation model considering air attenuation, the input signal of a system is controlled, the DRR reaching the human ear is matched with the existing distance perception model, and the recovery of the sound source distance is realized; then, removing the influence of the signal carrying sound source distance information on the propagation in the reverberation room, analyzing the characteristics of the indoor sound field and the characteristics of the audio signal in the indoor sound field, adopting a room impulse response model based on a copy, and removing the room reverberation by adopting a multi-channel multi-dimensional room reverberation removing method on the basis of the room impulse response model construction, so that the signal DRR reaching the ears of the listener is consistent with the DRR controlled by the artificial reverberation model; and finally, recovering the direction of the sound source by fusing a VBAP technology, and playing the signal carrying the spatial information of the sound source through a loudspeaker to reach a listener to recover the three-dimensional spatial information of the sound source.
In the three-dimensional vivid generation system of voice sound source spatial sound effect, the artificial three-dimensional reverberation model considering air attenuation is as follows: the spatial sound effect reverberation model mainly comprises two parts, the first part being a 19-order FIR filter that simulates the attenuation of the direct sound and early reflections of the signal, and the second part comprising six parallel comb filters s1 to s6 and a series all-pass filter D1, where the six parallel comb filters are responsible for reverberation processing of different frequency bands of the signal.
The invention relates to a voice sound source space sound effect three-dimensional vivid generation system, which further improves a space sound effect reverberation model by the following two points:
the improvement is as follows: setting the delay and attenuation coefficients of a 19-order FIR filter in a spatial sound effect reverberation model according to a simulated specific environment, carrying out impulse response simulation through the characteristics of a known simulated room, and then calculating the delay and attenuation coefficients of direct sound, 1-order reflected sound, 2-order reflected sound and 3-order reflected sound;
the second improvement is that: setting a low-pass filter on a feedback branch in each comb filter in a spatial sound effect reverberation model to process different sound frequency sections, wherein when the absorption of air on high-frequency components of a voice signal is considered, the reverberation time is related to the sound absorption coefficient of the air and is related to the humidity, the temperature and the frequency of the signal of a room;
Whether the indoor temperature is held constant or the indoor relative humidity is held constant, the higher the signal frequency, the larger the corresponding air attenuation coefficient. From the specific relationship of the room-air absorption decay constant 4n to relative humidity, temperature and frequency, the cut-off frequencies of the low-pass filters on the branches of the six comb filters in the spatial sound effect reverberation model are refined and set to 500 Hz, 1000 Hz, 2000 Hz, 4000 Hz, 6300 Hz and 8000 Hz respectively. With the relative humidity and temperature of the room known, the decay constants 4n corresponding to the different signal frequencies in the simulated environment are found from the air absorption attenuation coefficient, the reverberation time corresponding to each frequency band is then obtained by substitution, the reverberation time of each band is assigned to the corresponding low-pass filter, and the feedback gain coefficient of each comb filter is then obtained according to formula 1, where formula 1 is:
g = 10^(−3R/R60)    (formula 1)
where g is the feedback gain coefficient, R is the delay time of the comb filter and R60 is the reverberation time; the delay times of the cascaded all-pass filter stage, which are much shorter than those of the comb filters, are taken as 5 ms and 1.7 ms, and the corresponding attenuation coefficient is set to 0.7, thereby realizing the reverberation simulation of a specific room;
in an artificial three-dimensional reverberation model considering air attenuation, direct sound C is adjustedDirThe DRR is controlled by the gain factor f, and the DRR reaching a listener is controlled by simulating reverberation in a reconstructed sound field through an artificial three-dimensional reverberation model considering air attenuation on the premise that the distance of a sound source to be restored is known, so that the restoration of the distance of the sound source is realized.
In the three-dimensional vivid generation system of voice sound source spatial sound effect, the copy-based room impulse response model is as follows: a room containing a sound source and a receiver is regarded as a system whose input signal is the sound source and whose output signal is the signal received by the receiver. When only one sound source and one receiver in the room are considered, the system is treated as a linear time-invariant system, the sound propagation process is expressed by the corresponding transfer function, and the signal received by the listener equals the convolution of the corresponding room impulse response with the system input signal;
the room impulse response model needs to construct a simulated sound field, analyze and reconstruct the direction, sound intensity and corresponding time delay of direct sound and all reflected sound reflected by the wall surface in the sound field, the room impulse response is based on geometric acoustics, according to the fact that the signal reaches the receiving end after being reflected by the wall surface, the wall surface can be used as a copy sound source to directly reach the receiving end, the sound source is regarded as a point, by constructing a room impulse response model, simulating reverberation in a room by a computer, reflecting sound waves to be specular reflection, and the energy of the sound wave is reduced after each reflection, the law that the sound energy is continuously reduced is regarded as a series of duplicate sound sources with gradually reduced signal amplitude, and then obtaining the impulse response of the room according to the amplitudes and the spatial positions of all the copy sources, and controlling the signal reflection order, the room size and the microphone direction by the constructed room impulse response.
Further, assuming a rectangular room, the process by which the sound source reaches the listening position after wall reflections is regarded as a filtering of the sound source signal by the room; without considering background noise, the signal y(m) received at the listening position is expressed as:
y(m) = c(m) * l(m) = Σ_w b_w c(m − m_w),  where l(m) = Σ_w b_w e(m − m_w)    (formula 2)
where e(m) is the system unit impulse function, c(m) is the sound source signal, and l(m) is the impulse response from the sound source to the listening position; the w-th reflection path arrives with delay m_w and attenuation coefficient b_w, and * denotes the convolution operation, so the reverberant signal y(m) is the convolution of the sound source signal c(m) with the room impulse response l(m).
The transmission equation of a point sound source in a free field environment is expressed as:
Q(f, Y′) = exp[jf(T/s − r)] / (4πT)    (formula 3)
where Q denotes the sound pressure, f = 2πg is the angular frequency, g is the frequency, r is time, and T = |Y − Y′| is the distance between the sound source and the listening position, with Y = (x, y, z) the sound source position, Y′ = (x′, y′, z′) the listening position, and s the propagation speed of sound in air. When the reflecting wall is rigid, the sound source reaches the listening position after reflection from the wall, giving the sound propagation equation at the listening position. When the first-order reflection model is extended to higher orders and copy sound sources exist for all six walls of the room, the sound propagation equation at the listening position is:
Q(f, Y′) = Σ_(q∈Q) Σ_w exp[jf(h − r)] / (4πa)    (formula 4)
where a denotes the distance of the sound source or its copy from the listening position, h = a/s is the corresponding delay time, Q is the set of eight copy combinations {(p, i, w): p, i, w ∈ {0, 1}}, and the inner sum runs over the lattice of copy rooms up to the reflection order; for a given reflection order M, 8(2M + 1)^3 different paths are obtained. The impulse response of the room is then obtained from the reflection coefficients of the six walls, and, with the acoustic properties of the room known, the impulse response obtained by computer simulation is convolved with the original sound source signal so that the signal received at the listening position in the room can be analysed.
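For illustration, a minimal sketch of the copy-based (image-source) room impulse response construction for a rectangular room is given below, assuming a single frequency-independent reflection coefficient shared by all six walls and purely specular reflection; the room size, the source and microphone positions and the reflection coefficient are placeholder assumptions, and the received signal is then obtained as the convolution of the source signal with the simulated impulse response, as in formula 2.

```python
import numpy as np

def image_source_rir(room, src, mic, beta, fs, c=343.0, max_order=2, length=4096):
    """Copy (image-source) room impulse response for a rectangular room.
    room = (Lx, Ly, Lz); src, mic are 3-D positions; beta is a single,
    frequency-independent reflection coefficient shared by all six walls."""
    h = np.zeros(length)
    mic = np.asarray(mic, dtype=float)
    Lx, Ly, Lz = room
    for nx in range(-max_order, max_order + 1):
        for ny in range(-max_order, max_order + 1):
            for nz in range(-max_order, max_order + 1):
                for p in (0, 1):                   # the 8 copy combinations in Q
                    for q in (0, 1):
                        for w in (0, 1):
                            img = np.array([(1 - 2 * p) * src[0] + 2 * nx * Lx,
                                            (1 - 2 * q) * src[1] + 2 * ny * Ly,
                                            (1 - 2 * w) * src[2] + 2 * nz * Lz])
                            a = np.linalg.norm(img - mic)        # path length
                            n_refl = (abs(nx - p) + abs(nx) +    # reflections per wall pair
                                      abs(ny - q) + abs(ny) +
                                      abs(nz - w) + abs(nz))
                            sample = int(round(a / c * fs))      # delay h = a / s
                            if sample < length:
                                h[sample] += beta ** n_refl / (4 * np.pi * a)
    return h

# usage: the received signal is the source convolved with the RIR (formula 2)
fs = 16000
h = image_source_rir(room=(6.0, 4.0, 3.0), src=(2.0, 1.5, 1.2),
                     mic=(4.0, 2.5, 1.2), beta=0.8, fs=fs)
source = np.random.randn(fs)
received = np.convolve(source, h)[:len(source)]
```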
In the three-dimensional vivid generation system of voice sound source spatial sound effect, the multi-channel multi-dimensional room reverberation removal method is as follows: a multi-channel multi-dimensional inverse filtering method is used to remove the influence of reflections of the sound source signal in the room, so that the signal received by the listener is consistent with the sound source signal at the loudspeaker input; the acoustic channel system of the room is modelled as a system with multiple input and output channels;
the invention solves the problem of solving an inverse filter when the impulse response is non-minimum phase by adopting multi-channel multi-dimensional inverse filtering, F1,F2Respectively representing the impulse responses, L, of two sound sources at listening positions in the room1,L2Representing the inverse filter of the system using multi-channel multi-dimensional input-output, and as F1,F2When L is inputted1,L2The inverse filter of the system is obtained when the relation:
A = F1L1 + F2L2 = 1    (formula 5)
where A is an (H + 1) × 1 column vector, L is a (j + 1) × 1 column vector, F is the Toeplitz matrix of the room impulse response with size (H + 1) × (j + 1), and F1, F2, L1, L2 become polynomials in z after the z-transform.
In the three-dimensional lifelike generation system of voice sound source spatial sound effect, formula 5 has a solution only when the following two conditions are satisfied simultaneously:
Condition one: F1 and F2 have no common pole-zero points in the z-plane;
Condition two: the orders of L1 and L2 are respectively lower than the orders of F1 and F2;
the equation 5 is converted into a matrix formation,
Figure BDA0003134996130000071
is (j + i +2) × 1 column vector, [ F1 F2]Is (H +1) × (j + i +2) matrix, when [ F [)1 F2]When the coefficient i is m-1 and j is n-1, [ F ]1 F2]In the form of a square matrix, then:
Figure BDA0003134996130000072
When there are m + 1 inputs and m outputs, F_ji (j = 1, 2, …, m + 1; i = 1, 2, …, m) denotes the impulse response between the j-th input channel and the i-th output channel, and L_ji denotes the FIR filter of the j-th input channel; the inverse filter of the i-th output channel is represented in matrix form as:
Tj = F Lj    (formula 7)
where Tj is an m × 1 column vector, F = [F1 F2 … F_(n+1)] is an n × (n + 1) matrix, and Lj is an (n + 1) × 1 column vector. On the basis of the copy-based room impulse response model, the signal is processed by the multi-channel multi-dimensional room reverberation removal method, so that the signal controlled by the artificial three-dimensional reverberation model considering air attenuation, which contains the sound source distance information, still carries the correct sound source distance information when it reaches the human ears after propagating through the reconstruction room.
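The following sketch illustrates, under simplifying assumptions, how inverse filters satisfying formula 5 can be computed for the two-channel case by stacking convolution (Toeplitz) matrices and solving the resulting square linear system; the impulse responses are random placeholders, and the filter lengths are chosen so that [F1 F2] is square, as described above.

```python
import numpy as np

def convolution_matrix(f, n_cols):
    """(len(f) + n_cols - 1) x n_cols Toeplitz (convolution) matrix of f,
    so that convolution_matrix(f, n) @ x == np.convolve(f, x)."""
    F = np.zeros((len(f) + n_cols - 1, n_cols))
    for j in range(n_cols):
        F[j:j + len(f), j] = f
    return F

def mint_inverse_filters(f1, f2):
    """Solve F1 L1 + F2 L2 = unit impulse (formula 5). Choosing the inverse-filter
    lengths as len(L1) = len(f2) - 1 and len(L2) = len(f1) - 1 makes [F1 F2]
    square, so an exact solution exists when f1 and f2 share no common zeros."""
    n1, n2 = len(f2) - 1, len(f1) - 1
    F = np.hstack([convolution_matrix(f1, n1), convolution_matrix(f2, n2)])
    d = np.zeros(F.shape[0])
    d[0] = 1.0                                    # target A: a unit impulse
    L = np.linalg.solve(F, d)
    return L[:n1], L[n1:]

# usage with two assumed (random placeholder) room impulse responses
rng = np.random.default_rng(0)
f1, f2 = rng.standard_normal(64), rng.standard_normal(64)
l1, l2 = mint_inverse_filters(f1, f2)
equalized = np.convolve(f1, l1) + np.convolve(f2, l2)    # approximately a unit impulse
print(np.max(np.abs(equalized - np.eye(len(equalized))[0])))
```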
In the three-dimensional lifelike generation system of voice sound source spatial sound effect, the three-dimensional voice spatial sound effect lifelike generation system works as follows: the invention improves the restoration of sound source direction information based on the VBAP technology and combines it with the artificial three-dimensional reverberation model considering air attenuation and the method for removing room reverberation at the listening position. The spatial orientation information (x, y, z) of the sound source target object in the reconstructed sound field is extracted from the 3D video, and the sound source distance z to be restored is input into the restored sound source distance model; the direct-to-reverberant energy ratio required to restore that distance is calculated through the distance perception model based on the direct-to-reverberant ratio. The reverberation of the reconstruction environment is then simulated by the artificial three-dimensional reverberation model based on the frequency-dependent inconsistency of air attenuation, and the direct sound in the model is controlled so that the output signal contains the distance information z of the sound source. To ensure that the signal containing the sound source distance information still carries accurate distance information when it reaches the human ears after propagating through the reverberant room, pre-filtering is applied before playback. The loudspeaker group is selected according to the direction information (x, y) of the sound source; once the loudspeaker group is determined, room reverberation removal is performed for the listening position so that the signal heard there is consistent with the input signal of the loudspeaker group, completing the restoration of the sound source distance. The VBAP technology then distributes gain factors to the signals at the loudspeaker outputs according to the direction information (x, y) of the sound source, completing the restoration of the sound source direction; the listening position receives a signal carrying the three-dimensional spatial information (x, y, z) of the sound source, realizing the restoration of the sound source's three-dimensional spatial information.
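By way of illustration only, the following sketch shows how a VBAP-style gain factor distribution of the kind mentioned above can be computed for one loudspeaker triplet from the direction information (x, y); the loudspeaker layout, the clipping of negative gains and the power normalization are assumptions made for the example and are not taken from the text.

```python
import numpy as np

def direction_vector(azimuth_deg, elevation_deg):
    """Unit vector for a (horizontal angle x, elevation angle y) pair."""
    az, el = np.radians(azimuth_deg), np.radians(elevation_deg)
    return np.array([np.cos(el) * np.cos(az), np.cos(el) * np.sin(az), np.sin(el)])

def vbap_gains(source_dir, speaker_dirs):
    """VBAP gain factors for a loudspeaker triplet: solve p = L^T g and
    normalize so that the total power of the gains is one."""
    L = np.vstack(speaker_dirs)               # rows are loudspeaker unit vectors
    g = np.linalg.solve(L.T, source_dir)      # p = L^T g
    g = np.clip(g, 0.0, None)                 # negative gain => source outside the triplet
    return g / np.linalg.norm(g)

# usage: an assumed loudspeaker triplet around a screen in front of the listener
speakers = [direction_vector(-30, 0), direction_vector(30, 0), direction_vector(0, 40)]
p = direction_vector(10, 15)                  # target direction (x, y) of the sound source
gains = vbap_gains(p, speakers)               # gain factor for each loudspeaker
```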
In the three-dimensional lifelike generation system of voice sound source spatial sound effect, the sound source distance restoration module is as follows: the module adopts the DRR as the main factor of distance restoration; on the premise that the sound source distance information is known, the DRR corresponding to the distance to be restored is obtained by calculation with the distance perception model, and the DRR of the signal is then controlled through the artificial three-dimensional reverberation model based on the frequency-dependent inconsistency of air attenuation, so that the signal with reverberation added contains the sound source distance information; this is realized through the artificial three-dimensional reverberation model considering air attenuation.
In the three-dimensional lifelike generation system of voice sound source spatial sound effect, the room reverberation removal module is as follows: the propagation of sound in the room is simulated through the copy-based room impulse response model, the room impulse response model is constructed, and the characteristics of the listening-position signal are analysed; on the basis of the obtained room impulse response, the inverse filters of the loudspeaker system are then obtained by the multi-channel multi-dimensional room reverberation removal method, so that the sound signal received by the listener is consistent with the voice signal originally played by the loudspeakers.
Compared with the prior art, the invention has the following contributions and innovation points:
firstly, most of the current home theater environments have reverberation, and a television screen is usually positioned in a front area of a viewer, so that the invention is mainly developed for reconstructing a sound image at a sound source three-dimensional space position in front of the viewer, analyzing the characteristics of an audio signal at a listening position by combining the environment characteristics of a reconstructed room, a loudspeaker position vector and the listening position vector, and simulating 4 loudspeakers around the television screen to restore the sound source three-dimensional space information in a room with the reverberation in a verification experiment, wherein firstly, the room characteristics in a reconstructed sound field and factors influencing human ear distance perception in the reverberation environment are analyzed, and an artificial reverberation model is constructed to add the reverberation of the reconstructed room to an original voice signal to control the signal reaching the human ear of the listener so that the signal received by the human ear contains sound source distance information; secondly, analyzing the propagation characteristics of the audio signals in a reverberation environment, constructing a room impulse response model to obtain the characteristics of the audio signals at the listening position, and developing a room reverberation removal algorithm to remove the influence of reverberation of the voice signals in the propagation process so that the signals reaching the ears of the listener are consistent with the original signals played by the loudspeaker; thirdly, on the basis of restoring a sound source distance model and removing room reverberation, a VBAP technology is improved, and a 3D voice spatial sound effect generation system is analyzed; fourthly, relevant subjective test experiments are respectively carried out on the room reverberation removal, the sound source restoring distance, the sound source restoring direction and the sound source restoring three-dimensional space information, and results show that the direct mixing ratio of signals at a listener is controlled by adopting an artificial reverberation model on the basis of the room reverberation removal, the sound source distance can be effectively restored, and the sound source restoring method can better restore the sound source three-dimensional space information by combining with the VBAP sound source restoring direction, and has the highest sound image restoring accuracy in the central point direction of an area surrounded by the loudspeaker group;
secondly, the prior art lacks of restoring the sound image distance in a non-free field, and does not provide an effective method for effectively controlling the energy ratio of direct sound and reverberation sound, the invention analyzes and obtains a model for restoring the sound source distance from two aspects, on one hand, factors influencing the perception of the sound source distance of a listener are analyzed, so that in an environment with reverberation, reverberation clues can provide accurate sound source distance positioning for the listener relative to other clues, and the relationship between the sound source distance and DRR is provided on the premise of known room attributes; on the other hand, based on the characteristics of the indoor sound field, the realization mechanism of simulating the room reverberation by the artificial reverberation model is provided, the space sound effect reverberation model is improved, the air attenuation inconsistency corresponding to different frequencies of signals is considered, so that the artificial three-dimensional reverberation model considering the air attenuation can more accurately simulate the reverberation of a reconstructed environment, DRRs reaching the human ears can be effectively controlled through the model, the effectiveness of the artificial three-dimensional reverberation model considering air attenuation in simulating room reverberation and the necessity of considering air absorption attenuation are verified through matlab simulation experiments, the artificial three-dimensional reverberation model considering air attenuation is adopted to realize the simulation of the room reverberation of a listener, in the model, direct sound is achieved through a control signal, so that the DRR reaching the human ear is matched with the sound source distance perception model, and the recovery of sound source distance information is completed;
thirdly, the room impulse response model based on the copy is adopted to simulate the room impulse response, so that the accuracy is better, and the impulse response obtained by computer simulation and the original signal of the sound source are subjected to convolution operation on the premise that the acoustic property of the room is known, so that the signal received at the listening position in the room can be obtained through analysis; in order to remove the influence of the signal carrying sound source distance information on the propagation of a reverberant room and control the signal so as to restore the direction of a sound source, the invention adopts a multi-channel multi-dimensional inverse filtering method to remove the influence of the reflection of the sound source signal in the room, so that the signal received by a listener is consistent with the sound source signal at the input end of a loudspeaker, a sound channel system of the room is modeled and is regarded as a system with a plurality of input and output channels, and the method has the advantages that the calculated inverse filter has limited length and is causality;
fourthly, the present invention proposes, for the first time, to restore a sound source distance model in a reverberant environment, and in the model, to combine an artificial three-dimensional reverberation model in which room reverberation is removed and air attenuation is considered, the improved reverberation model based on the air attenuation inconsistency of the frequency spectrum can more accurately simulate the reverberation effect in a specific room by improving the spatial sound effect reverberation model in the artificial three-dimensional reverberation model, thereby realizing more accurate and comprehensive control signal DRR to recover the sound source distance, and further combining VBAP technology, the method and the device can realize the vivid restoration of the three-dimensional space direction of the sound source, and realize the consistency of the subjective auditory perception obtained by the voice sound source of the 3D video target object in the reconstruction of the three-dimensional sound field and the subjective visual perception obtained in the reconstruction of the 3D video scene.
Drawings
FIG. 1 is a graph of acoustic attenuation versus relative humidity in the range of 1000 to 8000 Hz at 0 °C and normal atmospheric pressure.
FIG. 2 is a graph of acoustic attenuation versus temperature in the range of 1000 to 8000 Hz at 20% relative humidity and normal atmospheric pressure.
Fig. 3 is a block diagram of an artificial three-dimensional reverberation model of the present invention considering air attenuation.
Fig. 4 is a schematic diagram of the room acoustic environment of the present invention.
Fig. 5 is a schematic diagram of an acquisition inverse filter using multi-channel multi-dimensional inverse filtering.
Fig. 6 is a block diagram of a system for multi-channel multi-dimensional room reverberation removal.
FIG. 7 is an overall structure diagram of the three-dimensional realistic sound generation system of the voice sound source space sound effect of the present invention.
Detailed Description
The technical scheme of the system for generating three-dimensional realistic sound effect of a voice sound source according to the present invention will be further described with reference to the accompanying drawings, so that those skilled in the art can better understand the present invention and can implement the same.
The rapid development of the 3D film and television industry and the initiation of the three-dimensional audio standardization process by the motion picture experts group have brought about a great deal of attention in the field of three-dimensional audio technology. The real complete spatial perception of audiovisual events by listeners is severely compromised by the lack of three-dimensional spatial information representation of sound sources in two-dimensional audio systems developed based on traditional stereo or surround sound. Therefore, the three-dimensional audio processing technology based on the same audio-visual perception becomes an important breakthrough direction for realizing the consistent expression and reconstruction of the three-dimensional audio-video information.
The three-dimensional audio is divided into physical sound field-based and perception-based reconstruction technologies, wherein the physical sound field reconstruction technology comprises Ambisonics and wave field synthesis, and the perception-based sound field reconstruction technology comprises an amplitude translation technology and a head-related transfer function reconstruction technology. Both Ambisonics and WFS technologies require a large number of speakers and are severely limited with respect to the arrangement between speakers. Compared with a reconstruction technology based on a physical sound field, the demand of the reconstruction technology based on the perception for the number of the loudspeakers is obviously reduced, wherein the VBAP technology can better recover the sound direction, but a sound image formed by the technology is only positioned on the spherical surface where the loudspeakers are positioned, a listener cannot perceive sound source distance information outside the spherical surface, and the HRTF mainly adopts earphones for playback and is closely related to individual differences of people.
Therefore, when a loudspeaker is used as playback equipment, the method takes the objective that the subjective auditory perception obtained by a voice sound source in a three-dimensional video in a reconstructed three-dimensional sound field is consistent with the subjective visual perception obtained in a reconstructed 3D video scene, constructs an artificial three-dimensional reverberation model considering air attenuation based on the perception factor influencing the distance between human ears under the conditions of known sound source space information, sound source signals and reconstruction environment, corrects and controls the sound source signals, adds the sound source distance information to the sound source signals, obtains inverse filtering for removing room reverberation at a listening position based on the propagation characteristics of the sound signals in the reverberation environment, removes the influence of the signals played by the loudspeaker in the reverberation environment, ensures the consistency of the input signals of the loudspeaker and the signals received by a listener, and completes the distance recovery of the sound source; and finally, distributing gain factors to the loudspeaker group signals to finish the recovery of the sound source direction and further finish the recovery of the three-dimensional space azimuth information of the sound source. The invention carries out relevant subjective test experiments on the room reverberation removal, the sound source restoring distance, the sound source restoring direction and the sound source restoring three-dimensional space information respectively, and the result shows that the direct mixing ratio of the signal of a listener is controlled by adopting an artificial reverberation model on the basis of the room reverberation removal to effectively restore the sound source distance, and the sound source space sound effect three-dimensional realistic generation system can better restore the sound source three-dimensional space information by combining with the VBAP sound source restoring direction, and has the highest sound image restoring accuracy on the central point direction of the area surrounded by the loudspeaker group.
Part one: artificial three-dimensional reverberation model considering air attenuation
The listener's perception of sound source spatial information in a three-dimensional sound field mainly comprises the sound source horizontal angle, elevation angle and distance. The horizontal angle is mainly localized according to the interaural time difference and interaural intensity difference, while the elevation angle is perceived according to the listener's minimum audible angle at different heights. The human ear's judgement of the perceived sound source distance is obtained by jointly processing incomplete information provided by one or more distance cues. In a natural listening environment, the human ear mainly localizes distance according to intensity, reverberation and binaural cues, where the intensity cue is a relative cue and the reverberation cue an absolute cue for judging the sound source distance; the factors influencing distance perception mainly comprise sound intensity, the direct-to-reverberant energy ratio DRR, spectrum and binaural differences.
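As an illustration of the DRR cue, the toy sketch below computes the direct-to-reverberant energy ratio from a room impulse response by comparing the energy in a short window around the direct-path arrival with the remaining reverberant energy; the 2.5 ms window and the synthetic exponentially decaying impulse response are assumptions chosen only for illustration and are not part of the invention.

```python
import numpy as np

def direct_to_reverberant_ratio(h, fs, direct_window_ms=2.5):
    """DRR in dB: energy in a short window around the direct-path peak
    divided by the energy of the remaining (reverberant) part of the RIR."""
    peak = int(np.argmax(np.abs(h)))
    half = int(round(direct_window_ms * 1e-3 * fs))
    lo, hi = max(0, peak - half), min(len(h), peak + half + 1)
    direct_energy = np.sum(h[lo:hi] ** 2)
    reverb_energy = np.sum(h ** 2) - direct_energy
    return 10.0 * np.log10(direct_energy / reverb_energy)

# usage with a toy exponentially decaying RIR (assumed, for illustration only)
fs = 16000
t = np.arange(int(0.4 * fs)) / fs
h = np.exp(-6.9 * t / 0.3) * np.random.randn(len(t)) * 0.05
h[0] = 1.0                                    # direct-path arrival
print(direct_to_reverberant_ratio(h, fs))
```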
The artificial reverberation is to artificially process the effect of the original sound signal by a computer and other instruments, so that a listener can perceive that the sound signal has the reverberation effect, and the listener can listen to the sound more truly and naturally.
The spatial sound effect reverberation model mainly comprises two parts, wherein the first part comprises a 19-order FIR filter which simulates the attenuation of direct sound and early reflected sound of a signal, and the second part comprises six comb filters s 1-s 6 connected in parallel and an all-pass filter D1 connected in series, wherein the six comb filters connected in parallel are responsible for reverberation processing on different frequencies of the signal.
Although the sound processed by the spatial sound effect reverberation model sounds more real and natural and can effectively reduce the metal sounds and the clapping sounds, if the reverberation effect in a specific room is to be simulated more accurately and comprehensively, the spatial sound effect reverberation model still has the following defects: firstly, when a spatial sound effect reverberation model simulates reverberation, the delay parameter and the attenuation coefficient of a 19-order FIR filter are still parameters when a simulated Boston symphony hall is adopted, and no relevant parameter is set for a specific occasion; secondly, feedback gain coefficients of the comb filter are calculated according to the indoor average reverberation time, however, as the energy absorption of the air to the sound is influenced by the signal frequency, the energy attenuation degrees of the sounds in different frequency sections in different environments are different, namely, the corresponding reverberation times are different, especially for high-frequency signals, although the model can change the reverberation time of the high-frequency part of the sound by adding a low-pass filter on a feedback branch, the reverberation time is only roughly controlled, and the reverberation time of the signals with different frequencies cannot be accurately reflected.
Based on the defects of the spatial sound effect reverberation model, in order to enable the artificial reverberation model to more accurately and comprehensively simulate the reverberation in a specific environment, the invention further improves the spatial sound effect reverberation model by the following two points:
the improvement is as follows: setting the delay and attenuation coefficients of a 19-order FIR filter in a spatial sound effect reverberation model according to a simulated specific environment, carrying out impulse response simulation through the characteristics of a known simulated room, and then calculating the delay and attenuation coefficients of direct sound, 1-order reflected sound, 2-order reflected sound and 3-order reflected sound;
the second improvement is that: the low pass filter on the feedback branch in each comb filter in the spatial sound effect reverberation model is arranged to process different sound frequency segments, and when the absorption of air to the high frequency components of the sound signal is considered, the reverberation time is related to the sound absorption coefficient of air, and is related to the humidity, the temperature and the frequency of the signal of the room, as shown in fig. 1 and fig. 2.
As can be seen from fig. 1 and fig. 2, whether the indoor temperature is held constant or the indoor relative humidity is held constant, the higher the signal frequency, the larger the corresponding air attenuation coefficient. Based on the experimentally obtained relationship of the room-air absorption decay constant 4n to relative humidity, temperature and frequency, and in order to simulate the reverberation of the reconstructed room more faithfully, the cut-off frequencies of the low-pass filters on the branches of the six comb filters in the spatial sound effect reverberation model are refined and set to 500 Hz, 1000 Hz, 2000 Hz, 4000 Hz, 6300 Hz and 8000 Hz respectively. With the relative humidity and temperature of the room known, the decay constants 4n corresponding to the different signal frequencies in the simulated environment are found from the air absorption attenuation coefficient, the reverberation time corresponding to each frequency band is obtained by substitution, the reverberation time of each band is assigned to the corresponding low-pass filter, and the feedback gain coefficient of each comb filter is then obtained according to formula 1, where formula 1 is:
g = 10^(−3R/R60)    (formula 1)
where g is the feedback gain coefficient of the comb filter, R is the delay time and R60 is the reverberation time; the delay times of the cascaded all-pass filter differ from those of the comb filters and are taken as 5 ms and 1.7 ms, with the corresponding attenuation coefficient set to 0.7, so that the reverberation of a specific room is simulated.
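As a rough illustration of the per-band design just described, the following Python sketch applies formula 1 to a set of assumed per-band reverberation times and shows a comb filter with a one-pole low-pass in its feedback branch; the band RT60 values, the 30 ms comb delay, the damping coefficient and the sampling rate are illustrative assumptions, not parameters disclosed by the invention.

```python
# Minimal sketch (not the patent's exact implementation): per-band comb-filter
# feedback gains from formula 1, plus a comb filter whose feedback branch
# contains a one-pole low-pass, as in a Moorer-style reverberator.
import numpy as np

def feedback_gain(delay_s, rt60_s):
    """Formula 1: gain that makes a comb filter decay by 60 dB in rt60_s."""
    return 10.0 ** (-3.0 * delay_s / rt60_s)

def lowpass_feedback_comb(x, delay_samples, g, damping=0.2):
    """y[n] = x[n] + g * lowpass(y[n - D]); the low-pass damps high frequencies."""
    y = np.zeros(len(x))
    lp_state = 0.0
    for n in range(len(x)):
        delayed = y[n - delay_samples] if n >= delay_samples else 0.0
        lp_state = (1.0 - damping) * delayed + damping * lp_state  # one-pole low-pass
        y[n] = x[n] + g * lp_state
    return y

fs = 16000
comb_delay_s = 0.030                      # assumed 30 ms comb delay
# Hypothetical reverberation times (s) for the six bands named above.
band_rt60 = {500: 1.2, 1000: 1.1, 2000: 0.9, 4000: 0.7, 6300: 0.55, 8000: 0.45}
band_gains = {f: feedback_gain(comb_delay_s, t60) for f, t60 in band_rt60.items()}
print(band_gains)

impulse = np.zeros(fs); impulse[0] = 1.0
tail = lowpass_feedback_comb(impulse, int(fs * comb_delay_s), band_gains[1000])
```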
In the artificial three-dimensional reverberation model that accounts for air attenuation, the DRR is controlled by adjusting the gain factor f of the direct sound CDir; on the premise that the sound source distance to be recovered is known, the model simulates the reverberation of the reconstructed sound field and thereby controls the DRR reaching the listener, realizing the recovery of the sound source distance.
Method for removing room reverberation at listening position
(I) Copy-based room impulse response model
A room containing a sound source and a receiver is regarded as a system whose input signal is the sound source and whose output signal is the signal received by the receiver. Considering only one sound source and one receiver in the room, the system is treated as linear time-invariant, the propagation of sound is expressed by the corresponding transfer function, and the signal received by the listener equals the convolution of the room impulse response with the system input signal.
The room impulse response model requires constructing a simulated sound field and analyzing the direction, intensity and delay of the direct sound and of every wall reflection in the reconstructed field. According to geometric acoustics, a signal that reaches the receiver after being reflected by a wall can equivalently be treated as arriving directly from a copy (image) of the source mirrored in that wall. Treating the sound source as a point, the reverberation in the room is simulated on a computer by constructing a room impulse response model: reflections are taken to be specular, the energy of the sound wave decreases after each reflection, and this steadily decreasing energy is represented by a series of copy sources with progressively smaller signal amplitudes. The impulse response of the room is then obtained from the amplitudes and spatial positions of all copy sources, and the constructed room impulse response controls the signal reflection order, the room size and the microphone orientation.
Fig. 4 is a schematic diagram of an actual acoustic environment. In a rectangular room, the process by which the sound source reaches the listening position after wall reflections is regarded as a filtering of the source signal by the room; ignoring background noise, the signal y(m) received at the listening position is expressed as:
y(m) = c(m) * l(m) = Σ_w b_w · c(m − m_w),  with  l(m) = Σ_w b_w · e(m − m_w)    formula 2
where e(m) is the unit impulse function of the system, c(m) is the sound source signal, and l(m) is the impulse response from the sound source to the listening position, formed by unit impulses that arrive at times m_w with attenuation coefficients b_w; the symbol * denotes convolution, so the reverberant signal y(m) is the convolution of the sound source signal c(m) with the room impulse response l(m).
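A short numpy illustration of formula 2 is given below; the arrival times and attenuation coefficients are assumed values chosen only to show that the reverberant signal is the source convolved with a sparse impulse response built from delayed, attenuated unit impulses.

```python
# Toy illustration of formula 2: y(m) = c(m) * l(m), with l(m) a sum of
# delayed, attenuated unit impulses (delays m_w in samples, coefficients b_w).
import numpy as np

delays = [0, 180, 310, 460]          # assumed arrival times m_w (samples)
attens = [1.0, 0.55, 0.35, 0.2]      # assumed attenuation coefficients b_w

l = np.zeros(max(delays) + 1)
for m_w, b_w in zip(delays, attens):
    l[m_w] += b_w                    # l(m) = sum_w b_w * e(m - m_w)

c = np.random.default_rng(0).standard_normal(1000)   # stand-in source signal c(m)
y = np.convolve(c, l)                                 # reverberant signal y(m)
```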
The transmission equation of a point sound source in a free field environment is expressed as:
Q(f, Y, Y') = exp[ jf·(T/s − r) ] / (4πT)    formula 3
where Q denotes sound pressure, f = 2πg is the angular frequency, g is the frequency, r is time, T = |Y − Y'| is the distance between the sound source and the listening position, Y = (x, y, z) is the sound source position, Y' = (x', y', z') is the listening position, and s is the speed of sound in air. When the reflecting wall is rigid, the sound source reaches the listening position after reflection from the wall, giving the sound propagation equation at the listening position. When the first-order reflection model is extended to the second-order model and copy sound sources exist on all six walls of the room, the sound propagation equation at the listening position becomes:
Q(f, Y, Y') = Σ_(P∈Q) Σ_w exp[ jf·(h − r) ] / (4πa)    formula 4
where a denotes the distance from the sound source or one of its copies to the listening position, h = a/s is the corresponding delay time, and Q is the set of 8 possibilities {(P, i, w): P, i, w ∈ (0, 1)}. For a given reflection order M there are 8(2M + 1)^3 different paths, and the impulse response of the room is obtained from the reflection coefficients of the six walls.
According to the simulation results, simulating the room impulse response with the copy-based room impulse response model gives good accuracy. With the acoustic properties of the room known, the computer-simulated impulse response is convolved with the original sound source signal, so that the signal received at the listening position in the room can be analyzed.
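One possible realization of such a copy-source simulation for a rectangular (shoebox) room is sketched below, loosely following the classical image method; the room dimensions, source and microphone positions, wall reflection coefficients, sampling rate and reflection order are all assumed values, and the code is a simplified illustration rather than the patent's exact model. For a reflection order M it enumerates exactly 8(2M + 1)^3 replica positions, matching the path count given above.

```python
# Compact image-method (copy-source) room impulse response sketch for a
# rectangular room; simplified illustration, not the invention's implementation.
import numpy as np

def image_method_rir(room, src, mic, beta, fs=16000, c=343.0, max_order=3, length=4096):
    """room: (Lx, Ly, Lz); src/mic: 3-D positions; beta: per-axis pair of wall
    reflection coefficients, e.g. {'x': (b_x0, b_xL), ...}."""
    h = np.zeros(length)
    L = np.asarray(room, dtype=float)
    src = np.asarray(src, dtype=float)
    mic = np.asarray(mic, dtype=float)
    n_range = np.arange(-max_order, max_order + 1)
    n_grid = np.array(np.meshgrid(n_range, n_range, n_range)).T.reshape(-1, 3)
    for p in np.ndindex(2, 2, 2):                      # 8 reflection parities {0,1}^3
        p = np.asarray(p)
        for n in n_grid:
            img = (1 - 2 * p) * src + 2 * n * L        # copy (image) source position
            d = np.linalg.norm(img - mic)
            amp = 1.0                                  # product of wall reflection coefficients
            for ax, key in enumerate(("x", "y", "z")):
                b0, bL = beta[key]
                amp *= (b0 ** abs(int(n[ax]) - int(p[ax]))) * (bL ** abs(int(n[ax])))
            k = int(round(fs * d / c))                 # arrival delay in samples
            if k < length:
                h[k] += amp / (4.0 * np.pi * max(d, 1e-3))
    return h

# Assumed room geometry, positions and wall reflection coefficients, for illustration only.
rir = image_method_rir(room=(5.0, 4.0, 3.0), src=(1.0, 1.5, 1.2), mic=(3.5, 2.0, 1.2),
                       beta={"x": (0.9, 0.9), "y": (0.85, 0.85), "z": (0.7, 0.8)})
```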
(II) Multi-channel multi-dimensional room reverberation removal method
A suitable amount of reverberation can make the sound heard by the listener more intelligible and brighter, but excessive reverberation leaves the listener with an unpleasant impression.
To remove the effect of propagation through a reverberant room on the signal carrying sound source distance information, and to control the signal so that the sound source direction can be restored, the invention removes the influence of in-room reflections of the sound source signal by multi-channel multi-dimensional inverse filtering, so that the signal received by the listener is consistent with the sound source signal at the loudspeaker input. The acoustic channel system of the room is modeled as a system with multiple input and output channels; the advantage of this method is that the computed inverse filters are of finite length and causal.
The invention uses multi-channel multi-dimensional inverse filtering to solve for the inverse filter when the impulse response is non-minimum phase. The dereverberation model is shown in fig. 5, where F1 and F2 denote the impulse responses of the two sound sources at the listening position in the room, and L1 and L2 denote the inverse filters of the multi-channel multi-dimensional input-output system; with F1 and F2 as inputs, L1 and L2 are the inverse filters of the system when the following relation holds:
A = 1 = F1·L1 + F2·L2    formula 5
where A is an (H + 1) × 1 column vector, L is a (j + 1) × 1 column vector, and F is the Toeplitz matrix of the room impulse response with size (H + 1) × (j + 1); after the z-transform, F1, F2, L1 and L2 are polynomials in z. Formula 5 has a solution only when the following two conditions are satisfied:
Condition one: F1 and F2 have no common poles or zeros in the z-plane;
Condition two: the orders of L1 and L2 are lower than the orders of F1 and F2, respectively;
Formula 5 is then converted into matrix form:
A = [F1  F2] · [L1; L2]
where [L1; L2] is a (j + i + 2) × 1 column vector and [F1 F2] is an (H + 1) × (j + i + 2) matrix; when the coefficients satisfy i = m − 1 and j = n − 1, [F1 F2] becomes a square matrix, and then:
[L1; L2] = [F1  F2]^(−1) · A
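For concreteness, the inverse-filter pair L1, L2 can be computed numerically by stacking the convolution (Toeplitz) matrices of F1 and F2 and solving the resulting linear system, here in the least-squares sense; the toy impulse responses and filter length below are assumptions, and the sketch illustrates this kind of MINT-style computation rather than the invention's own implementation.

```python
# Rough numpy sketch of the two-channel inverse-filter computation:
# build convolution (Toeplitz) matrices of F1 and F2 and solve
# [F1 F2] [L1; L2] = A, where A is a unit impulse (formula 5, A = 1).
import numpy as np

def conv_matrix(f, n_taps):
    """(len(f)+n_taps-1) x n_taps convolution matrix of filter f."""
    M = np.zeros((len(f) + n_taps - 1, n_taps))
    for j in range(n_taps):
        M[j:j + len(f), j] = f
    return M

def mint_inverse(f1, f2, n_taps, delay=0):
    F = np.hstack([conv_matrix(f1, n_taps), conv_matrix(f2, n_taps)])
    a = np.zeros(F.shape[0]); a[delay] = 1.0          # target impulse A
    L = np.linalg.lstsq(F, a, rcond=None)[0]
    return L[:n_taps], L[n_taps:]                     # L1, L2

# Assumed toy impulse responses with no common zeros (condition one).
f1 = np.array([1.0, 0.6, 0.25])
f2 = np.array([1.0, -0.4, 0.1])
L1, L2 = mint_inverse(f1, f2, n_taps=2)
print(np.convolve(f1, L1) + np.convolve(f2, L2))      # should be close to [1, 0, 0, 0]
```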
the block diagram of the system for reverberation removal when there are m +1 inputs, m outputs is shown in fig. 6, where Fji(j 1, 2.. times, m + 1; i 1, 2.. times, m) represents an impulse response between the jth input channel and the ith output channel, LjiThe FIR filter representing the jth input channel, the inverse filter of the ith output channel is represented in matrix form as:
Tj = F·Lj    formula 7
where Tj is an m × 1 column vector, F = [F1 F2 ... Fn+1] is an n × (n + 1) matrix, and Lj is an (n + 1) × 1 column vector. On the basis of the copy-based room impulse response model, the multi-channel multi-dimensional room reverberation removal method is used to process the signals, so that a signal that has been controlled by the air-attenuation-aware artificial three-dimensional reverberation model and contains sound source distance information still carries the correct distance information when it reaches the listener's ears after propagating through the reconstructed room.
Thirdly, three-dimensional voice space sound effect vivid generation system
(I) Generation of the three-dimensional speech space sound effect
Although the existing VBAP technique can reconstruct the direction information of the original sound source, it lacks any perception of sound image distance, and the listener's perception of distance strongly affects the overall experience of watching a 3D video. The invention therefore improves on the direction restoration of the VBAP technique and, combining the air-attenuation-aware artificial three-dimensional reverberation model with the listening-position room reverberation removal method, proposes the overall structure of the voice sound source spatial sound effect three-dimensional realistic generation system shown in fig. 7. The spatial direction information (x, y, z) of the sound source target object in the reconstructed sound field is extracted from the 3D video, and the sound source distance z to be restored is input to the sound source distance restoration model; the direct-to-reverberant energy ratio required to restore that distance is computed by the DRR-based distance perception model, and the reverberation of the reconstruction environment is simulated by the artificial three-dimensional reverberation model based on the frequency-dependent air attenuation proposed by the invention. By controlling the direct sound within that model, the output signal carries the distance information z of the sound source. To ensure that this signal still carries accurate distance information after propagating through the reverberant room to the listener's ears, pre-filtering is applied before playback: a loudspeaker group is first selected according to the direction information (x, y) of the sound source; once the group is determined, room reverberation removal processing is applied so that the signal heard at the listening position is consistent with the input signal of the loudspeaker group, completing the restoration of the sound source distance; then, at the loudspeaker outputs, gain factors are assigned to the signal with the VBAP technique according to the direction information (x, y), completing the restoration of the sound source direction. In this way the signal received at the listening position carries the three-dimensional spatial information (x, y, z) of the sound source, and the restoration of that information is achieved. The voice sound source spatial sound effect three-dimensional realistic generation system accordingly comprises a sound source distance restoring module, a room reverberation removal module and a sound source direction restoring module.
(II) Sound source distance restoring module
In a reverberant environment, as the distance between the sound source and the listener increases, both the intensity of the sound reaching the listener and the DRR decrease. Intensity cues alone, however, only convey the relative distance of the sound source; in the presence of reverberation, the DRR, which incorporates intensity information, provides the listener with more accurate distance information than other distance cues. The sound source distance restoring module therefore uses the DRR as the main factor for distance recovery: with the sound source distance known, the DRR corresponding to the distance to be restored is computed from the distance perception model, and the DRR of the signal is then adjusted and controlled by the artificial three-dimensional reverberation model based on the frequency-dependent air attenuation, so that the reverberated signal carries the sound source distance information. This is realized by the artificial three-dimensional reverberation model that accounts for air attenuation.
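By way of illustration, the DRR of a given room impulse response can be estimated by splitting the response a few milliseconds after the direct-path peak; in the sketch below the 2.5 ms split window and the toy impulse response are assumptions chosen only for demonstration, not values taken from the invention.

```python
# Illustrative DRR estimate: direct energy up to a short window after the
# direct-sound peak versus the remaining (reverberant) energy, in dB.
import numpy as np

def drr_db(rir, fs, window_ms=2.5):
    peak = int(np.argmax(np.abs(rir)))               # direct-sound arrival
    split = peak + int(fs * window_ms / 1000.0)
    direct = np.sum(rir[:split] ** 2)
    reverberant = np.sum(rir[split:] ** 2)
    return 10.0 * np.log10(direct / max(reverberant, 1e-12))

fs = 16000
toy_rir = np.zeros(fs // 2); toy_rir[0] = 1.0        # direct path
toy_rir[400:] += 0.01 * np.random.default_rng(1).standard_normal(fs // 2 - 400)
print(drr_db(toy_rir, fs))
```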
(III) Room reverberation removal module
So that a signal carrying sound source distance information still carries accurate distance information when it reaches the listener after propagating through a reverberant environment, the room reverberation removal module simulates the propagation of sound in the room with the copy-based room impulse response model, constructs the room impulse response and analyzes the characteristics of the signal at the listening position; on the basis of the obtained room impulse response, the multi-channel multi-dimensional room reverberation removal method is then used to compute the inverse filter of the loudspeaker system, so that the sound signal received by the listener is consistent with the speech signal originally played by the loudspeakers.
(IV) Sound source direction restoring module
The VBAP technique can reconstruct only the direction information of the original sound source with reasonable accuracy and lacks perception of sound image distance, so the invention uses VBAP, on top of the proposed distance restoration model, to realize the loudspeaker configuration and the restoration of the sound source direction. VBAP distributes the sound signal over several loudspeakers by adjusting gain factors, so that the virtual sound image perceived by the user is not confined to the loudspeaker positions but can move over the plane spanned by the loudspeakers, expressing two free dimensions, the horizontal angle and the elevation angle of the sound image. By adjusting the loudspeaker gain factors, the virtual sound image lies on the arc connecting the loudspeakers, and its exact location is determined by the loudspeaker position vectors, the listener position vector and the loudspeaker amplitudes.
The sound source direction restoring module computes the gain factor of each loudspeaker with the VBAP technique from the direction (x, y) to be restored and the loudspeaker position vectors, assigns these gains to the dereverberated filter outputs, and finally plays the result back through the loudspeakers, so that the sound source direction and distance are restored simultaneously.
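A minimal sketch of the gain-factor computation for a single loudspeaker pair, following the standard pairwise VBAP formulation (gains obtained from the inverse of the loudspeaker base matrix, then constant-power normalization), is given below; the azimuth values are assumed for illustration, and the invention's own gain distribution over the selected loudspeaker group may differ in detail.

```python
# Pairwise (2-D) VBAP sketch: the virtual-source direction vector is expressed
# in the basis of the two loudspeaker unit vectors; the coefficients are the gains.
import numpy as np

def vbap_pair_gains(source_az_deg, spk_az_deg):
    """source_az_deg: virtual source azimuth; spk_az_deg: (left, right) azimuths."""
    p = np.array([np.cos(np.radians(source_az_deg)),
                  np.sin(np.radians(source_az_deg))])
    L = np.array([[np.cos(np.radians(a)), np.sin(np.radians(a))]
                  for a in spk_az_deg])               # rows = loudspeaker unit vectors
    g = p @ np.linalg.inv(L)                          # unnormalized gain factors
    return g / np.linalg.norm(g)                      # constant-power normalization

print(vbap_pair_gains(20.0, (-30.0, 30.0)))           # assumed 60-degree speaker pair
```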
Fourthly, summary of the invention
3D video brings viewers excellent visual enjoyment, and with it come ever higher demands on the audio-visual experience; 3D multimedia technology, and in particular three-dimensional audio that can match the visual perception of 3D video, is therefore a direction in which breakthroughs are urgently needed. Three-dimensional audio gives the reconstructed sound image information in three degrees of freedom, horizontal angle, elevation angle and distance, whereas the sound image reconstructed by current stereo or surround systems has freedom only in the horizontal direction. Even the more recent VBAP technique, which can restore the horizontal and elevation information of the sound image, still lacks a full three-dimensional spatial expression of the sound object, so the spatial information of a sound object cannot be expressed completely in a 3D audio-visual system, and the listener's complete, realistic spatial perception of an audio-visual event is seriously impaired; a three-dimensional audio processing technique based on consistent audio-visual perception is thus a key technology for the consistent expression and reconstruction of 3D audio-visual information. The invention targets consistency between the subjective auditory perception of the speech source of a 3D video target object in the reconstructed three-dimensional sound field and the subjective visual perception obtained in the reconstructed 3D video scene. With the sound source spatial information, the sound source signal and the reconstruction environment known, and building on the direction restoration of the improved VBAP technique, it restores the sound source distance in a reverberant environment, proposes a model for reconstructing the sound source distance under reverberation, and further provides a three-dimensional realistic sound effect generation system for reverberant conditions.
Addressing the fact that the current VBAP technique can restore the sound source direction but not the sound source distance, the invention proposes a sound source sound-image distance restoration model and combines it with VBAP direction restoration to recover the three-dimensional spatial information of the sound source. Among the factors that influence human perception of source distance, intensity cues provide the listener only with the relative distance of the source, whereas the DRR, which contains signal intensity information, can provide the absolute distance. To remove the effect of the reverberant room on the signal carrying the distance information, the characteristics of the indoor sound field and of the audio signal within it are analyzed to simulate a room impulse response model; on the basis of the constructed room impulse response model, room reverberation is removed with the multi-channel multi-dimensional method, so that the DRR of the signal reaching the listener's ears agrees with the DRR set by the artificial reverberation model. Finally, the restoration of the sound source direction is realized by the VBAP technique, and the signal carrying the sound source spatial information is played through the loudspeakers to the listener, completing the restoration of the three-dimensional spatial information of the sound source. The proposed voice sound source spatial sound effect three-dimensional realistic generation system restores the three-dimensional spatial direction of the sound source, and its restoration of the three-dimensional spatial information is best for sources located toward the center of the area enclosed by the loudspeaker group.

Claims (10)

1. A three-dimensional realistic generation system for the spatial sound effect of a voice sound source, characterized by comprising: an artificial three-dimensional reverberation model that accounts for air attenuation, a method for removing room reverberation at the listening position, and a three-dimensional voice space sound effect realistic generation system, wherein the method for removing room reverberation at the listening position specifically comprises a copy-based room impulse response model and a multi-channel multi-dimensional room reverberation removal method;
under the conditions of known sound source spatial information, sound source signal and reconstruction environment, the invention improves on the direction restoration of the VBAP technique, restores the sound source distance in a reverberant environment, provides a model for reconstructing the sound source distance in a reverberant environment, and provides a three-dimensional realistic sound effect generation system under reverberant conditions;
firstly, a sound source sound image distance recovery model is provided, the recovery of sound source three-dimensional spatial information is realized by combining with a VBAP technology sound source direction, based on the factors influencing the perception of the human ear to the sound source distance, the DRR containing signal intensity information can provide the absolute distance of a sound source, the direct sound-to-reverberation energy ratio DRR is adopted as the main factor of the distance recovery, the reverberation of a reconstructed room is simulated by designing an artificial three-dimensional reverberation model considering air attenuation, the input signal of a system is controlled, the DRR reaching the human ear is matched with the existing distance perception model, and the recovery of the sound source distance is realized; then, removing the influence of the signal carrying sound source distance information on the propagation in the reverberation room, analyzing the characteristics of the indoor sound field and the characteristics of the audio signal in the indoor sound field, adopting a room impulse response model based on a copy, and removing the room reverberation by adopting a multi-channel multi-dimensional room reverberation removing method on the basis of the room impulse response model construction, so that the signal DRR reaching the ears of the listener is consistent with the DRR controlled by the artificial reverberation model; and finally, recovering the direction of the sound source by fusing a VBAP technology, and playing the signal carrying the spatial information of the sound source through a loudspeaker to reach a listener to recover the three-dimensional spatial information of the sound source.
2. The system for generating a three-dimensional realistic spatial sound effect of a voice sound source according to claim 1, characterized in that the artificial three-dimensional reverberation model accounting for air attenuation is as follows: the spatial sound effect reverberation model mainly comprises two parts, the first part being a 19th-order FIR filter that simulates the attenuation of the direct sound and early reflections of the signal, and the second part being six parallel comb filters s1 to s6 together with a series all-pass filter D1, the six parallel comb filters being responsible for the reverberation processing of different frequencies of the signal.
3. The system for generating a three-dimensional realistic spatial sound effect of a voice sound source according to claim 2, characterized in that the spatial sound effect reverberation model is further improved in the following two respects:
improvement one: the delays and attenuation coefficients of the 19th-order FIR filter in the spatial sound effect reverberation model are set according to the specific environment being simulated; an impulse response simulation is carried out from the known characteristics of the simulated room, and the delays and attenuation coefficients of the direct sound and of the first-, second- and third-order reflections are then calculated;
improvement two: the low-pass filter on the feedback branch of each comb filter in the spatial sound effect reverberation model is arranged to process a different frequency band of the sound; when the absorption of the high-frequency components of the speech signal by air is taken into account, the reverberation time is related to the sound absorption coefficient of air, which in turn depends on the humidity and temperature of the room and on the signal frequency;
whether the indoor temperature or the indoor relative humidity is held constant, the higher the signal frequency, the larger the corresponding air attenuation coefficient; based on the obtained relationship between the air-absorption decay constant 4n of the room and the relative humidity, temperature and frequency, the cut-off frequencies of the low-pass filters on the branches of the six comb filters in the spatial sound effect reverberation model are refined and set to 500 Hz, 1000 Hz, 2000 Hz, 4000 Hz, 6300 Hz and 8000 Hz respectively; with the relative humidity and temperature of the room known, the decay constants 4n corresponding to the different signal frequencies in the simulated environment are found from the air absorption attenuation coefficient, the reverberation time of each frequency band is then obtained by substitution, the reverberation time of each band is assigned to the corresponding low-pass filter, and the feedback gain coefficient of each comb filter is finally obtained from formula 1, where formula 1 is:
g = 10^(−3R / R60)    formula 1
where g is the feedback gain coefficient of the comb filter, R is the delay time and R60 is the reverberation time; the delay times of the cascaded all-pass filter differ from those of the comb filters and are taken as 5 ms and 1.7 ms, with the corresponding attenuation coefficient set to 0.7, thereby realizing the simulation of the reverberation of a specific room;
in the artificial three-dimensional reverberation model that accounts for air attenuation, the DRR is controlled by adjusting the gain factor f of the direct sound CDir; on the premise that the sound source distance to be restored is known, the model simulates the reverberation of the reconstructed sound field and thereby controls the DRR reaching the listener, realizing the restoration of the sound source distance.
4. The system for generating a three-dimensional realistic spatial sound effect of a voice sound source according to claim 1, characterized in that the copy-based room impulse response model is as follows: a room containing a sound source and a receiver is regarded as a system whose input signal is the sound source and whose output signal is the signal received by the receiver; considering only one sound source and one receiver in the room, the system is treated as linear time-invariant, the propagation of sound is expressed by the corresponding transfer function, and the signal received by the listener equals the convolution of the room impulse response with the system input signal;
the room impulse response model requires constructing a simulated sound field and analyzing the direction, intensity and delay of the direct sound and of every wall reflection in the reconstructed field; the room impulse response is based on geometric acoustics, according to which a signal that reaches the receiver after being reflected by a wall can be treated as arriving directly from a copy source mirrored in that wall; treating the sound source as a point, the reverberation in the room is simulated on a computer by constructing a room impulse response model, the reflections are taken to be specular, the energy of the sound wave decreases after each reflection, and this steadily decreasing energy is represented by a series of copy sources with progressively smaller signal amplitudes; the impulse response of the room is then obtained from the amplitudes and spatial positions of all copy sources, and the constructed room impulse response controls the signal reflection order, the room size and the microphone orientation.
5. The system for generating a three-dimensional realistic spatial sound effect of a voice sound source according to claim 4, characterized in that, regarding the process by which the sound source is reflected by the walls of a rectangular room and reaches the listening position as a filtering of the sound source signal by the room, the signal y(m) received at the listening position, ignoring background noise, is expressed as:
y(m) = c(m) * l(m) = Σ_w b_w · c(m − m_w),  with  l(m) = Σ_w b_w · e(m − m_w)    formula 2
where e(m) is the unit impulse function of the system, c(m) is the sound source signal, and l(m) is the impulse response from the sound source to the listening position, formed by unit impulses that arrive at times m_w with attenuation coefficients b_w; the symbol * denotes convolution, so the reverberant signal y(m) is the convolution of the sound source signal c(m) with the room impulse response l(m);
the transmission equation of a point sound source in a free field environment is expressed as:
Q(f, Y, Y') = exp[ jf·(T/s − r) ] / (4πT)    formula 3
where Q denotes sound pressure, f = 2πg is the angular frequency, g is the frequency, r is time, T = |Y − Y'| is the distance between the sound source and the listening position, Y = (x, y, z) is the sound source position, Y' = (x', y', z') is the listening position, and s is the speed of sound in air; when the reflecting wall is rigid, the sound source reaches the listening position after reflection from the wall, giving the sound propagation equation at the listening position; when the first-order reflection model is extended to the second-order model and copy sound sources exist on all six walls of the room, the sound propagation equation at the listening position becomes:
Q(f, Y, Y') = Σ_(P∈Q) Σ_w exp[ jf·(h − r) ] / (4πa)    formula 4
where a denotes the distance from the sound source or one of its copies to the listening position, h = a/s is the corresponding delay time, and Q is the set of 8 possibilities {(P, i, w): P, i, w ∈ (0, 1)}; for a given reflection order M there are 8(2M + 1)^3 different paths, and the impulse response of the room is obtained from the reflection coefficients of the six walls; with the acoustic properties of the room known, the computer-simulated impulse response is convolved with the original sound source signal, so that the signal received at the listening position in the room can be analyzed.
6. The system for generating a three-dimensional realistic spatial sound effect of a voice sound source according to claim 1, characterized in that the multi-channel multi-dimensional room reverberation removal method comprises: removing the influence of the reflections of the sound source signal in the room by multi-channel multi-dimensional inverse filtering, so that the signal received by the listener is consistent with the sound source signal at the loudspeaker input, the acoustic channel system of the room being modeled as a system with multiple input and output channels;
the invention solves for the inverse filter when the impulse response is non-minimum phase by multi-channel multi-dimensional inverse filtering, where F1 and F2 denote the impulse responses of the two sound sources at the listening position in the room, and L1 and L2 denote the inverse filters of the multi-channel multi-dimensional input-output system; with F1 and F2 as inputs, L1 and L2 are the inverse filters of the system when the following relation holds:
A = 1 = F1·L1 + F2·L2    formula 5
where A is an (H + 1) × 1 column vector, L is a (j + 1) × 1 column vector, and F is the Toeplitz matrix of the room impulse response with size (H + 1) × (j + 1); after the z-transform, F1, F2, L1 and L2 are polynomials in z.
7. The system for generating a three-dimensional realistic spatial sound effect of a voice sound source according to claim 6, characterized in that formula 5 has a solution only when the following two conditions are satisfied:
condition one: F1 and F2 have no common poles or zeros in the z-plane;
condition two: the orders of L1 and L2 are lower than the orders of F1 and F2, respectively;
formula 5 is then converted into matrix form:
A = [F1  F2] · [L1; L2]
where [L1; L2] is a (j + i + 2) × 1 column vector and [F1 F2] is an (H + 1) × (j + i + 2) matrix; when the coefficients satisfy i = m − 1 and j = n − 1, [F1 F2] becomes a square matrix, and then:
[L1; L2] = [F1  F2]^(−1) · A
when there are m + 1 inputs and m outputs, where Fji (j = 1, 2, ..., m + 1; i = 1, 2, ..., m) denotes the impulse response between the jth input channel and the ith output channel and Lji denotes the FIR filter of the jth input channel, the inverse filter of the ith output channel is expressed in matrix form as:
Tj = F·Lj    formula 7
where Tj is an m × 1 column vector, F = [F1 F2 ... Fn+1] is an n × (n + 1) matrix, and Lj is an (n + 1) × 1 column vector; on the basis of the copy-based room impulse response model, the multi-channel multi-dimensional room reverberation removal method is used to process the signals, so that a signal that has been controlled by the air-attenuation-aware artificial three-dimensional reverberation model and contains sound source distance information still carries the correct distance information when it reaches the listener's ears after propagating through the reconstructed room.
8. The system for generating a three-dimensional realistic spatial sound effect of a voice sound source according to claim 1, characterized in that the three-dimensional voice space sound effect realistic generation system is as follows: the invention improves on the direction restoration of the VBAP technique and combines the air-attenuation-aware artificial three-dimensional reverberation model with the listening-position room reverberation removal method; the spatial direction information (x, y, z) of the sound source target object in the reconstructed sound field is extracted from the 3D video, and the sound source distance z to be restored is input to the sound source distance restoration model; the direct-to-reverberant energy ratio required to restore that distance is computed by the DRR-based distance perception model, and the reverberation of the reconstruction environment is then simulated by the artificial three-dimensional reverberation model based on the frequency-dependent air attenuation; by controlling the direct sound within that model, the output signal carries the distance information z of the sound source; to ensure that the signal containing the sound source distance information still carries accurate distance information after propagating through the reverberant room to the listener's ears, pre-filtering is applied before playback: a loudspeaker group is first selected according to the direction information (x, y) of the sound source, and once the group is determined, room reverberation removal processing is applied to the listening position so that the signal heard at the listening position is consistent with the input signal of the loudspeaker group, completing the restoration of the sound source distance; then, at the loudspeaker outputs, gain factors are assigned to the signal with the VBAP technique according to the direction information (x, y), completing the restoration of the sound source direction, so that the signal carrying the three-dimensional spatial information (x, y, z) of the sound source is received at the listening position and the restoration of the three-dimensional spatial information of the sound source is realized.
9. The system for generating a three-dimensional realistic spatial sound effect of a voice sound source according to claim 8, characterized in that the sound source distance restoring module is as follows: the module uses the DRR as the main factor for distance recovery; with the sound source distance information known, the DRR corresponding to the distance to be restored is obtained from the distance perception model, and the DRR of the signal is then adjusted and controlled by the artificial three-dimensional reverberation model based on the frequency-dependent air attenuation, so that the reverberated signal carries the sound source distance information; this is realized by the artificial three-dimensional reverberation model that accounts for air attenuation.
10. The system for generating a three-dimensional realistic spatial sound effect of a voice sound source according to claim 8, characterized in that the room reverberation removal module is as follows: the module simulates the propagation of sound in the room with the copy-based room impulse response model, constructs the room impulse response model and analyzes the characteristics of the signal at the listening position; on the basis of the obtained room impulse response, the multi-channel multi-dimensional room reverberation removal method is then used to obtain the inverse filter of the loudspeaker system, so that the sound signal received by the listener is consistent with the speech signal originally played by the loudspeaker.
CN202110715479.7A 2021-06-27 2021-06-27 Three-dimensional vivid generation system for voice sound source space sound effect Withdrawn CN113316077A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110715479.7A CN113316077A (en) 2021-06-27 2021-06-27 Three-dimensional vivid generation system for voice sound source space sound effect

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110715479.7A CN113316077A (en) 2021-06-27 2021-06-27 Three-dimensional vivid generation system for voice sound source space sound effect

Publications (1)

Publication Number Publication Date
CN113316077A true CN113316077A (en) 2021-08-27

Family

ID=77380503

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110715479.7A Withdrawn CN113316077A (en) 2021-06-27 2021-06-27 Three-dimensional vivid generation system for voice sound source space sound effect

Country Status (1)

Country Link
CN (1) CN113316077A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5491754A (en) * 1992-03-03 1996-02-13 France Telecom Method and system for artificial spatialisation of digital audio signals
CN107852563A (en) * 2015-06-18 2018-03-27 诺基亚技术有限公司 Binaural audio reproduces
CN111512648A (en) * 2017-12-18 2020-08-07 诺基亚技术有限公司 Enabling rendering of spatial audio content for consumption by a user

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Dr.ir. Emanuel A.P. Habets: "Room Impulse Response Generator", pages 1-20 *
James A. Moorer: "About This Reverberation Business", Computer Music Journal, vol. 3, no. 2, pages 13-28, XP009503588, DOI: 10.2307/3680280 *
Jont B. Allen et al.: "Image method for efficiently simulating small-room acoustics", J. Acoust. Soc. Am., vol. 65, no. 4, pages 943-950 *
Masato Miyoshi et al.: "Inverse Filtering of Room Acoustics", IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 36, no. 2, pages 145-151 *
Wang Ying et al.: "Sound source spatial distance restoration model based on artificial reverberation", Computer Engineering and Design, vol. 36, no. 9, pages 2442-2445 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114040319A (en) * 2021-11-17 2022-02-11 青岛海信移动通信技术股份有限公司 Method, device, equipment and medium for optimizing external playback quality of terminal equipment
CN114040319B (en) * 2021-11-17 2023-11-14 青岛海信移动通信技术有限公司 Method, device, equipment and medium for optimizing playback quality of terminal equipment
CN113808569A (en) * 2021-11-19 2021-12-17 科大讯飞(苏州)科技有限公司 Reverberation construction method and related equipment thereof
GB2614713A (en) * 2022-01-12 2023-07-19 Nokia Technologies Oy Adjustment of reverberator based on input diffuse-to-direct ratio
CN114630145A (en) * 2022-03-17 2022-06-14 腾讯音乐娱乐科技(深圳)有限公司 Multimedia data synthesis method, equipment and storage medium
CN115412808A (en) * 2022-09-05 2022-11-29 天津大学 Method and system for improving virtual auditory reproduction based on personalized head-related transfer function
CN115412808B (en) * 2022-09-05 2024-04-02 天津大学 Virtual hearing replay method and system based on personalized head related transfer function

Similar Documents

Publication Publication Date Title
Serafin et al. Sonic interactions in virtual reality: State of the art, current challenges, and future directions
CN113316077A (en) Three-dimensional vivid generation system for voice sound source space sound effect
CN102395098B (en) Method of and device for generating 3D sound
JP6820613B2 (en) Signal synthesis for immersive audio playback
RU2736274C1 (en) Principle of generating an improved description of the sound field or modified description of the sound field using dirac technology with depth expansion or other technologies
EP3197182B1 (en) Method and device for generating and playing back audio signal
US7706543B2 (en) Method for processing audio data and sound acquisition device implementing this method
CA3069403C (en) Concept for generating an enhanced sound-field description or a modified sound field description using a multi-layer description
CA3069241A1 (en) Concept for generating an enhanced sound field description or a modified sound field description using a multi-point sound field description
JP2023517720A (en) Reverb rendering
US20200260209A1 (en) Devices and methods for binaural spatial processing and projection of audio signals
Geronazzo et al. A minimal personalization of dynamic binaural synthesis with mixed structural modeling and scattering delay networks
Llorach et al. Towards realistic immersive audiovisual simulations for hearing research: Capture, virtual scenes and reproduction
Pind et al. Acoustic virtual reality–methods and challenges
Enzner et al. Advanced system options for binaural rendering of ambisonic format
Huopaniemi et al. DIVA virtual audio reality system
Horbach et al. Real-time rendering of dynamic scenes using wave field synthesis
Pelzer et al. 3D reproduction of room acoustics using a hybrid system of combined crosstalk cancellation and ambisonics playback
Salvador et al. Enhancement of Spatial Sound Recordings by Adding Virtual Microphones to Spherical Microphone Arrays.
Yuan et al. Externalization improvement in a real-time binaural sound image rendering system
Musil et al. A library for realtime 3d binaural sound reproduction in pure data (pd)
Väänänen Parametrization, auralization, and authoring of room acoustics for virtual reality applications
Schneiderwind et al. MODIFIED LATE REVERBERATION IN AN AUDIO AUGMENTED REALITY SCENARIO
Koutsivitis et al. Reproduction of audiovisual interactive events in virtual ancient Greek spaces
O’Dwyer Sound Source Localization and Virtual Testing of Binaural Audio

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20210827