WO2022228220A1 - Method and device for processing chorus audio, and storage medium


Info

Publication number
WO2022228220A1
WO2022228220A1 · PCT/CN2022/087784 · CN2022087784W
Authority
WO
WIPO (PCT)
Prior art keywords
audio
dry
chorus
processing
virtual sound
Prior art date
Application number
PCT/CN2022/087784
Other languages
French (fr)
Chinese (zh)
Inventor
张超鹏
陈灏
武文昊
罗辉
李革委
姜涛
胡鹏
Original Assignee
腾讯音乐娱乐科技(深圳)有限公司 (Tencent Music Entertainment Technology (Shenzhen) Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 腾讯音乐娱乐科技(深圳)有限公司
Publication of WO2022228220A1

Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 — Speech synthesis; Text to speech systems
    • G10L13/02 — Methods for producing synthetic speech; Speech synthesisers
    • G10L13/033 — Voice editing, e.g. manipulating the voice of the synthesiser
    • G10L13/0335 — Pitch control
    • G10L13/04 — Details of speech synthesis systems, e.g. synthesiser structure or memory management
    • G10L13/047 — Architecture of speech synthesisers
    • H — ELECTRICITY
    • H04 — ELECTRIC COMMUNICATION TECHNIQUE
    • H04S — STEREOPHONIC SYSTEMS
    • H04S7/00 — Indicating arrangements; Control arrangements, e.g. balance control
    • H04S7/30 — Control circuits for electronic adaptation of the sound field
    • H04S7/305 — Electronic adaptation of stereophonic audio signals to reverberation of the listening space
    • H04S2400/00 — Details of stereophonic systems covered by H04S but not provided for in its groups
    • H04S2400/13 — Aspects of volume control, not necessarily automatic, in stereophonic sound systems

Definitions

  • The present application relates to the technical field of computer applications, and in particular to a method, device and storage medium for processing chorus audio.
  • The purpose of the present application is to provide a chorus audio processing method, device and storage medium that avoid the in-head effect caused by the sound field gathering at the center of the listener's head, so that the sound field is wider and the listening experience is improved.
  • A method for processing chorus audio, comprising the steps described below.
  • The virtual sound image coordinate system is centered on the human head, with the midpoint of the line connecting the left and right ears as the coordinate origin. The positive direction of the first coordinate axis points to the front of the head, the positive direction of the second coordinate axis points from the left ear toward the right ear, and the positive direction of the third coordinate axis points directly above the head. The distance between each virtual sound image and the coordinate origin is within a set distance range, and the elevation angle of each virtual sound image relative to the plane formed by the first and second coordinate axes is within a set angle range.
  • Performing time alignment processing on the plurality of obtained dry vocal audio tracks includes:
  • performing time alignment processing on the current dry vocal audio track.
  • Band-pass filtering is performed on each of the obtained dry vocal audio tracks to obtain multiple pieces of bass data.
  • Generating the chorus audio based on the plurality of dry vocal audio tracks after virtual sound image localization includes:
  • generating the chorus audio based on the plurality of dry vocal audio tracks after virtual sound image localization and the plurality of pieces of bass data.
  • Generating the chorus audio based on the plurality of dry vocal audio tracks after virtual sound image localization includes:
  • generating the chorus audio based on the plurality of dry vocal audio tracks after virtual sound image localization and the plurality of dry vocal audio tracks after reverberation simulation processing.
  • Performing reverberation simulation processing on each of the obtained dry vocal audio tracks includes:
  • performing reverberation simulation processing on each of the obtained dry vocal audio tracks using a cascade of comb filters and all-pass filters.
  • The method further includes:
  • generating the chorus audio based on the plurality of dry vocal audio tracks after virtual sound image localization, which includes:
  • generating the chorus audio based on the plurality of dry vocal audio tracks after both virtual sound image localization and reverberation simulation processing.
  • Generating the chorus audio based on the plurality of dry vocal audio tracks after virtual sound image localization includes:
  • generating the chorus audio based on the plurality of dry vocal audio tracks after virtual sound image localization and the plurality of dry vocal audio tracks after binaural simulation processing.
  • The method further includes:
  • generating the chorus audio based on the plurality of dry vocal audio tracks after virtual sound image localization and the plurality of dry vocal audio tracks after binaural simulation processing, which includes:
  • generating the chorus audio based on the plurality of dry vocal audio tracks after virtual sound image localization and the plurality of dry vocal audio tracks after both binaural simulation processing and reverberation simulation processing.
  • Performing virtual sound image localization on the plurality of dry vocal audio tracks after time alignment processing includes:
  • grouping the time-aligned dry vocal audio tracks, the number of groups being the same as the number of virtual sound images;
  • localizing each group of dry vocal audio tracks on its corresponding virtual sound image, different groups corresponding to different virtual sound images.
  • The elevation angle, relative to the plane formed by the first and second coordinate axes, of a virtual sound image located behind the human head is greater than that of a virtual sound image located in front of the human head.
  • The virtual sound images are evenly distributed on a circumference in the plane formed by the first and second coordinate axes.
  • Synthesizing the lead vocal audio, the chorus audio and the corresponding accompaniment includes:
  • synthesizing the lead vocal audio, the chorus audio and the corresponding accompaniment after volume adjustment and/or reverberation simulation processing.
  • A device for processing chorus audio, comprising:
  • a dry vocal audio obtaining module, configured to obtain the dry vocal audio of the same target song performed by multiple singers respectively;
  • an alignment processing module, configured to perform time alignment processing on the plurality of obtained dry vocal audio tracks;
  • a virtual sound image localization module, configured to perform virtual sound image localization on the plurality of time-aligned dry vocal audio tracks, so as to locate them on a plurality of virtual sound images;
  • the virtual sound images are located in a pre-established virtual sound image coordinate system centered on the human head, with the midpoint of the line connecting the left and right ears as the coordinate origin, and the positive direction of the first coordinate axis pointing to the front of the head;
  • the positive direction of the second coordinate axis points from the left ear toward the right ear;
  • the positive direction of the third coordinate axis points directly above the head;
  • the distance between each virtual sound image and the coordinate origin is within a set distance range;
  • the elevation angle of each virtual sound image relative to the plane formed by the first and second coordinate axes is within a set angle range;
  • a chorus audio generation module, configured to generate the chorus audio based on the plurality of dry vocal audio tracks after virtual sound image localization;
  • a chorus effect audio output module, configured to, when lead vocal audio sung based on the target song is obtained, synthesize the lead vocal audio, the chorus audio and the corresponding accompaniment and output the resulting chorus effect audio.
  • A chorus audio processing device, comprising:
  • a processor configured to implement, when executing a computer program, the steps of any of the chorus audio processing methods described above.
  • A computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of any of the chorus audio processing methods described above.
  • Time alignment processing is performed on the obtained plurality of dry vocal audio tracks, and virtual sound image localization is performed on the aligned tracks.
  • The multiple virtual sound images are located in a virtual sound image coordinate system centered on the human head, at distances from the coordinate origin within the set range, surrounding the listener's ears. Chorus audio is generated based on the multiple dry vocal audio tracks after virtual sound image localization, and when lead vocal audio based on the target song is obtained, the lead vocal audio, the chorus audio and the corresponding accompaniment are synthesized to obtain and output the large chorus effect audio.
  • Locating the multiple dry vocal audio tracks on multiple virtual sound images surrounding the listener's ears gives the generated chorus audio a surround sound field effect. In terms of listening experience, this effectively prevents the in-head effect caused by the sound field of the final output gathering at the center of the listener's head, making the sound field wider.
  • FIG. 1 is an implementation flowchart of a method for processing chorus audio in an embodiment of the present application;
  • FIG. 2 is a schematic diagram of the sound image orientation in the virtual sound image localization coordinate system in an embodiment of the present application;
  • FIG. 3 is a schematic diagram of virtual sound image localization in an embodiment of the present application;
  • FIG. 4 is a schematic diagram of the virtual sound images after localization in an embodiment of the present application;
  • FIG. 5 is a schematic diagram of the composition of a spatial sound field in an embodiment of the present application;
  • FIG. 6 is a schematic diagram of a cascaded form of comb filters and all-pass filters in an embodiment of the present application;
  • FIG. 7 is a schematic diagram of a reverberation impulse response in an embodiment of the present application;
  • FIG. 8 is a schematic diagram of a two-channel simulation process in an embodiment of the present application;
  • FIG. 9 is a schematic diagram of the framework of a chorus audio processing system in an embodiment of the present application;
  • FIG. 10 is a schematic diagram of a specific structure of a chorus audio processing system in an embodiment of the present application;
  • FIG. 11 is a schematic structural diagram of an apparatus for processing chorus audio in an embodiment of the present application;
  • FIG. 12 is a schematic structural diagram of a chorus audio processing device in an embodiment of the present application.
  • The core of the present application is to provide a method for processing chorus audio. After the dry vocal audio of the same target song performed by multiple singers is obtained, time alignment processing is performed on the obtained tracks, and virtual sound image localization is performed on the aligned tracks to locate each of them on one of multiple virtual sound images. The virtual sound images are located in a virtual sound image coordinate system centered on the human head, at distances from the coordinate origin within a set range, surrounding the listener's ears. Chorus audio is generated based on the localized tracks, and when lead vocal audio based on the target song is obtained, the lead vocal audio, the chorus audio and the corresponding accompaniment are synthesized to obtain and output the large chorus effect audio.
  • Locating the multiple dry vocal audio tracks on multiple virtual sound images surrounding the listener's ears gives the generated chorus audio a surround sound field effect. In terms of listening experience, this effectively prevents the in-head effect caused by the sound field of the final output gathering at the center of the listener's head, making the sound field wider.
  • The methods provided in the embodiments of the present application can be applied in various scenarios where a large chorus sound effect is desired; specific solutions can be implemented through interaction between a server and a client.
  • The server may obtain in advance the dry vocal audio of multiple singers, such as singers 1, 2, 3, 4..., for the same target song, perform time alignment processing on the obtained tracks, perform virtual sound image localization on the aligned tracks, and locate them on multiple virtual sound images that surround the listener's ears.
  • The localized tracks are used to generate the chorus audio. When user X wants the song he sings to achieve a chorus sound effect, he can sing the target song through the client.
  • The chorus effect audio can then be obtained and output through the client, so that user X can experience the chorus sound effect.
  • The server can obtain the dry vocal audio of the target song performed by users 2, 3, 4 and 5, perform time alignment processing on the obtained tracks, and locate the aligned tracks on multiple virtual sound images.
  • The multiple virtual sound images surround the listener's ears, and the chorus audio is generated based on the localized tracks.
  • When the server obtains, through the client, the lead vocal audio sung by user 1 based on the target song, it synthesizes the lead vocal audio, the chorus audio and the corresponding accompaniment to obtain the chorus effect audio, which is output to user 1 through the client so that user 1 can experience the large chorus sound effect.
  • the method may include the following steps:
  • Multiple dry vocal audio tracks may be obtained according to actual needs.
  • The multiple tracks may be audio data obtained by different singers singing the same target song, and the different singers may be in the same or different environments.
  • Because the dry vocal audio tracks of the same target song may be sung by different singers at different times, misalignment such as delay may exist among them.
  • An alignment tool can be used to align the starting positions of the obtained tracks in time.
  • The obtained tracks may also be screened first, for example using tools such as sound quality detection, to eliminate audio with poor quality, such as audio containing noise or accompaniment bleed, audio that is too short, audio whose energy is too small, audio with popping sounds, and so on. Time alignment processing and subsequent steps are then performed on the tracks retained after screening.
  • S130: Perform virtual sound image localization on the multiple time-aligned dry vocal audio tracks, so as to locate them on the multiple virtual sound images.
  • a plurality of virtual sound images are located in a pre-established virtual sound image coordinate system
  • the virtual sound image coordinate system is centered on the human head
  • the center point of the straight line where the left and right ears are located is the coordinate origin
  • the positive direction of the first coordinate axis indicates the front of the human head.
  • the positive direction of the second coordinate axis represents the side of the human head from the left ear to the right ear
  • the positive direction of the third coordinate axis represents the top of the human head
  • the distance between each virtual sound image and the coordinate origin is within the set distance range.
  • the pitch angle of each virtual sound image relative to the plane formed by the first coordinate axis and the second coordinate axis is within a set angle range.
  • A virtual sound image coordinate system may be established in advance to describe the sound image orientation.
  • The virtual sound image coordinate system may specifically be a Cartesian coordinate system.
  • It can be centered on the human head, with the midpoint of the line connecting the left and right ears as the coordinate origin.
  • The positive direction of the second coordinate axis, that is, the y-axis, points from the left ear toward the right ear.
  • The positive direction of the third coordinate axis, that is, the z-axis, points directly above the head, i.e., toward the top of the head.
  • A sound image in space has an azimuth angle and an elevation angle, and its position can be denoted by (azimuth, elevation, rad), where rad is the distance between the current sound image and the coordinate origin.
  • The sound signal is a single-channel signal, which can be regarded as a sound image at a certain position. To obtain a given virtual sound image, an HRTF (Head-Related Transfer Function) can be used to perform data convolution to realize the localization operation.
  • A schematic diagram of virtual sound image localization is shown in FIG. 3, where X represents a real sound source (a single-channel signal), Y_L and Y_R represent the sound signals heard by the left ear and the right ear respectively, and HRTF represents the transfer function of the transmission path from the sound source position to each ear.
  • The real sound source can be filtered by the left-ear and right-ear HRTFs for a given position to obtain a two-channel acoustic signal.
  • The acoustic signal heard by the human ear is the result of HRTF filtering of the sound source X. Therefore, when performing virtual sound image localization, the sound signal can be filtered through the HRTF of the corresponding position.
  • Multiple virtual sound images can be set. The distance between each virtual sound image and the coordinate origin can be within a set distance range, such as within 1 meter, and the elevation angle of each virtual sound image relative to the plane formed by the first and second coordinate axes of the virtual sound image coordinate system can be within a set angle range, such as within 10°, so that the multiple virtual sound images surround the listener's ears.
  • Each of the multiple virtual sound images may be uniformly distributed on a circumference in the plane formed by the first and second coordinate axes, that is, surrounding the horizontal plane of the ears at a constant interval angle.
  • The interval angle can be set according to the actual situation or from analysis of historical data, for example 30°. With a 30° interval, 12 virtual sound images can be located around the horizontal plane of the ears; their elevation angles are 0° and their azimuth angles are 0°, 30°, 60°, ..., 330°. Of course, the interval angle can also be set to other values, such as 15° or 60°.
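The even 30°-interval layout described above can be sketched in code. The following is an illustrative reduction, not taken from the patent (the function name and defaults are hypothetical): it places n virtual sound images on a circumference of radius rad around the origin of the head-centered coordinate system.

```python
import math

def virtual_image_positions(n_images=12, rad=1.0, elevation_deg=0.0):
    """Evenly place n_images virtual sound images on a circumference of
    radius rad around the coordinate origin (midpoint between the ears),
    at a fixed elevation angle above the horizontal (x-y) plane."""
    elev = math.radians(elevation_deg)
    positions = []
    for k in range(n_images):
        azim = math.radians(k * 360.0 / n_images)  # 0°, 30°, ..., 330°
        x = rad * math.cos(elev) * math.cos(azim)  # x-axis: front of the head
        y = rad * math.cos(elev) * math.sin(azim)  # y-axis: left ear -> right ear
        z = rad * math.sin(elev)                   # z-axis: above the head
        positions.append((x, y, z))
    return positions
```

With the defaults this yields 12 positions at elevation 0°, all exactly rad from the origin, matching the 30°-interval example in the text.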
  • The elevation angle, relative to the plane formed by the first and second coordinate axes, of a virtual sound image located behind the human head may be greater than that of a virtual sound image located in front of the human head.
  • The positions of the multiple virtual sound images in the virtual sound image coordinate system are not limited to those described above and can be set according to actual needs; each virtual sound image only needs to satisfy the following:
  • its distance from the coordinate origin is within the set distance range, and its elevation angle relative to the plane formed by the first and second coordinate axes is within the set angle range.
  • For example, some of the virtual sound images may surround the plane of the ears at 30° intervals with an elevation angle of 0°.
  • The distances between individual virtual sound images and the coordinate origin can be the same or different, as long as they are all within the set distance range; varying them can enhance the surround effect of the subsequently generated chorus audio.
  • After virtual sound image localization is performed on the multiple time-aligned dry vocal audio tracks and the tracks are located on the multiple virtual sound images, the subsequent steps can be carried out.
  • S140 Generate chorus audio based on the plurality of dry audio audios after virtual sound image localization.
  • Each of the multiple dry vocal audio tracks can be filtered with the HRTF of its corresponding virtual sound image position, yielding corresponding audio data at each virtual sound image.
  • The chorus audio may be generated based on the multiple tracks after virtual sound image localization. Specifically, the audio data obtained after HRTF filtering at the multiple virtual sound image positions may be superimposed, or weighted and superimposed, to obtain the chorus audio. The resulting chorus audio has a three-dimensional sound field quality.
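The HRTF filtering and superposition just described amount to convolving each mono dry track with a left/right head-related impulse response (HRIR) pair and summing the filtered results. A minimal sketch follows; the helper names are invented, and in practice the HRIR pairs would come from a measured dataset rather than the toy filters shown in the usage note.

```python
def convolve(signal, h):
    """Direct-form FIR convolution, standing in for HRTF filtering."""
    out = [0.0] * (len(signal) + len(h) - 1)
    for n, s in enumerate(signal):
        for k, hk in enumerate(h):
            out[n + k] += s * hk
    return out

def localize_and_mix(dry_tracks, hrir_pairs, weights=None):
    """Filter each dry track with the (left, right) HRIR of its assigned
    virtual sound image, then superimpose (optionally weighted) all the
    filtered tracks into one stereo chorus signal."""
    n_out = max(len(t) for t in dry_tracks) + \
            max(len(h_l) for h_l, _ in hrir_pairs) - 1
    left, right = [0.0] * n_out, [0.0] * n_out
    weights = weights or [1.0] * len(dry_tracks)
    for track, (h_l, h_r), w in zip(dry_tracks, hrir_pairs, weights):
        for ch, h in ((left, h_l), (right, h_r)):
            for i, v in enumerate(convolve(track, h)):
                ch[i] += w * v
    return left, right
```

For instance, with two one-sample "HRIRs" `([1.0], [0.5])` the left channel is a plain superposition and the right channel is attenuated by half; real HRIRs are hundreds of taps long.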
  • the chorus audio can be stored in a database and used when needed. For example, if a user wants to sing a song with a chorus effect, in this case, the chorus audio can be used to achieve the corresponding effect.
  • The synthesis of the lead vocal audio, the chorus audio and the corresponding accompaniment can be realized in various ways: synthesize the lead vocal audio with the corresponding accompaniment first and then with the chorus audio; or synthesize the chorus audio with the corresponding accompaniment first and then with the lead vocal audio; or synthesize the lead vocal audio with the chorus audio first and then layer in the corresponding accompaniment.
  • the chorus sound effect obtained by different implementation methods will be different, and the specific implementation method can be selected according to the actual situation.
  • Applying the method provided by the embodiments of the present application, after the dry vocal audio of the same target song performed by multiple singers is obtained, time alignment processing is performed on the obtained tracks, and virtual sound image localization is performed on the aligned tracks to locate them on multiple virtual sound images. The multiple virtual sound images are located in a virtual sound image coordinate system centered on the human head, at distances from the coordinate origin within the set range, surrounding the listener's ears. Chorus audio is generated based on the localized tracks, and when lead vocal audio based on the target song is obtained, the lead vocal audio, the chorus audio and the corresponding accompaniment are synthesized to obtain and output the large chorus effect audio.
  • Locating the multiple dry vocal audio tracks on multiple virtual sound images surrounding the listener's ears gives the generated chorus audio a surround sound field effect. In terms of listening experience, this effectively prevents the in-head effect caused by the sound field of the final output gathering at the center of the listener's head, making the sound field wider.
  • Step S120, performing time alignment processing on the obtained plurality of dry vocal audio tracks, may include the following steps:
  • first, determine the reference audio corresponding to the target song;
  • second, for each obtained dry vocal audio track, extract the audio features of the current track and of the reference audio, the audio features being fingerprint features or fundamental frequency features;
  • third, determine the time corresponding to the maximum of the audio feature similarity between the current track and the reference audio as the audio alignment time;
  • fourth, perform time alignment processing on the current track based on the audio alignment time.
  • The reference audio corresponding to the target song may be determined first.
  • Specifically, a dry vocal audio track with better sound quality can be selected from the obtained tracks as the reference audio.
  • The original dry vocal audio of the target song may also be used as the reference audio.
  • The audio features of the current dry vocal audio track and of the reference audio can then be extracted; the audio features are fingerprint features or fundamental frequency features.
  • Mel band information, Bark band information, ERB band power, etc. can be extracted through multi-band filtering, and fingerprint features can then be obtained through half-wave rectification, binary judgment, and similar operations.
  • Fundamental frequency features can be extracted with fundamental frequency extraction tools such as pYIN, CREPE and Harvest.
  • The audio features of the reference audio can be extracted once, saved, and called directly when needed.
  • The audio features of the current track and of the reference audio are compared, which can be characterized by a similarity curve or the like, and the time corresponding to the maximum similarity value is determined as the audio alignment time. Time alignment processing is then performed on the current track based on this time.
  • After each track's audio alignment time is obtained by comparison with the audio features of the reference audio and time alignment processing is performed, multiple time-aligned dry vocal audio tracks are obtained.
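The alignment steps above can be sketched as follows. This is an illustrative reduction under stated assumptions: a simple inner-product score over feature frames stands in for the fingerprint/F0 similarity curve, and the function names are hypothetical.

```python
def best_lag(feat, ref, max_lag):
    """Return the lag (in frames) at which the current track's feature
    sequence best matches the reference; the offset whose similarity
    score is maximal plays the role of the audio alignment time."""
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = sum(feat[i] * ref[i + lag]
                    for i in range(len(feat))
                    if 0 <= i + lag < len(ref))
        if score > best_score:
            best, best_score = lag, score
    return best

def align(audio, lag):
    """Shift a dry track by the found lag: pad with silence if the track
    starts early, trim the excess if it starts late."""
    return [0.0] * lag + audio if lag >= 0 else audio[-lag:]
```

For example, a track whose onset occurs two frames before the reference's onset gets a lag of 2 and is padded with two frames of silence.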
  • the method may further include the following steps:
  • Band-pass filtering is performed on each of the obtained dry vocal audio tracks to obtain multiple pieces of bass data.
  • Generating the chorus audio based on the plurality of dry vocal audio tracks after virtual sound image localization includes:
  • generating the chorus audio based on the plurality of localized tracks and the plurality of pieces of bass data.
  • Band-pass filtering may be performed on each of the obtained dry vocal audio tracks, with the pass band covering the bass region, to obtain the bass data.
  • The chorus audio may be generated based on the multiple localized tracks and the multiple pieces of bass data.
  • Specifically, the chorus audio may be generated by superimposing, or weighted-superimposing, the obtained bass data and the multiple tracks after virtual sound image localization. Superimposing the bass signal enhances the fullness of the sound.
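A crude band-pass of the kind described could be sketched like this (the patent does not specify the filter design or pass band, so the two cascaded one-pole low-passes and the 60–150 Hz cutoffs below are placeholders):

```python
import math

def one_pole_lowpass(sig, fc, fs):
    """First-order IIR low-pass with cutoff fc (Hz) at sample rate fs."""
    a = math.exp(-2.0 * math.pi * fc / fs)
    y, out = 0.0, []
    for s in sig:
        y = (1.0 - a) * s + a * y
        out.append(y)
    return out

def bass_data(dry, fs=44100, low=60.0, high=150.0):
    """Crude band-pass: keep roughly the low..high Hz band of a dry
    track. Content above `high` is removed by the first low-pass;
    content below `low` is then subtracted off."""
    lp = one_pole_lowpass(dry, high, fs)
    sub = one_pole_lowpass(lp, low, fs)
    return [a - b for a, b in zip(lp, sub)]
```

The resulting bass data would then simply be added (or weighted and added) into the localized chorus mix, as the text describes.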
  • the method may further include the following steps:
  • Generating the chorus audio based on the plurality of dry vocal audio tracks after virtual sound image localization includes:
  • generating the chorus audio based on the plurality of dry vocal audio tracks after virtual sound image localization and the plurality of dry vocal audio tracks after reverberation simulation processing.
  • FIG. 5 shows a schematic diagram of a typical spatial sound field composition.
  • The acoustic signal with the largest amplitude is the direct sound.
  • The signals that follow are early reflections, obtained when the sound wave reflects off the objects closest to the listener; they have obvious directionality.
  • The dense signals after that are the reverberant sound, the superposition of sound waves after multiple reflections from surrounding objects: a superposition of a large number of reflections arriving from different directions, without directionality.
  • The reverberant sound, being the superposition of many late reflections from different directions, is characterized by weak energy, no directionality and a high echo density, so reverberation can be used to create a sound with a sense of surround.
  • Reverberation simulation processing may be performed on each of the obtained dry vocal audio tracks.
  • A cascade of comb filters and all-pass filters can be used to perform the reverberation simulation processing on each of the obtained tracks.
  • FIG. 6 shows one cascaded form of comb filters and all-pass filters, in which four comb filters connected in parallel are followed by two all-pass filters in series.
  • The resulting simulated reverberation impulse response is shown in FIG. 7.
  • FIG. 6 is only one specific form; in practical applications other forms are possible, and the number of comb filters and all-pass filters and the cascading scheme can be adjusted according to actual needs.
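The four-parallel-comb, two-serial-all-pass topology of FIG. 6 is the classic Schroeder reverberator structure. A minimal sketch follows; the delay lengths and gains are common illustrative defaults, not values taken from the patent.

```python
def comb(x, delay, g):
    """Feedback comb filter: y[n] = x[n] + g * y[n - delay]."""
    y = [0.0] * len(x)
    for n in range(len(x)):
        y[n] = x[n] + (g * y[n - delay] if n >= delay else 0.0)
    return y

def allpass(x, delay, g):
    """Schroeder all-pass: y[n] = -g*x[n] + x[n-delay] + g*y[n-delay]."""
    y = [0.0] * len(x)
    for n in range(len(x)):
        y[n] = -g * x[n]
        if n >= delay:
            y[n] += x[n - delay] + g * y[n - delay]
    return y

def reverb(x,
           comb_params=((1116, .84), (1188, .83), (1277, .82), (1356, .81)),
           ap_params=((225, .5), (556, .5))):
    """Four comb filters in parallel (averaged), then two all-pass
    filters in series, matching the FIG. 6 topology."""
    mix = [0.0] * len(x)
    for d, g in comb_params:
        for i, v in enumerate(comb(x, d, g)):
            mix[i] += v / len(comb_params)
    for d, g in ap_params:
        mix = allpass(mix, d, g)
    return mix
```

Feeding an impulse through `reverb` produces a dense, exponentially decaying tail of the kind sketched in FIG. 7.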
  • the method may further include the following steps:
  • Reverberation simulation processing is performed on each of the multiple dry vocal audios after virtual sound image localization.
  • Correspondingly, generating the chorus audio based on the plurality of dry vocal audios after virtual sound image localization includes:
  • generating the chorus audio based on the plurality of dry vocal audios that have undergone both virtual sound image localization and reverberation simulation processing.
  • That is, a chorus audio can be generated based on the plurality of dry vocal audios subjected to virtual sound image localization and reverberation simulation processing.
  • Specifically, the chorus audio may be generated by superimposing, or weighted-superimposing, the plurality of dry vocal audios after virtual sound image localization and reverberation simulation processing.
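The superposition / weighted-superposition step can be sketched as follows (plain Python; the peak normalization at the end is an added safeguard against clipping, not something the application specifies):

```python
def mix_weighted(tracks, weights=None):
    """Weighted superposition of equal-length mono tracks into one chorus bus.
    With weights=None a plain (unweighted) superposition is performed."""
    n = len(tracks[0])
    if weights is None:
        weights = [1.0] * len(tracks)
    mixed = [0.0] * n
    for track, w in zip(tracks, weights):
        for i, s in enumerate(track):
            mixed[i] += w * s
    # simple peak normalization so the summed signal stays within [-1, 1]
    peak = max(1.0, max(abs(s) for s in mixed))
    return [s / peak for s in mixed]
```

For example, `mix_weighted([voice_a, voice_b], [0.8, 0.6])` emphasizes the first processed dry vocal over the second in the generated chorus audio.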
  • Performing reverberation simulation processing on the multiple dry vocal audios after virtual sound image localization can enhance the spatial sound effect of the sound signal, further suppress the in-head effect, and expand the sound field.
  • the method may further include the following steps:
  • Correspondingly, generating the chorus audio based on the plurality of dry vocal audios after virtual sound image localization includes:
  • generating the chorus audio based on the plurality of dry vocal audios after virtual sound image localization and the plurality of dry vocal audios after binaural simulation processing.
  • In practical applications, after obtaining the dry vocal audios of the same target song performed by a plurality of singers and performing time alignment processing on them, binaural (two-channel) simulation processing may be performed on each of the obtained dry vocal audios.
  • The correlation between the two channel signals is reduced by introducing delays, expanding the sound field as much as possible to obtain a two-channel output.
  • For example, a dry vocal audio can be simulated with 8 groups of different delays and weights on the left and right, where d represents the delay and g represents the weight.
  • The delay parameter may be chosen from 16 values ranging from 21 ms to 79 ms.
  • Amplitude attenuation is used to represent the energy loss of a sound wave due to reflection, thereby reducing the correlation between the two channels. That is, a dry vocal audio can be duplicated to obtain two signals carrying the same information; these two signals are fully correlated, and applying different delays and amplitude attenuations then reduces their correlation, yielding a pseudo-stereo signal.
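A minimal sketch of this duplicate-delay-attenuate pseudo-stereo idea (the specific delays fall in the 21-79 ms range mentioned above, but the exact values and the attenuation weights are illustrative assumptions, and only one delay/weight pair per channel is shown rather than the 8 groups):

```python
def pseudo_stereo(mono, fs, delay_l_ms=21.0, delay_r_ms=37.0, g_l=0.7, g_r=0.6):
    """Duplicate a mono dry vocal into two fully correlated copies, then
    decorrelate them with different delays (d) and amplitude attenuations (g)."""
    def delayed(x, d, g):
        # prepend d zero samples, scale by the attenuation weight
        return [0.0] * d + [g * s for s in x]
    d_l = int(fs * delay_l_ms / 1000.0)
    d_r = int(fs * delay_r_ms / 1000.0)
    left = delayed(mono, d_l, g_l)
    right = delayed(mono, d_r, g_r)
    # zero-pad to equal length so the two channels stay sample-aligned
    n = max(len(left), len(right))
    left += [0.0] * (n - len(left))
    right += [0.0] * (n - len(right))
    return left, right
```

Because the two channels carry the same content at different lags and levels, their cross-correlation at lag zero drops, which widens the perceived sound field.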
  • A chorus audio may then be generated based on the plurality of dry vocal audios after virtual sound image localization and the plurality of dry vocal audios after binaural simulation processing. Specifically, the dry vocal audios after virtual sound image localization and the dry vocal audios after binaural simulation processing may be superimposed or weighted-superimposed to generate the chorus audio.
  • the method may further include the following steps:
  • Correspondingly, generating the chorus audio based on the plurality of dry vocal audios after virtual sound image localization and the plurality of dry vocal audios after binaural simulation processing includes:
  • generating the chorus audio based on the plurality of dry vocal audios after virtual sound image localization and the plurality of dry vocal audios after binaural simulation processing and reverberation simulation processing.
  • In practical applications, time alignment processing is performed on the plurality of dry vocal audios,
  • and binaural simulation processing is then performed on each of the plurality of dry vocal audios.
  • Reverberation simulation processing may further be performed on the plurality of dry vocal audios after the binaural simulation processing, so as to enhance the spatial effect of the sound signal, suppress the in-head effect, and expand the sound field.
  • The chorus audio is then generated based on the dry vocal audios after these processing steps. Specifically, the plurality of dry vocal audios after virtual sound image localization, together with the plurality of dry vocal audios after binaural simulation and reverberation simulation processing, may be superimposed or weighted-superimposed to generate the chorus audio.
  • In practical applications, time alignment processing can be performed on the obtained dry vocal audios, and the aligned plurality of dry vocal audios can then be
  • processed by virtual sound image localization, bass enhancement, reverberation simulation, two-channel simulation and the like;
  • the specific processing can be carried out in combination with the above embodiments. Because reverberation simulation and two-channel simulation give the final chorus audio a surround sound effect, the result is highly robust to a wide range of timing misalignment: even if the delay between the chorus audio and the lead vocal audio to be superimposed is large, the user can still be guaranteed a harmonious listening experience.
  • FIG. 9 is a schematic diagram of a system framework for processing multiple dry vocal audios after time alignment processing, including a bass enhancement unit, a virtual sound image localization unit, a two-channel simulation unit and a reverberation simulation unit.
  • The bass enhancement unit is used to perform band-pass filtering on the multiple dry vocal audios to obtain bass data;
  • the virtual sound image localization unit is used to perform virtual sound image localization on the multiple dry vocal audios, so as to locate them on multiple virtual sound images;
  • the two-channel simulation unit is used to perform two-channel simulation processing on the plurality of dry vocal audios;
  • the reverberation simulation unit is used to perform reverberation simulation processing on the plurality of dry vocal audios.
  • Both the virtual sound image localization unit and the two-channel simulation unit can be connected to the reverberation simulation unit,
  • so that the reverberation simulation unit can further perform reverberation simulation on their outputs.
  • That is, after binaural simulation processing is performed on the plurality of dry vocal audios by the binaural simulation unit,
  • reverberation simulation processing may further be performed by the reverberation simulation unit.
  • weighted superposition can be performed on the audio data processed by these units to obtain chorus audio.
  • Figure 10 shows a specific example of processing multiple dry vocal audios,
  • where H represents the transfer function of HRTF filtering; through this transfer function, virtual sound image localization can be performed so that
  • the dry vocal audios are positioned on 12 virtual sound images around the head at ear level;
  • REV represents the reverberation simulation unit,
  • BASS represents the bass enhancement unit,
  • and REF represents the two-channel simulation unit.
  • The reverberation simulation units here can use the same parameters, or different parameters can be configured for different reverberation simulation units according to actual needs, so as to obtain flexible reverberation modulation.
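The application performs this localization by HRTF filtering (the transfer function H above). Since no HRTF data set is given in the text, the sketch below substitutes a crude interaural time/level difference approximation to place a mono source at one of 12 ear-level azimuths; the head radius, the Woodworth ITD formula and the toy level-difference rule are all assumptions for illustration, not the application's actual filters:

```python
import math

SPEED_OF_SOUND = 343.0   # m/s
HEAD_RADIUS = 0.0875     # m, average head radius (assumption)

def place_on_image(mono, fs, azimuth_deg):
    """Crude stand-in for HRTF filtering: pan a mono source to an azimuth
    using an interaural time difference (ITD) and a toy level difference.
    0 deg = straight ahead, positive azimuth = toward the right ear."""
    az = math.radians(azimuth_deg)
    # Woodworth spherical-head ITD model (illustrative assumption)
    itd = HEAD_RADIUS / SPEED_OF_SOUND * (az + math.sin(az))
    lag = int(round(abs(itd) * fs))
    near_gain, far_gain = 1.0, 1.0 / (1.0 + abs(math.sin(az)))  # toy ILD
    near = [near_gain * s for s in mono] + [0.0] * lag
    far = [0.0] * lag + [far_gain * s for s in mono]
    # for a positive azimuth the right ear is the near ear
    return (far, near) if azimuth_deg > 0 else (near, far)

# twelve virtual sound images spaced 30 degrees apart at ear level
# (the 30-degree spacing is an assumption consistent with "12 virtual
# sound images around the human ear level", not stated in the text)
azimuths = [-180 + 30 * k for k in range(12)]
```

A real implementation would convolve each dry vocal with measured left/right HRTF impulse responses for the chosen azimuth instead of the delay-and-gain shortcut used here.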
  • The grand chorus effect of the chorus audio finally generated in the embodiments of the present application is closer to the listening experience of a real concert chorus.
  • Adding accompaniment on the basis of the lead vocal audio while mixing in the chorus audio gives users an immersive concert experience and a more impactful, enveloping sound field.
  • In practical applications, performing virtual sound image localization on the plurality of dry vocal audios after time alignment processing may include the following steps:
  • Step 1: according to the number of virtual sound images, group the obtained plurality of dry vocal audios after time alignment processing, the number of groups being the same as the number of virtual sound images;
  • Step 2: position each group of dry vocal audios on the corresponding virtual sound image, with different groups of dry vocal audios corresponding to different virtual sound images.
  • In practical applications, after obtaining the dry vocal audios of the same target song performed by a plurality of singers and performing time alignment processing on them, the aligned dry vocal audios can be grouped according to the number of virtual sound images,
  • the number of groups being the same as the number of virtual sound images, with each group containing several dry vocal audios. If the number of obtained dry vocal audios is large, each dry vocal audio may belong to only one group; if the number is small, the same dry vocal audio may be placed in multiple groups to better achieve a large chorus sound.
  • Each group of dry vocal audios can then be positioned on its corresponding virtual sound image, with different groups corresponding to different virtual sound images, thereby realizing virtual sound image localization of the multiple dry vocal audios and enhancing the sound effect of the chorus.
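The grouping rule above (one group per virtual sound image, with reuse of dry vocals when there are fewer vocals than images) can be sketched as:

```python
def group_dry_vocals(num_vocals, num_images):
    """Assign dry vocal indices to virtual-sound-image groups.
    If there are at least as many vocals as images, each vocal goes to
    exactly one group (round robin); if there are fewer vocals than
    images, vocals are reused so every image still receives a voice."""
    groups = [[] for _ in range(num_images)]
    if num_vocals >= num_images:
        for v in range(num_vocals):
            groups[v % num_images].append(v)
    else:
        for i in range(num_images):
            groups[i].append(i % num_vocals)
    return groups
```

For example, with 25 dry vocals and 12 virtual sound images each vocal lands in exactly one group; with 5 vocals and 12 images every image still gets a (reused) voice, preserving the large-chorus surround effect.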
  • synthesizing the lead vocal audio, the chorus audio and the corresponding accompaniment may include the following steps:
  • In practical applications, the volumes of the lead vocal audio and the chorus audio can be adjusted respectively, so that their volumes are equal, or so that the volume of the lead vocal audio is greater than that of the chorus audio.
  • Reverberation simulation processing can also be performed on the lead vocal audio and the chorus audio to obtain a sound with a sense of surround.
  • the lead vocal audio, chorus audio and the corresponding accompaniment after volume adjustment and/or reverberation simulation processing are synthesized, so that the final output chorus effect audio brings a better listening experience to the user.
  • embodiments of the present application further provide a chorus audio processing apparatus, and the chorus audio processing apparatus described below and the chorus audio processing method described above may refer to each other correspondingly.
  • the device may include the following modules:
  • a dry vocal audio obtaining module 1110, used to obtain the dry vocal audios of the same target song performed by a plurality of singers respectively;
  • a time alignment processing module 1120, configured to perform time alignment processing on the obtained multiple dry vocal audios;
  • the virtual sound image localization module 1130 is configured to perform virtual sound image localization on the plurality of dry vocal audios after time alignment processing, so as to locate the plurality of dry vocal audios on the plurality of virtual sound images, the virtual sound images being located in a pre-established virtual sound image coordinate system;
  • the virtual sound image coordinate system is centered on the human head, with the coordinate origin at the midpoint of the straight line connecting the left and right ears;
  • the positive direction of the first coordinate axis represents the front of the head, the positive direction of the second coordinate axis represents the side of the head from the left ear to the right ear, and the positive direction of the third coordinate axis represents directly above the head;
  • the distance between each virtual sound image and the coordinate origin is within a set distance range;
  • the pitch angle of each virtual sound image relative to the plane formed by the first coordinate axis and the second coordinate axis is within a set angle range;
  • the chorus audio generation module 1140 is configured to generate the chorus audio based on the plurality of dry vocal audios after virtual sound image localization;
  • the chorus effect audio obtaining module 1150 is configured to, in the case that lead vocal audio sung based on the target song is obtained, synthesize the lead vocal audio, the chorus audio and the corresponding accompaniment, and then output the chorus effect audio.
  • With the apparatus, time alignment processing is performed on the obtained plurality of dry vocal audios, and virtual sound image localization is then performed on the aligned dry
  • vocal audios to locate the multiple dry vocal audios on multiple virtual sound images.
  • The multiple virtual sound images are located in a virtual sound image coordinate system centered on the human head, with their distances from the coordinate origin within the set distance range, so that they
  • surround the human ear. Chorus audio is generated based on the multiple dry vocal audios after virtual sound image localization, and when the lead vocal audio based on the target song is obtained, the lead vocal audio, chorus audio and corresponding accompaniment are synthesized to obtain and output the large chorus effect audio.
  • Positioning multiple dry vocal audios on multiple virtual sound images surrounding the human ear gives the generated chorus audio a sound field surround effect. In terms of listening experience, this effectively prevents the in-head effect caused by the sound field of the final output large chorus effect audio gathering at the center of the head, making the sound field wider.
  • the time alignment processing module 1120 is used for:
  • For each obtained dry vocal audio, the audio features of the current dry vocal audio and the reference audio are extracted respectively, the audio features being fingerprint features or fundamental frequency features;
  • the time corresponding to the maximum audio-feature similarity is determined as the audio alignment time, and time alignment processing is performed on the current dry vocal audio accordingly.
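The alignment-time search can be sketched as a correlation maximization over candidate lags between two feature sequences (the frame-level feature extraction itself, whether fingerprints or fundamental-frequency tracks, is omitted here and assumed given; `hop` is a hypothetical samples-per-frame parameter):

```python
def best_alignment_lag(feat, ref, max_lag):
    """Return the lag (in frames) maximizing the similarity between a dry
    vocal's feature sequence and the reference's feature sequence.
    Positive lag means the dry vocal starts late relative to the reference."""
    def score(lag):
        if lag >= 0:
            a, b = feat[lag:], ref[:len(feat) - lag]
        else:
            a, b = feat[:lag], ref[-lag:len(feat)]
        n = min(len(a), len(b))
        return sum(x * y for x, y in zip(a[:n], b[:n]))  # inner product
    return max(range(-max_lag, max_lag + 1), key=score)

def align(samples, lag, hop):
    """Trim (late start) or zero-pad (early start) the audio so it lines
    up with the reference; hop = samples per feature frame."""
    shift = lag * hop
    return samples[shift:] if shift >= 0 else [0.0] * (-shift) + samples
```

A production system would typically use a normalized similarity (e.g. normalized cross-correlation) so that loud passages do not dominate the lag search.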
  • a bass data acquisition module for:
  • Band-pass filtering is performed on each of the obtained dry vocal audios to obtain multiple bass data;
  • the chorus audio generation module 1140 is used for:
  • the chorus audio is generated based on the plurality of dry vocal audios after virtual sound image localization and the plurality of bass data.
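The band-pass step for the bass enhancement unit can be sketched with a standard RBJ-cookbook biquad; the 60-250 Hz band edges below are illustrative assumptions, since the application does not specify the pass band:

```python
import math

def bandpass_biquad(x, fs, f0, q=0.707):
    """RBJ-cookbook band-pass biquad (constant 0 dB peak gain variant)."""
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    b0, b1, b2 = alpha, 0.0, -alpha
    a0, a1, a2 = 1 + alpha, -2 * math.cos(w0), 1 - alpha
    y = [0.0] * len(x)
    x1 = x2 = y1 = y2 = 0.0
    for n, xn in enumerate(x):
        # direct form I difference equation, normalized by a0
        yn = (b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2) / a0
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y[n] = yn
    return y

def bass_data(dry_vocal, fs, low=60.0, high=250.0):
    """Extract a bass band by band-passing the dry vocal. The 60-250 Hz
    band edges are illustrative, not taken from the application."""
    f0 = math.sqrt(low * high)   # geometric center frequency
    q = f0 / (high - low)        # Q derived from the bandwidth
    return bandpass_biquad(dry_vocal, fs, f0, q)
```

Mixing this low-band signal back in alongside the localized dry vocals reinforces the low-frequency foundation of the generated chorus audio.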
  • a reverberation simulation processing module is also included for:
  • the chorus audio generation module 1140 is used for:
  • a chorus audio is generated based on the plurality of dry vocal audios subjected to virtual sound image localization and the plurality of dry vocal audios subjected to reverberation simulation processing.
  • the reverberation simulation processing module is used for:
  • Reverberation simulation processing is performed on each of the obtained dry vocal audios using a cascade of comb filters and all-pass filters.
  • the reverberation simulation processing module is also used for:
  • reverberation simulation processing is performed on the plurality of dry vocal audios after virtual sound image localization;
  • the chorus audio generation module 1140 is used for:
  • the chorus audio is generated based on the plurality of dry vocal audios subjected to virtual sound image localization and reverberation simulation processing.
  • a two-channel simulation processing module, used to perform two-channel simulation processing on each of the obtained dry vocal audios;
  • the chorus audio generation module 1140 is used for:
  • a chorus audio is generated based on the plurality of dry vocal audios after virtual sound image localization and the plurality of dry vocal audios after binaural simulation processing.
  • the reverberation simulation processing module is also used for:
  • the chorus audio generation module 1140 is used for:
  • the chorus audio is generated based on the plurality of dry vocal audios after virtual sound image localization and the plurality of dry vocal audios after binaural simulation processing and reverberation simulation processing.
  • the virtual sound image localization module 1130 is used for:
  • the obtained multiple dry vocal audios after time alignment processing are grouped, the number of groups being the same as the number of virtual sound images;
  • each group of dry vocal audios is positioned on the corresponding virtual sound image, with different groups of dry vocal audios corresponding to different virtual sound images.
  • Among the plurality of virtual sound images, the elevation angle, relative to the plane formed by the first coordinate axis and the second coordinate axis, of a virtual sound image located behind the human head is greater than that of a virtual sound image located in front of the human head; or, the virtual sound images are uniformly distributed on a circumference in the plane formed by the first coordinate axis and the second coordinate axis.
  • the chorus effect audio obtaining module 1150 is used for:
  • the embodiments of the present application also provide a chorus audio processing device, including:
  • the processor is configured to implement the steps of the above-mentioned chorus audio processing method when the computer program is executed.
  • the chorus audio processing device may include: a processor 10 , a memory 11 , a communication interface 12 and a communication bus 13 .
  • the processor 10 , the memory 11 , and the communication interface 12 all communicate with each other through the communication bus 13 .
  • the processor 10 may be a central processing unit (Central Processing Unit, CPU), an application-specific integrated circuit, a digital signal processor, a field programmable gate array, or other programmable logic devices, and the like.
  • the processor 10 may call the program stored in the memory 11, and specifically, the processor 10 may execute the operations in the embodiments of the method for processing chorus audio.
  • the memory 11 is used to store one or more programs, and the programs may include program codes, and the program codes include computer operation instructions.
  • the memory 11 at least stores a program for realizing the following functions:
  • The plurality of virtual sound images are located in a pre-established virtual sound image coordinate system; the virtual sound image coordinate system is centered on the human head, with the coordinate origin at the midpoint of the straight line connecting the left and right ears.
  • The positive direction of the third coordinate axis represents directly above the human head; the distance between each virtual sound image and the coordinate origin is within the set distance range, and the pitch angle of each virtual sound image relative to the plane formed by the first coordinate axis and the second coordinate axis
  • is within the set angle range;
  • the large chorus effect audio is output.
  • the memory 11 may include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required for at least one function (such as an audio playback function and an audio synthesis function), etc.; the data storage area can store data created during use, such as sound image localization data, audio synthesis data, etc.
  • the memory 11 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device or another non-volatile solid-state storage device.
  • the communication interface 12 may be an interface of a communication module for connecting with other devices or systems.
  • the structure shown in FIG. 12 does not constitute a limitation on the chorus audio processing device in the embodiment of the present application.
  • the chorus audio processing device may include more or fewer parts than those shown in FIG. 12, or a combination of certain parts.
  • the embodiments of the present application also provide a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the above-mentioned chorus audio processing method are implemented.
  • the steps of a method or algorithm described in connection with the embodiments disclosed herein may be directly implemented in hardware, a software module executed by a processor, or a combination of the two.
  • the software module can be placed in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Signal Processing (AREA)
  • Stereophonic System (AREA)
  • Electrophonic Musical Instruments (AREA)

Abstract

A method and device for processing a chorus audio, and a storage medium. The method comprises the following steps: obtaining acapella audios of a plurality of singers singing the same target song (S110); performing time alignment on the plurality of obtained acapella audios (S120), and performing virtual sound image positioning, so as to position the plurality of acapella audios onto the plurality of virtual sound images (S130); generating a chorus audio on the basis of the plurality of acapella audios after having undergone the virtual sound image positioning (S140); and when a lead singer audio based on the singing of the target song is obtained, synthesizing the lead singer audio, the chorus audio, and a corresponding accompaniment, and then outputting a chorus effect audio (S150). The plurality of virtual sound images surround human ears and the plurality of acapella audios are positioned onto the plurality of virtual sound images, so that the outputted chorus effect audio can have a sound field surround sound effect, effectively preventing an in-head effect caused by sound field gathering in the center of the head, and enabling the sound field to be wider.

Description

Method, device and storage medium for processing chorus audio
This application claims priority to the Chinese patent application with application number 202110460280.4, titled "A chorus audio processing method, device and storage medium", filed with the China Patent Office on April 27, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the technical field of computer applications, and in particular to a method, device and storage medium for processing chorus audio.
Background Art
With the rapid development of computer technology, various types of software such as audio, video and office software have gradually multiplied, bringing much convenience to people's lives. Using audio software, users can listen to songs, sing, and enjoy other experiences.
At present, in order to provide users with the auditory experience of a concert chorus, multi-person singing data is mostly superimposed directly. However, in the audio obtained through such simple superposition, the sound field gathers at the center of the head, producing an in-head effect; the sound field is not wide enough and the listening experience is poor.
Summary of the Invention
The purpose of the present application is to provide a method, device and storage medium for processing chorus audio, so as to avoid the in-head effect caused by the sound field gathering at the center of the head, making the sound field wider and improving the listening experience.
To solve the above technical problem, the present application provides the following technical solutions:
A method for processing chorus audio, comprising:
obtaining dry vocal audios of the same target song performed by a plurality of singers respectively;
performing time alignment processing on the obtained plurality of dry vocal audios;
performing virtual sound image localization on the plurality of dry vocal audios after time alignment processing, so as to locate the plurality of dry vocal audios on a plurality of virtual sound images; wherein the plurality of virtual sound images are located in a pre-established virtual sound image coordinate system, the virtual sound image coordinate system being centered on the human head, with the coordinate origin at the midpoint of the straight line connecting the left and right ears; the positive direction of the first coordinate axis represents the front of the head, the positive direction of the second coordinate axis represents the side of the head from the left ear to the right ear, and the positive direction of the third coordinate axis represents directly above the head; the distance between each virtual sound image and the coordinate origin is within a set distance range, and the pitch angle of each virtual sound image relative to the plane formed by the first coordinate axis and the second coordinate axis is within a set angle range;
generating chorus audio based on the plurality of dry vocal audios after virtual sound image localization;
in the case that lead vocal audio sung based on the target song is obtained, synthesizing the lead vocal audio, the chorus audio and the corresponding accompaniment, and then outputting the large chorus effect audio.
In a specific embodiment of the present application, performing time alignment processing on the obtained plurality of dry vocal audios includes:
determining the reference audio corresponding to the target song;
for each obtained dry vocal audio, extracting audio features of the current dry vocal audio and the reference audio respectively, the audio features being fingerprint features or fundamental frequency features;
determining the time corresponding to the maximum audio-feature similarity between the current dry vocal audio and the reference audio as the audio alignment time;
performing time alignment processing on the current dry vocal audio based on the audio alignment time.
In a specific embodiment of the present application, the method further includes:
performing band-pass filtering on each of the obtained dry vocal audios to obtain a plurality of bass data;
correspondingly, generating the chorus audio based on the plurality of dry vocal audios after virtual sound image localization includes:
generating the chorus audio based on the plurality of dry vocal audios after virtual sound image localization and the plurality of bass data.
In a specific embodiment of the present application, the method further includes:
performing reverberation simulation processing on each of the obtained dry vocal audios;
correspondingly, generating the chorus audio based on the plurality of dry vocal audios after virtual sound image localization includes:
generating the chorus audio based on the plurality of dry vocal audios after virtual sound image localization and the plurality of dry vocal audios after reverberation simulation processing.
In a specific embodiment of the present application, performing reverberation simulation processing on each of the obtained dry vocal audios includes:
performing reverberation simulation processing on each of the obtained dry vocal audios using a cascade of comb filters and all-pass filters.
In a specific embodiment of the present application, after performing virtual sound image localization on the plurality of dry vocal audios after time alignment processing, the method further includes:
performing reverberation simulation processing on each of the plurality of dry vocal audios after virtual sound image localization;
correspondingly, generating the chorus audio based on the plurality of dry vocal audios after virtual sound image localization includes:
generating the chorus audio based on the plurality of dry vocal audios that have undergone virtual sound image localization and reverberation simulation processing.
In a specific embodiment of the present application, the method further includes:
performing binaural simulation processing on each of the obtained dry vocal audios;
correspondingly, generating the chorus audio based on the plurality of dry vocal audios after virtual sound image localization includes:
generating the chorus audio based on the plurality of dry vocal audios after virtual sound image localization and the plurality of dry vocal audios after binaural simulation processing.
In a specific embodiment of the present application, after performing binaural simulation processing on each of the obtained dry vocal audios, the method further includes:
performing reverberation simulation processing on the plurality of dry vocal audios after the binaural simulation processing;
correspondingly, generating the chorus audio based on the plurality of dry vocal audios after virtual sound image localization and the plurality of dry vocal audios after binaural simulation processing includes:
generating the chorus audio based on the plurality of dry vocal audios after virtual sound image localization and the plurality of dry vocal audios after binaural simulation processing and reverberation simulation processing.
在本申请的一种具体实施方式中,所述对进行时间对齐处理后的多个所述干声音频进行虚拟声像定位,包括:In a specific embodiment of the present application, performing virtual sound image localization on a plurality of the dry audio audio after time alignment processing includes:
按照虚拟声像的个数,将获得的进行时间对齐处理后的多个所述干声音频进行分组,组数与虚拟声像的个数相同;According to the number of virtual sound images, the obtained dry sound audio frequency after time alignment processing is grouped, and the number of groups is the same as the number of virtual sound images;
将各组干声音频分别定位到对应的虚拟声像上,不同组干声音频对应 不同虚拟声像。Each group of dry audio audio is located on the corresponding virtual audio image, and different groups of dry audio audio correspond to different virtual audio images.
In a specific implementation of the present application,
among the plurality of virtual sound images, the elevation angle, relative to the plane formed by the first coordinate axis and the second coordinate axis, of a virtual sound image located behind the head is greater than that of a virtual sound image located in front of the head;
or,
the virtual sound images are evenly distributed around the circumference of the plane formed by the first coordinate axis and the second coordinate axis.
In a specific implementation of the present application, synthesizing the lead vocal audio, the chorus audio and the corresponding accompaniment includes:
adjusting the volume of the lead vocal audio and of the chorus audio, and/or performing reverberation simulation processing on the lead vocal audio and the chorus audio; and
synthesizing the volume-adjusted and/or reverberation-processed lead vocal audio, the chorus audio and the corresponding accompaniment.
A chorus audio processing apparatus, comprising:
a dry vocal audio obtaining module, configured to obtain dry vocal audio of the same target song as sung by each of a plurality of singers;
an alignment processing module, configured to perform time alignment processing on the obtained plurality of dry vocal audio tracks;
a virtual sound image localization module, configured to perform virtual sound image localization on the plurality of time-aligned dry vocal audio tracks so as to localize them onto a plurality of virtual sound images, wherein the virtual sound images are located in a pre-established virtual sound image coordinate system centered on the human head, with the midpoint of the line connecting the two ears as the coordinate origin; the positive direction of the first coordinate axis points straight ahead of the head, the positive direction of the second coordinate axis points sideways from the left ear to the right ear, and the positive direction of the third coordinate axis points straight up from the head; the distance between each virtual sound image and the coordinate origin is within a set distance range, and the pitch angle of each virtual sound image relative to the plane formed by the first coordinate axis and the second coordinate axis is within a set angle range;
a chorus audio generation module, configured to generate chorus audio based on the plurality of dry vocal audio tracks after virtual sound image localization; and
a grand chorus effect audio output module, configured to, when lead vocal audio sung on the basis of the target song is acquired, synthesize the lead vocal audio, the chorus audio and the corresponding accompaniment, and output grand chorus effect audio.
A chorus audio processing device, comprising:
a memory for storing a computer program; and
a processor configured to implement the steps of any one of the chorus audio processing methods described above when executing the computer program.
A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of any one of the chorus audio processing methods described above.
With the technical solutions provided by the embodiments of the present application, after dry vocal audio of the same target song as sung by each of a plurality of singers is obtained, time alignment processing is performed on the obtained tracks, and virtual sound image localization is performed on the aligned tracks so as to localize them onto a plurality of virtual sound images. The virtual sound images are located in a virtual sound image coordinate system centered on the human head, at distances from the coordinate origin within a set range, surrounding the ears. Chorus audio is generated based on the dry vocal audio tracks after virtual sound image localization, and when lead vocal audio sung on the basis of the target song is acquired, the lead vocal audio, the chorus audio and the corresponding accompaniment are synthesized to obtain and output grand chorus effect audio. Localizing the dry vocal audio tracks onto virtual sound images surrounding the ears gives the generated chorus audio a surround sound field. Perceptually, this effectively avoids the in-head effect that arises when the sound field of the final grand chorus effect audio is concentrated at the center of the head, making the sound field wider.
Description of the Drawings

In order to explain the embodiments of the present application or the technical solutions in the prior art more clearly, the accompanying drawings required for the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application, and those of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a flowchart of the implementation of a chorus audio processing method in an embodiment of the present application;
FIG. 2 is a schematic diagram of the virtual sound image localization coordinate system showing sound image orientation in an embodiment of the present application;
FIG. 3 is a schematic diagram of virtual sound image localization in an embodiment of the present application;
FIG. 4 is a schematic diagram of localized virtual sound images in an embodiment of the present application;
FIG. 5 is a schematic diagram of the composition of a spatial sound field process in an embodiment of the present application;
FIG. 6 is a schematic diagram of a cascade of a comb filter and an all-pass filter in an embodiment of the present application;
FIG. 7 is a schematic diagram of a reverberation impulse response in an embodiment of the present application;
FIG. 8 is a schematic diagram of a binaural simulation process in an embodiment of the present application;
FIG. 9 is a schematic framework diagram of a chorus audio processing system in an embodiment of the present application;
FIG. 10 is a schematic diagram of a specific structure of a chorus audio processing system in an embodiment of the present application;
FIG. 11 is a schematic structural diagram of a chorus audio processing apparatus in an embodiment of the present application;
FIG. 12 is a schematic structural diagram of a chorus audio processing device in an embodiment of the present application.
Detailed Description of the Embodiments

The core of the present application is to provide a chorus audio processing method. After dry vocal audio of the same target song as sung by each of a plurality of singers is obtained, time alignment processing is performed on the obtained tracks, and virtual sound image localization is performed on the aligned tracks so as to localize them onto a plurality of virtual sound images located in a head-centered virtual sound image coordinate system, at distances from the coordinate origin within a set range, surrounding the ears. Chorus audio is generated from the localized tracks, and when lead vocal audio sung on the basis of the target song is acquired, the lead vocal audio, the chorus audio and the corresponding accompaniment are synthesized to obtain and output grand chorus effect audio. Localizing the dry vocal audio tracks onto virtual sound images surrounding the ears gives the generated chorus audio a surround sound field. Perceptually, this effectively avoids the in-head effect that arises when the sound field of the final output is concentrated at the center of the head, making the sound field wider.
In practical applications, the method provided by the embodiments of the present application can be applied in various scenarios where a grand chorus effect is desired, and the specific solution can be implemented through interaction between a server and a client.
For example, in scenario 1, the server may obtain in advance the dry vocal audio of a plurality of singers, e.g., singers 1, 2, 3, 4, ..., singing the same target song, perform time alignment processing on the obtained tracks, perform virtual sound image localization on the aligned tracks, and localize them onto a plurality of virtual sound images that can surround the ears, generating chorus audio from the localized tracks. When user X wants a song they sing to have a grand chorus effect, they can sing the target song through the client; the server obtains user X's lead vocal audio through the client and synthesizes the lead vocal audio, the chorus audio and the corresponding accompaniment to obtain grand chorus effect audio, which is output through the client so that user X experiences the grand chorus effect.
In scenario 2, several friends (users 1, 2, 3, 4 and 5) sing the target song in the same time period but in different places and want a grand chorus effect. From the perspective of any one user, the current user can be taken as the lead singer. For example, from user 1's perspective, the server can obtain the dry vocal audio of users 2, 3, 4 and 5 singing the target song, perform time alignment processing on the obtained tracks, localize the aligned tracks onto a plurality of virtual sound images surrounding the ears, and generate chorus audio from the localized tracks. When the server obtains, through the client, the lead vocal audio sung by user 1 on the basis of the target song, it synthesizes the lead vocal audio, the chorus audio and the corresponding accompaniment to obtain grand chorus effect audio and outputs it to user 1 through the client, so that user 1 experiences the grand chorus effect.
The above application scenarios are merely exemplary; in practice, the technical solution of the present application can be applied in further scenarios, such as sound-effect processing for multi-person choruses and small bands.
To enable those skilled in the art to better understand the solution of the present application, the application is described in further detail below with reference to the accompanying drawings and specific embodiments. Obviously, the described embodiments are only some of the embodiments of the present application, not all of them. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the scope of protection of the present application.
Referring to FIG. 1, a flowchart of the implementation of a chorus audio processing method provided by an embodiment of the present application, the method may include the following steps:
S110: obtain dry vocal audio of the same target song as sung by each of a plurality of singers.
In this embodiment of the present application, a plurality of dry vocal audio tracks may be obtained according to actual needs. The tracks may be audio data obtained from different singers singing the same target song, and the singers may be in the same or different environments.
S120: perform time alignment processing on the obtained plurality of dry vocal audio tracks.
Because the dry vocal audio tracks may have been sung by different singers at different times, misalignment such as delay may exist among them. To achieve a better grand chorus effect later, time alignment processing may first be performed on the obtained tracks, so that after alignment no track is seriously rushed or dragged, e.g., more than one second ahead of or behind the beat. Specifically, an alignment tool may be used to align the obtained dry vocal audio tracks to the same starting position in time.
In a specific implementation of the present application, before the time alignment processing, the obtained dry vocal audio tracks may be pre-screened, for example with sound-quality detection tools, to discard tracks of poor quality: audio containing noise, accompaniment bleed-through, audio that is too short, audio whose energy is too low, clipping, and so on. Time alignment processing and the subsequent steps are then performed on the tracks retained after screening.
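As a rough illustration (not the patent's actual detector), the length and energy checks mentioned above could be sketched as follows; the threshold values are illustrative assumptions, and the noise, bleed-through and clipping checks are omitted:

```python
import math

def passes_screening(samples, sample_rate, min_seconds=10.0, min_rms=0.01):
    """Crude pre-screening: reject a dry vocal clip that is too short or
    whose energy (RMS) is too low. Thresholds are illustrative only."""
    if len(samples) < min_seconds * sample_rate:
        return False  # audio length too short
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return rms >= min_rms  # reject clips whose energy is too small
```

Clips that fail any check would be dropped before the alignment stage.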
S130: perform virtual sound image localization on the plurality of time-aligned dry vocal audio tracks, so as to localize them onto a plurality of virtual sound images.
The virtual sound images are located in a pre-established virtual sound image coordinate system centered on the human head, with the midpoint of the line connecting the two ears as the coordinate origin. The positive direction of the first coordinate axis points straight ahead of the head, the positive direction of the second coordinate axis points sideways from the left ear to the right ear, and the positive direction of the third coordinate axis points straight up from the head. The distance between each virtual sound image and the coordinate origin is within a set distance range, and the pitch angle of each virtual sound image relative to the plane formed by the first and second coordinate axes is within a set angle range.
In this embodiment of the present application, a virtual sound image coordinate system may be established in advance to represent sound image orientation; specifically, it may be a Cartesian coordinate system. As shown in FIG. 2, the system is centered on the human head, with the midpoint of the line connecting the two ears as the coordinate origin; the positive direction of the first coordinate axis (the x-axis) points straight ahead of the head, the positive direction of the second coordinate axis (the y-axis) points sideways from the left ear to the right ear, and the positive direction of the third coordinate axis (the z-axis) points straight up from the head, i.e., toward the top of the head. A sound image in space has a certain azimuth and elevation, so its position can be expressed as (rad, azimuth, elevation), where rad denotes the distance between the current sound image and the coordinate origin.
An ordinary acoustic signal is a single-channel signal and can be regarded as a sound image at some position (rad, azimuth, elevation). To obtain a virtual sound image at a given position, an HRTF (Head Related Transfer Function) can be used to perform a convolution on the data and thereby realize the localization operation. A schematic diagram of virtual sound image localization is shown in FIG. 3, in which X denotes a real sound source (single-channel signal), Y_L and Y_R denote the acoustic signals heard by the left and right ears respectively, and the HRTF denotes the transfer function of the transmission path from the source position to the two ears. Based on HRTF technology, the real source (single-channel signal) can be filtered with the left-ear and right-ear HRTFs for a given position (rad, azimuth, elevation) to obtain a two-channel acoustic signal.
The frequency-domain characteristics of the acoustic signals received by the left and right ears can be expressed as:

Y_L(ω) = H_L(ω) · X(ω),  Y_R(ω) = H_R(ω) · X(ω)

where H_L and H_R denote the left-ear and right-ear HRTFs.
It can simply be considered that the acoustic signal heard by the human ear is the result of filtering the source X with the HRTF. Therefore, when performing virtual sound image localization, the acoustic signal can be filtered with the HRTF of the corresponding position. A plurality of virtual sound images can be set in the virtual sound image coordinate system; the distance between each virtual sound image and the coordinate origin can be within a set distance range, e.g., on the order of 1 meter, and the pitch angle of each virtual sound image relative to the plane formed by the first and second coordinate axes can be within a set angle range, e.g., within 10°, so that the virtual sound images surround the ears.
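In the time domain, this HRTF filtering amounts to convolving the mono signal with the left-ear and right-ear head-related impulse responses (HRIRs) of the chosen position. A minimal sketch, assuming the HRIRs for a given (azimuth, elevation) have already been obtained (e.g., from a measured HRTF database):

```python
import numpy as np

def localize(mono, hrir_left, hrir_right):
    """Render a mono dry vocal at one virtual sound image position by
    convolving it with the left/right HRIRs for that position."""
    y_left = np.convolve(mono, hrir_left)
    y_right = np.convolve(mono, hrir_right)
    return y_left, y_right

# A toy delay-and-attenuate pair stands in for real HRIRs here.
y_l, y_r = localize(np.array([1.0, 0.0]),
                    np.array([0.5]),          # left ear: attenuation only
                    np.array([0.0, 0.25]))    # right ear: 1-sample delay
```

With real HRIRs the two outputs carry the interaural time and level differences that make the source appear to come from the chosen direction.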
Specifically, the virtual sound images may be evenly distributed around the circumference of the plane formed by the first and second coordinate axes, i.e., around the horizontal plane of the ears at equal angular intervals. The interval angle may be set according to the actual situation or analysis of historical data, e.g., to 30°. With a 30° interval around the horizontal plane of the ears, 12 virtual sound images can be located, with an elevation of 0° and azimuths of 0°, 30°, 60°, ..., 330°. Of course, the interval angle may also be set to other values, such as 15° or 60°.
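The even placement described above can be sketched in a few lines (a trivial illustration, not part of the patent):

```python
def even_azimuths(n_images):
    """Azimuth angles (degrees) of n_images virtual sound images spaced
    evenly around the horizontal plane of the ears."""
    step = 360.0 / n_images
    return [round(i * step, 6) for i in range(n_images)]

# 12 images at 30-degree intervals, all with elevation 0 degrees.
positions = [(az, 0.0) for az in even_azimuths(12)]  # (azimuth, elevation)
```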
In another implementation, among the plurality of virtual sound images, the elevation angle, relative to the plane formed by the first and second coordinate axes, of the virtual sound images located behind the head may be greater than that of the virtual sound images located in front of the head. This can enhance the localization effect and reduce front-back mirroring of the virtual sound images. For example, the elevation of the virtual sound images behind the head may be raised by 10°, i.e., the images in front of the head have elevation θ = 0° and those behind the head have elevation θ = 10°.
As shown in FIG. 4, a plurality of virtual sound images surround the horizontal plane of the ears at 30° intervals; the virtual sound images in front of the head have elevation θ = 0° and those behind the head have elevation θ = 10°.
It should be noted that the positions of the virtual sound images in the coordinate system are not limited to those mentioned above and may be set according to actual needs, as long as the distance between each virtual sound image and the coordinate origin is within the set distance range and the pitch angle of each virtual sound image relative to the plane formed by the first and second coordinate axes is within the set angle range. For example, some of the virtual sound images may surround the ear plane at 30° intervals with an elevation of 0°, while others surround it at 60° intervals with an elevation of 10°; the distances of these two subsets from the coordinate origin may be the same or different, as long as both are within the set range. This enhances the surround effect of the subsequently generated chorus audio.
After virtual sound image localization has been performed on the time-aligned dry vocal audio tracks and the tracks have been localized onto the virtual sound images, the subsequent steps can be carried out.
S140: generate chorus audio based on the plurality of dry vocal audio tracks after virtual sound image localization.
After the dry vocal audio of the same target song as sung by each of a plurality of singers has been obtained, time-aligned, and localized onto the virtual sound images, each dry vocal audio track is filtered with the HRTF of its corresponding virtual sound image position, yielding corresponding audio data at each virtual sound image. Chorus audio can then be generated from the localized tracks; specifically, the audio data obtained after HRTF filtering at the virtual sound image positions can be superposed, or superposed with weights, to obtain the chorus audio. The resulting chorus audio has a three-dimensional sound field quality.
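The superposition step could look like the following sketch (a hypothetical helper, equal weights by default); each input is a stereo 2 × N array produced by the HRTF filtering above:

```python
import numpy as np

def mix_chorus(binaural_tracks, weights=None):
    """Superpose the stereo signals rendered at each virtual sound image
    (optionally weighted) into a single stereo chorus track."""
    if weights is None:
        weights = [1.0] * len(binaural_tracks)
    length = max(track.shape[1] for track in binaural_tracks)
    chorus = np.zeros((2, length))
    for w, track in zip(weights, binaural_tracks):
        chorus[:, :track.shape[1]] += w * track  # shorter tracks zero-padded
    return chorus
```

Per-image weights would let quieter or lower-quality tracks contribute less to the chorus bed.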
S150: when lead vocal audio sung on the basis of the target song is acquired, synthesize the lead vocal audio, the chorus audio and the corresponding accompaniment, and output grand chorus effect audio.
In one application scenario of this embodiment, the generated chorus audio can be stored in a database and used when needed. For example, if a user wants a song they sing themselves to have a chorus effect, the chorus audio can be used to achieve it.
The audio sung by the current user on the basis of the target song can be acquired and used as the lead vocal audio; the lead vocal audio, the chorus audio and the corresponding accompaniment are then synthesized to obtain the grand chorus effect audio, which is output so that the current user can enjoy the grand chorus effect.
The synthesis of the lead vocal audio, the chorus audio and the corresponding accompaniment can be realized in various ways: the lead vocal audio can first be synthesized with the accompaniment and then with the chorus audio; or the chorus audio can first be synthesized with the accompaniment and then with the lead vocal audio; or the lead vocal audio and the chorus audio can first be synthesized with each other and then combined with the accompaniment, e.g., after equalization of the lead vocal audio and the chorus audio, the accompaniment can be superposed according to a set vocal-to-accompaniment ratio. The grand chorus effects obtained by the different approaches differ, and the specific approach can be chosen according to the actual situation.
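The last ordering above (vocals first, then accompaniment at a set vocal-to-accompaniment ratio) might be sketched as follows; the ratio semantics and zero-padding of shorter signals are illustrative assumptions, not the patent's prescribed formula:

```python
import numpy as np

def grand_chorus(lead, chorus, accompaniment, vocal_accomp_ratio=1.0):
    """Sum lead vocal and chorus into a vocal bus, then superpose the
    accompaniment; vocal_accomp_ratio scales the vocal bus relative to
    the accompaniment (an assumed parameterization)."""
    length = max(len(lead), len(chorus), len(accompaniment))
    pad = lambda x: np.pad(np.asarray(x, dtype=float), (0, length - len(x)))
    vocals = pad(lead) + pad(chorus)
    return vocal_accomp_ratio * vocals + pad(accompaniment)
```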
With the method provided by the embodiments of the present application, after dry vocal audio of the same target song as sung by each of a plurality of singers is obtained, time alignment processing is performed on the obtained tracks, and virtual sound image localization is performed on the aligned tracks so as to localize them onto a plurality of virtual sound images located in a head-centered virtual sound image coordinate system, at distances from the coordinate origin within a set range, surrounding the ears. Chorus audio is generated from the localized tracks, and when lead vocal audio sung on the basis of the target song is acquired, the lead vocal audio, the chorus audio and the corresponding accompaniment are synthesized to obtain and output grand chorus effect audio. Localizing the dry vocal audio tracks onto virtual sound images surrounding the ears gives the chorus audio a surround sound field; perceptually, this effectively avoids the in-head effect that arises when the sound field of the final output is concentrated at the center of the head, making the sound field wider.
在本申请的一个实施例中,步骤S120对获得的多个干声音频进行时间对齐处理,可以包括以下步骤:In an embodiment of the present application, step S120 performs time alignment processing on the obtained plurality of dry audio frequencies, which may include the following steps:
第一个步骤:确定目标歌曲对应的参考音频;The first step: determine the reference audio corresponding to the target song;
第二个步骤:针对获得的每个干声音频,分别提取当前干声音频和参考音频的音频特征,音频特征为指纹特征或基频特征;The second step: for each obtained dry audio frequency, extract the audio features of the current dry audio audio and the reference audio respectively, and the audio features are fingerprint features or fundamental frequency features;
第三个步骤:将当前干声音频与参考音频的音频特征相似度最大值对应的时间确定为音频对齐时间;The third step: determining the time corresponding to the maximum value of the audio feature similarity between the current dry audio and the reference audio as the audio alignment time;
第四个步骤:基于音频对齐时间,对当前干声音频进行时间对齐处理。The fourth step: performing time alignment processing on the current dry audio audio based on the audio alignment time.
为便于描述,将上述几个步骤结合起来进行说明。For the convenience of description, the above steps are combined for description.
In this embodiment of the present application, after the dry vocal audios of multiple singers each performing the same target song are obtained, the reference audio corresponding to the target song may first be determined in the process of time-aligning the obtained dry vocal audios. Specifically, one dry vocal audio with better sound quality may be selected from the obtained dry vocal audios as the reference audio. Alternatively, the original singer's dry vocal audio of the target song may be used as the reference audio.
For each obtained dry vocal audio, the audio features of the current dry vocal audio and of the reference audio may be extracted separately; the audio features are fingerprint features or fundamental-frequency features. For example, Mel-band information, Bark-band information, and ERB-band power may be extracted by multi-band filtering, and fingerprint features may then be obtained through half-wave rectification, binary decisions, and the like. As another example, fundamental-frequency features may be extracted with tools such as pYIN, CREPE, or Harvest. The audio features of the reference audio may be extracted once, saved, and reused whenever needed.
The audio features of the current dry vocal audio are compared with those of the reference audio, which may be represented by a similarity curve or the like, and the time corresponding to the maximum similarity may be determined as the audio alignment time. Time alignment is then performed on the current dry vocal audio based on that audio alignment time.
After each obtained dry vocal audio has been compared against the audio features of the reference audio to obtain its audio alignment time and has been time-aligned accordingly, multiple time-aligned dry vocal audios are obtained.
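The alignment steps above can be sketched as follows; the frame-level feature sequences, their equal lengths, and the exhaustive lag search are all simplifying assumptions made for illustration:

```python
import numpy as np

def best_alignment_offset(ref_feat, cur_feat, min_overlap=4):
    """Slide the current take's feature sequence against the reference's
    and return the lag (in frames) of maximum normalized similarity.
    Assumes both sequences have the same length."""
    n = len(ref_feat)
    best_lag, best_sim = 0, -np.inf
    for lag in range(-n + 1, n):
        if lag >= 0:
            a, b = ref_feat[lag:], cur_feat[:n - lag]
        else:
            a, b = ref_feat[:n + lag], cur_feat[-lag:]
        if len(a) < min_overlap:          # require a minimum overlap
            continue
        sim = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)
        if sim > best_sim:
            best_lag, best_sim = lag, sim
    return best_lag

def time_align(cur, lag):
    """Shift the current take onto the reference timeline: a positive lag
    prepends silence, a negative lag trims the head."""
    if lag >= 0:
        return np.concatenate([np.zeros(lag), cur])[:len(cur)]
    return np.concatenate([cur[-lag:], np.zeros(-lag)])
```

In practice the "features" fed to `best_alignment_offset` would be the fingerprint or fundamental-frequency sequences described above, not raw samples.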
In an embodiment of the present application, the method may further include the following steps:
performing band-pass filtering on each of the obtained dry vocal audios to obtain multiple pieces of bass data;
correspondingly, generating the chorus audio based on the dry vocal audios after virtual sound image localization includes:
generating the chorus audio based on the dry vocal audios after virtual sound image localization and the multiple pieces of bass data.
In this embodiment of the present application, after the dry vocal audios of multiple singers performing the same target song are obtained, band-pass filtering may be applied to each of them, for example with cutoff frequencies of [33, 523] Hz, to obtain multiple pieces of bass data.
The chorus audio may then be generated based on the dry vocal audios after virtual sound image localization and the multiple pieces of bass data. Specifically, the bass data may be superimposed, or weighted and superimposed, onto the dry vocal audios after virtual sound image localization to generate the chorus audio. Superimposing the bass signal enhances the fullness of the sound.
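A minimal sketch of this bass-extraction step; only the [33, 523] Hz band comes from the text, while the single RBJ-style biquad design and its parameters are illustrative assumptions, not a filter prescribed by the application:

```python
import numpy as np

def biquad_bandpass(x, sr, f_lo=33.0, f_hi=523.0):
    """Band-pass one dry vocal track to the [33, 523] Hz bass band using
    one second-order section (audio-EQ-cookbook band-pass form)."""
    f0 = np.sqrt(f_lo * f_hi)            # geometric centre of the band
    q = f0 / (f_hi - f_lo)               # quality factor from the bandwidth
    w0 = 2.0 * np.pi * f0 / sr
    alpha = np.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = np.array([alpha, 0.0, -alpha]) / a0
    a = np.array([1.0, -2.0 * np.cos(w0) / a0, (1.0 - alpha) / a0])
    xp = np.concatenate([[0.0, 0.0], x])  # zero initial conditions
    y = np.zeros(len(xp))
    for n in range(2, len(xp)):           # direct-form I difference equation
        y[n] = (b[0] * xp[n] + b[1] * xp[n - 1] + b[2] * xp[n - 2]
                - a[1] * y[n - 1] - a[2] * y[n - 2])
    return y[2:]
```

A production system would more likely cascade several sections (or use a library filter design) for a steeper skirt; one biquad is enough to show the shape of the step.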
In an embodiment of the present application, the method may further include the following steps:
performing reverberation simulation on each of the obtained dry vocal audios;
correspondingly, generating the chorus audio based on the dry vocal audios after virtual sound image localization includes:
generating the chorus audio based on the dry vocal audios after virtual sound image localization and the dry vocal audios after reverberation simulation.
Usually, a sound signal emitted by a source in a sound field undergoes direct sound, reflection, reverberation, and so on. Figure 5 is a schematic diagram of a typical spatial sound field. In the figure, the signal with the largest amplitude is the direct sound; it is followed by the reflected signal produced when the sound wave bounces off the object closest to the listener, which has clear directionality; the dense cluster of signals after that is the reverberation, obtained by superimposing the sound waves reflected many times by surrounding objects. Being the superposition of a large number of reflections from different directions, it has no directionality.
According to known room impulse response characteristics, reverberation is the superposition of many reflected paths and is characterized by weak energy and no directionality; because it superimposes a large number of late reflections arriving from different directions, it has a high echo density, so reverberation can be used to create an enveloping surround effect.
In this embodiment of the present application, after the dry vocal audios of multiple singers performing the same target song are obtained, reverberation simulation may be applied to each of them. Specifically, a cascade of comb filters and all-pass filters may be used to perform the reverberation simulation on each obtained dry vocal audio.
Figure 6 shows one cascade form of comb and all-pass filters, in which four comb filters connected in parallel are followed by two all-pass filters in series. The reverberation impulse response obtained by actual simulation is shown in Figure 7.
It should be noted that Figure 6 is only one specific form; in practice there are many others, and the number of comb filters and all-pass filters and the cascade topology can all be adjusted as needed.
After reverberation simulation is applied to the obtained dry vocal audios, and virtual sound image localization places them on multiple virtual sound images, the chorus audio may be generated based on the localized dry vocal audios and the reverberation-processed dry vocal audios. Specifically, the two sets may be superimposed, or weighted and superimposed, to generate the chorus audio. This enhances the spatial effect of the sound signal, further suppresses the in-head effect, and widens the sound field.
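The topology of Fig. 6 matches the classic Schroeder reverberator, which can be sketched as follows; the delay times and gains are illustrative values, not parameters taken from the application:

```python
import numpy as np

def comb(x, delay, g):
    """Feedback comb filter: y[n] = x[n] + g * y[n - delay]."""
    y = np.copy(x).astype(float)
    for n in range(delay, len(y)):
        y[n] += g * y[n - delay]
    return y

def allpass(x, delay, g):
    """Schroeder all-pass: y[n] = -g*x[n] + x[n-delay] + g*y[n-delay]."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        xd = x[n - delay] if n >= delay else 0.0
        yd = y[n - delay] if n >= delay else 0.0
        y[n] = -g * x[n] + xd + g * yd
    return y

def reverb(x, sr):
    """Four parallel comb filters summed, then two all-pass filters in
    series, mirroring the Fig. 6 cascade."""
    wet = sum(comb(x, int(sr * t), 0.75)
              for t in (0.0297, 0.0371, 0.0411, 0.0437)) / 4.0
    for t in (0.005, 0.0017):
        wet = allpass(wet, int(sr * t), 0.7)
    return wet
```

Feeding an impulse through `reverb` yields a dense, decaying tail, the qualitative shape shown in Fig. 7.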
In an embodiment of the present application, after virtual sound image localization is performed on the time-aligned dry vocal audios, the method may further include the following steps:
performing reverberation simulation on each of the dry vocal audios after virtual sound image localization;
correspondingly, generating the chorus audio based on the dry vocal audios after virtual sound image localization includes:
generating the chorus audio based on the dry vocal audios that have undergone both virtual sound image localization and reverberation simulation.
In this embodiment of the present application, after the dry vocal audios of multiple singers performing the same target song are obtained, time-aligned, and virtually localized, reverberation simulation may further be applied to each localized dry vocal audio; for the reverberation simulation process, refer to the previous embodiment, which is not repeated here.
The chorus audio may be generated based on the dry vocal audios that have undergone both virtual sound image localization and reverberation simulation, specifically by superimposing them, or weighting and superimposing them.
Applying reverberation simulation to the dry vocal audios after virtual sound image processing enhances the spatial effect of the sound signal, further suppresses the in-head effect, and widens the sound field.
In an embodiment of the present application, the method may further include the following steps:
performing two-channel simulation on each of the obtained dry vocal audios;
correspondingly, generating the chorus audio based on the dry vocal audios after virtual sound image localization includes:
generating the chorus audio based on the dry vocal audios after virtual sound image localization and the dry vocal audios after two-channel simulation.
In this embodiment of the present application, after the dry vocal audios of multiple singers performing the same target song are obtained and time-aligned, two-channel simulation may be applied to each of them: delays are used to reduce the correlation between the two channel signals and widen the sound field as much as possible, yielding a two-channel output.
As shown in Figure 8, two-channel simulation can be achieved for each dry vocal audio with eight groups of different delays and weights on each of the left and right channels, where d denotes the delay and g the weight. Since a typical room impulse response takes 80 ms as the reverberation time, 16 delay parameters ranging from 21 ms to 79 ms may be chosen. Amplitude attenuation represents the energy lost by the sound wave through reflection, which reduces the correlation between the two channels of ambience. In other words, the dry vocal audio is copied into two identical, fully correlated signals, which are then attenuated with different delays and amplitudes to lower their correlation and obtain a pseudo-stereo signal.
It should be noted that Figure 8 is only a specific example; fewer or more groups of different delays may be used to implement the two-channel simulation as needed.
The chorus audio may be generated based on the dry vocal audios after virtual sound image localization and the dry vocal audios after two-channel simulation, specifically by superimposing them, or weighting and superimposing them.
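The two-channel simulation of Fig. 8 can be sketched as follows; the 21–79 ms delay range follows the text, while the number of taps drawn, the gain range, and the random selection are illustrative assumptions:

```python
import numpy as np

def pseudo_stereo(x, sr, taps_per_ch=8, seed=0):
    """Copy the mono dry vocal into two identical channels, then
    decorrelate them with different sets of delays (21-79 ms, under the
    80 ms room response noted above) and attenuating weights."""
    rng = np.random.default_rng(seed)
    out = []
    for _ in range(2):                     # left channel, right channel
        ch = np.copy(x).astype(float)
        delays = rng.integers(int(0.021 * sr), int(0.079 * sr), taps_per_ch)
        gains = rng.uniform(0.2, 0.5, taps_per_ch)
        for d, g in zip(delays, gains):
            ch[d:] += g * x[:-d]           # delayed, attenuated copy
        out.append(ch)
    return np.stack(out)                   # shape (2, len(x))
```

Because the two channels receive different delay/gain sets, their correlation drops well below 1 while the direct signal stays common to both, which is exactly the pseudo-stereo effect described above.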
In an embodiment of the present application, after two-channel simulation is performed on the obtained dry vocal audios, the method may further include the following steps:
performing reverberation simulation on the dry vocal audios after two-channel simulation;
correspondingly, generating the chorus audio based on the dry vocal audios after virtual sound image localization and the dry vocal audios after two-channel simulation includes:
generating the chorus audio based on the dry vocal audios after virtual sound image localization and the dry vocal audios after two-channel simulation and reverberation simulation.
In this embodiment of the present application, after the dry vocal audios of multiple singers performing the same target song are obtained, time-aligned, and individually processed with two-channel simulation, reverberation simulation may further be applied to the two-channel-simulated dry vocal audios to enhance the spatial effect of the sound signal, suppress the in-head effect, and widen the sound field.
After virtual sound image localization places the dry vocal audios on multiple virtual sound images, the chorus audio may be generated based on the localized dry vocal audios and the dry vocal audios after two-channel simulation and reverberation simulation, specifically by superimposing them, or weighting and superimposing them.
In practical applications, after the dry vocal audios of multiple singers performing the same target song are obtained, they may first be time-aligned, and the aligned dry vocal audios may then be processed with virtual sound image localization, bass enhancement, reverberation simulation, two-channel simulation, and so on; the specific processing may combine the embodiments above. By applying virtual sound image localization, bass enhancement, reverberation simulation, and two-channel simulation to the dry vocal audios, the final chorus audio acquires a surrounding sound field and becomes highly robust to large misalignments: if the chorus audio is to be superimposed on the lead vocal audio, a harmonious listening experience can be guaranteed even when the lead vocal audio has a large delay offset.
Figure 9 is a schematic diagram of a system framework for processing the time-aligned dry vocal audios, comprising a bass enhancement unit, a virtual sound image localization unit, a two-channel simulation unit, and a reverberation simulation unit. The bass enhancement unit band-pass filters the dry vocal audios to obtain bass data; the virtual sound image localization unit localizes the dry vocal audios onto multiple virtual sound images; the two-channel simulation unit performs two-channel simulation on the dry vocal audios; and the reverberation simulation unit performs reverberation simulation on them. Both the virtual sound image localization unit and the two-channel simulation unit may be connected to the reverberation simulation unit: after the localization unit places the dry vocal audios on the virtual sound images, they may be further processed by the reverberation simulation unit, and likewise after the two-channel simulation unit has processed them. Finally, the audio data processed by these units may be weighted and superimposed to obtain the chorus audio.
Figure 10 shows a specific example of processing the dry vocal audios. H denotes the transfer function of HRTF filtering; processing through this transfer function performs virtual sound image localization, placing the dry vocal audios on 12 virtual sound images around the horizontal plane of the human ears. REV denotes a reverberation simulation unit, BASS a bass enhancement unit, and REF a two-channel simulation unit. The reverberation simulation units here may share the same parameters, or different parameters may be configured for different units as required, giving flexible reverberation shaping.
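As a very loose illustration of the virtual sound image localization step: a real system would convolve each dry vocal with measured left/right HRIRs for each of the 12 horizontal-plane directions, whereas the sketch below models only interaural time and level differences, and every constant in it is an assumption, not the HRTF filtering H of the application:

```python
import numpy as np

def place_virtual_image(x, azimuth_deg, sr=44100):
    """Crude ITD/ILD stand-in for HRTF filtering of a mono dry vocal.
    Returns a (left, right) pair for a source at the given azimuth."""
    az = np.deg2rad(azimuth_deg)
    itd = 0.0007 * np.sin(az)              # ~0.7 ms max interaural delay
    ild = 0.5 * np.sin(az)                 # crude interaural level cue
    shift = int(round(abs(itd) * sr))
    near = x * (1.0 + 0.5 * abs(ild))      # louder, earlier ear
    far = np.concatenate([np.zeros(shift), x])[:len(x)] * (1.0 - 0.5 * abs(ild))
    # positive azimuth = source to the right, so the right ear is "near"
    return (far, near) if itd > 0 else (near, far)
```

Twelve calls with azimuths spaced 30° apart would approximate the ring of virtual sound images of Fig. 10, though without the spectral cues that measured HRIRs provide.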
The grand-chorus effect of the chorus audio finally generated by the embodiments of the present application comes much closer to the sound of a real concert chorus. In practical applications, adding an accompaniment to the lead vocal audio while mixing in the chorus audio gives the user an immersive, concert-like listening experience and a more striking enveloping sound field.
In an embodiment of the present application, performing virtual sound image localization on the time-aligned dry vocal audios may include the following steps:
Step 1: grouping the obtained time-aligned dry vocal audios according to the number of virtual sound images, the number of groups being equal to the number of virtual sound images;
Step 2: localizing each group of dry vocal audios onto its corresponding virtual sound image, different groups corresponding to different virtual sound images.
For ease of description, the above two steps are explained together.
In this embodiment of the present application, after the dry vocal audios of multiple singers performing the same target song are obtained and time-aligned, they may be grouped according to the number of virtual sound images, the number of groups being equal to the number of virtual sound images and each group containing several dry vocal audios. If many dry vocal audios are obtained, each one may belong to only one group; if few are obtained, the same dry vocal audio may appear in multiple groups, to better achieve the grand-chorus effect.
After the dry vocal audios are grouped, each group may be localized onto its corresponding virtual sound image, different groups corresponding to different virtual sound images. This realizes virtual sound image localization of the dry vocal audios and enhances the grand-chorus effect.
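The two grouping steps can be sketched as follows; the round-robin assignment is one illustrative policy consistent with the text (each track in exactly one group when tracks are plentiful, tracks reused when they are scarce):

```python
def group_to_images(tracks, n_images):
    """Split the time-aligned dry vocals into as many groups as there are
    virtual sound images; group i is then localized onto image i."""
    if len(tracks) >= n_images:
        # round-robin: every track lands in exactly one group
        return [tracks[i::n_images] for i in range(n_images)]
    # fewer tracks than images: reuse tracks so every image is fed
    return [[tracks[i % len(tracks)]] for i in range(n_images)]
```

For example, 30 aligned takes distributed over the 12 virtual sound images of Fig. 10 yield groups of two or three takes each.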
In an embodiment of the present application, synthesizing the lead vocal audio, the chorus audio, and the corresponding accompaniment may include the following steps:
adjusting the volume of the lead vocal audio and the chorus audio separately, and/or performing reverberation simulation on the lead vocal audio and the chorus audio;
synthesizing the volume-adjusted and/or reverberation-processed lead vocal audio, the chorus audio, and the corresponding accompaniment.
In this embodiment of the present application, after the lead vocal audio sung to the target song is obtained, the volumes of the lead vocal audio and the chorus audio may be adjusted separately so that they are comparable, or so that the lead vocal audio is louder than the chorus audio. At the same time, reverberation simulation may also be applied to the lead vocal audio and the chorus audio to obtain an enveloping surround effect.
The volume-adjusted and/or reverberation-processed lead vocal audio, the chorus audio, and the corresponding accompaniment are then synthesized, so that the final grand-chorus effect audio gives the user a better listening experience.
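The synthesis step can be sketched as follows; the gain values and the peak normalization are illustrative assumptions, and any reverberation simulation would be applied before this mix:

```python
import numpy as np

def mix_output(lead, chorus, accomp, lead_gain=1.0, chorus_gain=0.8,
               accomp_gain=0.9):
    """Level the lead vocal at or above the chorus, then sum all three
    streams into the grand-chorus effect audio."""
    assert lead_gain >= chorus_gain      # lead no quieter than the chorus
    n = min(len(lead), len(chorus), len(accomp))
    mix = (lead_gain * lead[:n] + chorus_gain * chorus[:n]
           + accomp_gain * accomp[:n])
    peak = np.abs(mix).max()
    return mix / peak if peak > 1.0 else mix   # simple peak normalization
```

A weighted sum with per-stream gains is the simplest reading of "superimposing or weighted superimposing" used throughout the embodiments.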
Corresponding to the method embodiments above, an embodiment of the present application further provides a chorus audio processing apparatus; the chorus audio processing apparatus described below and the chorus audio processing method described above may be cross-referenced.
As shown in Figure 11, the apparatus may include the following modules:
a dry vocal audio obtaining module 1110, configured to obtain the dry vocal audio of each of multiple singers performing the same target song;
a time alignment processing module 1120, configured to perform time alignment on the obtained dry vocal audios;
a virtual sound image localization module 1130, configured to perform virtual sound image localization on the time-aligned dry vocal audios so as to place them on multiple virtual sound images, where the virtual sound images lie in a pre-established virtual sound image coordinate system centered on the human head, with the midpoint of the line through the two ears as the origin; the positive direction of the first coordinate axis points straight ahead of the head, the positive direction of the second coordinate axis points from the left ear toward the right ear, and the positive direction of the third coordinate axis points straight above the head; the distance of each virtual sound image from the origin is within a set distance range, and the pitch angle of each virtual sound image relative to the plane formed by the first and second coordinate axes is within a set angle range;
a chorus audio generation module 1140, configured to generate the chorus audio based on the dry vocal audios after virtual sound image localization;
a grand-chorus effect audio obtaining module 1150, configured to, when the lead vocal audio sung to the target song is obtained, synthesize the lead vocal audio, the chorus audio, and the corresponding accompaniment, and output the grand-chorus effect audio.
With the apparatus provided by the embodiments of the present application, after the dry vocal audios of multiple singers performing the same target song are obtained, they are time-aligned, and virtual sound image localization is performed on the aligned dry vocal audios to place them on multiple virtual sound images; the virtual sound images lie in a head-centered virtual sound image coordinate system, at distances from the origin within a set range, surrounding the human ears. The chorus audio is generated based on the localized dry vocal audios, and when the lead vocal audio sung to the target song is obtained, the lead vocal audio, the chorus audio, and the corresponding accompaniment are synthesized to obtain and output the grand-chorus effect audio. Placing the dry vocal audios on multiple virtual sound images surrounding the ears gives the generated chorus audio a surrounding sound field; perceptually, this effectively avoids the in-head effect that arises when the sound field of the final grand-chorus effect audio collapses to the center of the head, making the sound field wider.
In a specific implementation of the present application, the time alignment processing module 1120 is configured to:
determine the reference audio corresponding to the target song;
for each obtained dry vocal audio, extract the audio features of the current dry vocal audio and of the reference audio separately, the audio features being fingerprint features or fundamental-frequency features;
determine the time corresponding to the maximum audio-feature similarity between the current dry vocal audio and the reference audio as the audio alignment time; and
perform time alignment on the current dry vocal audio based on the audio alignment time.
In a specific implementation of the present application, a bass data obtaining module is further included, configured to:
perform band-pass filtering on each of the obtained dry vocal audios to obtain multiple pieces of bass data;
correspondingly, the chorus audio generation module 1140 is configured to:
generate the chorus audio based on the dry vocal audios after virtual sound image localization and the multiple pieces of bass data.
In a specific implementation of the present application, a reverberation simulation processing module is further included, configured to:
perform reverberation simulation on each of the obtained dry vocal audios;
correspondingly, the chorus audio generation module 1140 is configured to:
generate the chorus audio based on the dry vocal audios after virtual sound image localization and the dry vocal audios after reverberation simulation.
In a specific implementation of the present application, the reverberation simulation processing module is configured to:
perform reverberation simulation on each of the obtained dry vocal audios using a cascade of comb filters and all-pass filters.
In a specific implementation of the present application, the reverberation simulation processing module is further configured to:
after virtual sound image localization is performed on the time-aligned dry vocal audios, perform reverberation simulation on each of the localized dry vocal audios;
correspondingly, the chorus audio generation module 1140 is configured to:
generate the chorus audio based on the dry vocal audios that have undergone both virtual sound image localization and reverberation simulation.
In a specific implementation of the present application, a two-channel simulation processing module is further included, configured to:
perform two-channel simulation on each of the obtained dry vocal audios;
correspondingly, the chorus audio generation module 1140 is configured to:
generate the chorus audio based on the dry vocal audios after virtual sound image localization and the dry vocal audios after two-channel simulation.
In a specific implementation of the present application, the reverberation simulation processing module is further configured to:
after two-channel simulation is performed on the obtained dry vocal audios, perform reverberation simulation on the two-channel-simulated dry vocal audios;
correspondingly, the chorus audio generation module 1140 is configured to:
generate the chorus audio based on the dry vocal audios after virtual sound image localization and the dry vocal audios after two-channel simulation and reverberation simulation.
在本申请的一种具体实施方式中,虚拟声像定位模块1130,用于:In a specific embodiment of the present application, the virtual sound image localization module 1130 is used for:
按照虚拟声像的个数,将获得的进行时间对齐处理后的多个干声音频进行分组,组数与虚拟声像的个数相同;According to the number of virtual sound images, the obtained multiple dry sound audio frequency after time alignment processing is grouped, and the number of groups is the same as the number of virtual sound images;
将各组干声音频分别定位到对应的虚拟声像上,不同组干声音频对应不同虚拟声像。Each group of dry audio audio is positioned on the corresponding virtual audio image, and different groups of dry audio audio correspond to different virtual audio images.
In a specific implementation of the present application, among the plurality of virtual sound images, the elevation angle of a virtual sound image located behind the head relative to the plane formed by the first coordinate axis and the second coordinate axis is greater than the elevation angle of a virtual sound image located in front of the head relative to that plane; alternatively, the virtual sound images are evenly distributed around a circle in the plane formed by the first coordinate axis and the second coordinate axis.
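For the "evenly distributed" option, the virtual sound image directions can be sketched as equally spaced azimuths around the plane of the first and second coordinate axes, with the elevation rule applied on top (rear images raised above front ones). The specific elevation values below are assumptions for illustration:

```python
def place_virtual_images(n: int, front_elev_deg: float = 0.0,
                         rear_elev_deg: float = 15.0) -> list:
    """Return (azimuth, elevation) pairs in degrees for n virtual sound
    images: azimuths evenly spaced around the horizontal plane (0 deg is
    straight ahead), with images behind the head given a larger elevation
    than those in front. Elevation values are illustrative assumptions."""
    positions = []
    for k in range(n):
        az = 360.0 * k / n
        behind = 90.0 < az < 270.0          # rear half of the plane
        elev = rear_elev_deg if behind else front_elev_deg
        positions.append((az, elev))
    return positions
```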
In a specific implementation of the present application, the grand chorus effect audio obtaining module 1150 is configured to:
adjust the volume of the lead vocal audio and the chorus audio respectively, and/or perform reverberation simulation processing on the lead vocal audio and the chorus audio;
synthesize the volume-adjusted and/or reverberation-processed lead vocal audio, the chorus audio, and the corresponding accompaniment.
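The synthesis step above amounts to a gain-weighted sum of the three stems. A minimal sketch; the gain values and the peak-normalization step are illustrative assumptions, not part of the application:

```python
import numpy as np

def synthesize_grand_chorus(lead: np.ndarray, chorus: np.ndarray,
                            accomp: np.ndarray, lead_gain: float = 1.0,
                            chorus_gain: float = 0.6,
                            accomp_gain: float = 0.8) -> np.ndarray:
    """Mix the volume-adjusted lead vocal, chorus, and accompaniment into
    grand chorus effect audio; normalize only if the mix would clip."""
    n = min(len(lead), len(chorus), len(accomp))   # align stem lengths
    mix = (lead_gain * lead[:n] + chorus_gain * chorus[:n]
           + accomp_gain * accomp[:n])
    peak = float(np.max(np.abs(mix)))
    return mix / peak if peak > 1.0 else mix
```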
Corresponding to the above method embodiments, an embodiment of the present application further provides a chorus audio processing device, including:
a memory for storing a computer program; and
a processor configured to implement the steps of the above method for processing chorus audio when executing the computer program.
As shown in FIG. 12, which is a schematic structural diagram of the chorus audio processing device, the device may include a processor 10, a memory 11, a communication interface 12, and a communication bus 13. The processor 10, the memory 11, and the communication interface 12 communicate with one another through the communication bus 13.
In this embodiment of the present application, the processor 10 may be a central processing unit (CPU), an application-specific integrated circuit, a digital signal processor, a field-programmable gate array, another programmable logic device, or the like.
The processor 10 may call the program stored in the memory 11; specifically, the processor 10 may perform the operations in the embodiments of the method for processing chorus audio.
The memory 11 stores one or more programs, which may include program code comprising computer operation instructions. In this embodiment of the present application, the memory 11 stores at least a program for implementing the following functions:
obtaining dry vocal audios of a plurality of singers each singing the same target song;
performing time alignment processing on the plurality of obtained dry vocal audios;
performing virtual sound image localization on the plurality of time-aligned dry vocal audios, so as to localize the plurality of dry vocal audios onto a plurality of virtual sound images; the plurality of virtual sound images are located in a pre-established virtual sound image coordinate system centered on the human head, with the midpoint of the line connecting the left and right ears as the coordinate origin; the positive direction of the first coordinate axis points directly in front of the head, the positive direction of the second coordinate axis points to the side of the head from the left ear toward the right ear, and the positive direction of the third coordinate axis points directly above the head; the distance between each virtual sound image and the coordinate origin is within a set distance range, and the pitch angle of each virtual sound image relative to the plane formed by the first and second coordinate axes is within a set angle range;
generating chorus audio based on the plurality of dry vocal audios after virtual sound image localization; and
when lead vocal audio sung based on the target song is obtained, synthesizing the lead vocal audio, the chorus audio, and the corresponding accompaniment, and outputting grand chorus effect audio.
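The time alignment step listed above (detailed further in claim 2) searches for the offset at which the dry vocal's audio features best match the reference audio's features. A minimal sketch over pre-extracted feature sequences, using normalized correlation as the similarity measure; the feature extractor itself (fingerprint or fundamental frequency) is assumed and not shown:

```python
import numpy as np

def best_alignment_offset(dry_feat: np.ndarray, ref_feat: np.ndarray,
                          max_shift: int = 100) -> int:
    """Find the frame shift s that maximizes the normalized similarity
    between the dry vocal's features and the reference audio's features
    (positive s means the dry vocal starts late). Illustrative sketch of
    the feature-similarity alignment step."""
    best_shift, best_score = 0, -np.inf
    for s in range(-max_shift, max_shift + 1):
        lo_d, lo_r = max(s, 0), max(-s, 0)      # overlap start in each sequence
        n = min(len(dry_feat) - lo_d, len(ref_feat) - lo_r)
        if n <= 1:
            continue
        a = dry_feat[lo_d:lo_d + n]
        b = ref_feat[lo_r:lo_r + n]
        denom = float(np.linalg.norm(a) * np.linalg.norm(b))
        if denom == 0.0:
            continue
        score = float(np.dot(a, b)) / denom     # normalized correlation
        if score > best_score:
            best_shift, best_score = s, score
    return best_shift
```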
In a possible implementation, the memory 11 may include a program storage area and a data storage area. The program storage area may store an operating system and the application programs required for at least one function (such as an audio playback function or an audio synthesis function); the data storage area may store data created during use, such as sound image localization data and audio synthesis data.
In addition, the memory 11 may include high-speed random access memory, and may further include non-volatile memory, such as at least one magnetic disk storage device or another solid-state storage device.
The communication interface 12 may be an interface of a communication module for connecting to other devices or systems.
It should be noted that the structure shown in FIG. 12 does not limit the chorus audio processing device in the embodiments of the present application; in practical applications, the chorus audio processing device may include more or fewer components than those shown in FIG. 12, or combine certain components.
Corresponding to the above method embodiments, an embodiment of the present application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above method for processing chorus audio.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the parts that are the same or similar among the embodiments, reference may be made to one another.
Those skilled in the art may further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate this interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of functionality. Whether these functions are performed in hardware or in software depends on the particular application and the design constraints of the technical solution. Skilled artisans may implement the described functionality in different ways for each particular application, but such implementations should not be considered beyond the scope of the present application.
The steps of the methods or algorithms described in connection with the embodiments disclosed herein may be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The principles and implementations of the present application are described herein using specific examples; the descriptions of the above embodiments are merely intended to help understand the technical solutions and core ideas of the present application. It should be noted that those of ordinary skill in the art can make several improvements and modifications to the present application without departing from its principles, and these improvements and modifications also fall within the protection scope of the claims of the present application.

Claims (13)

  1. A method for processing chorus audio, comprising:
    obtaining dry vocal audios of a plurality of singers each singing the same target song;
    performing time alignment processing on the plurality of obtained dry vocal audios;
    performing virtual sound image localization on the plurality of time-aligned dry vocal audios, so as to localize the plurality of dry vocal audios onto a plurality of virtual sound images; wherein the plurality of virtual sound images are located in a pre-established virtual sound image coordinate system centered on the human head, with the midpoint of the line connecting the left and right ears as the coordinate origin; the positive direction of a first coordinate axis points directly in front of the head, the positive direction of a second coordinate axis points to the side of the head from the left ear toward the right ear, and the positive direction of a third coordinate axis points directly above the head; the distance between each virtual sound image and the coordinate origin is within a set distance range, and the pitch angle of each virtual sound image relative to the plane formed by the first coordinate axis and the second coordinate axis is within a set angle range;
    generating chorus audio based on the plurality of dry vocal audios after virtual sound image localization; and
    when lead vocal audio sung based on the target song is obtained, synthesizing the lead vocal audio, the chorus audio, and a corresponding accompaniment, and outputting grand chorus effect audio.
  2. The method for processing chorus audio according to claim 1, wherein performing time alignment processing on the plurality of obtained dry vocal audios comprises:
    determining a reference audio corresponding to the target song;
    for each obtained dry vocal audio, extracting audio features of the current dry vocal audio and of the reference audio, the audio features being fingerprint features or fundamental frequency features;
    determining the time corresponding to the maximum audio feature similarity between the current dry vocal audio and the reference audio as an audio alignment time; and
    performing time alignment processing on the current dry vocal audio based on the audio alignment time.
  3. The method for processing chorus audio according to claim 1, further comprising:
    performing band-pass filtering on each of the plurality of obtained dry vocal audios to obtain a plurality of pieces of bass data;
    correspondingly, generating the chorus audio based on the plurality of dry vocal audios after virtual sound image localization comprises:
    generating the chorus audio based on the plurality of dry vocal audios after virtual sound image localization and the plurality of pieces of bass data.
  4. The method for processing chorus audio according to claim 1, further comprising:
    performing reverberation simulation processing on each of the plurality of obtained dry vocal audios;
    correspondingly, generating the chorus audio based on the plurality of dry vocal audios after virtual sound image localization comprises:
    generating the chorus audio based on the plurality of dry vocal audios after virtual sound image localization and the plurality of dry vocal audios after reverberation simulation processing.
  5. The method for processing chorus audio according to claim 4, wherein performing reverberation simulation processing on each of the plurality of obtained dry vocal audios comprises:
    performing reverberation simulation processing on each of the plurality of obtained dry vocal audios by using a cascade of comb filters and all-pass filters.
  6. The method for processing chorus audio according to claim 1, further comprising, after performing virtual sound image localization on the plurality of time-aligned dry vocal audios:
    performing reverberation simulation processing on the plurality of dry vocal audios after virtual sound image localization;
    correspondingly, generating the chorus audio based on the plurality of dry vocal audios after virtual sound image localization comprises:
    generating the chorus audio based on the plurality of dry vocal audios that have undergone virtual sound image localization and reverberation simulation processing.
  7. The method for processing chorus audio according to claim 1, further comprising:
    performing two-channel simulation processing on each of the plurality of obtained dry vocal audios;
    correspondingly, generating the chorus audio based on the plurality of dry vocal audios after virtual sound image localization comprises:
    generating the chorus audio based on the plurality of dry vocal audios after virtual sound image localization and the plurality of dry vocal audios after two-channel simulation processing.
  8. The method for processing chorus audio according to claim 7, further comprising, after performing two-channel simulation processing on each of the plurality of obtained dry vocal audios:
    performing reverberation simulation processing on the plurality of dry vocal audios after the two-channel simulation processing;
    correspondingly, generating the chorus audio based on the plurality of dry vocal audios after virtual sound image localization and the plurality of dry vocal audios after two-channel simulation processing comprises:
    generating the chorus audio based on the plurality of dry vocal audios after virtual sound image localization and the plurality of dry vocal audios after the two-channel simulation processing and the reverberation simulation processing.
  9. The method for processing chorus audio according to claim 1, wherein performing virtual sound image localization on the plurality of time-aligned dry vocal audios comprises:
    grouping the plurality of obtained time-aligned dry vocal audios according to the number of virtual sound images, the number of groups being equal to the number of virtual sound images; and
    localizing each group of dry vocal audios onto its corresponding virtual sound image, different groups of dry vocal audios corresponding to different virtual sound images.
  10. The method for processing chorus audio according to claim 1, wherein,
    among the plurality of virtual sound images, the elevation angle of a virtual sound image located behind the head relative to the plane formed by the first coordinate axis and the second coordinate axis is greater than the elevation angle of a virtual sound image located in front of the head relative to that plane;
    or,
    the virtual sound images are evenly distributed around a circle in the plane formed by the first coordinate axis and the second coordinate axis.
  11. The method for processing chorus audio according to any one of claims 1 to 10, wherein synthesizing the lead vocal audio, the chorus audio, and the corresponding accompaniment comprises:
    adjusting the volume of the lead vocal audio and the chorus audio respectively, and/or performing reverberation simulation processing on the lead vocal audio and the chorus audio; and
    synthesizing the volume-adjusted and/or reverberation-processed lead vocal audio, the chorus audio, and the corresponding accompaniment.
  12. A chorus audio processing device, comprising:
    a memory for storing a computer program; and
    a processor configured to implement the steps of the method for processing chorus audio according to any one of claims 1 to 11 when executing the computer program.
  13. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method for processing chorus audio according to any one of claims 1 to 11.
PCT/CN2022/087784 2021-04-27 2022-04-20 Method and device for processing chorus audio, and storage medium WO2022228220A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110460280.4 2021-04-27
CN202110460280.4A CN113192486B (en) 2021-04-27 2021-04-27 Chorus audio processing method, chorus audio processing equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2022228220A1 true WO2022228220A1 (en) 2022-11-03

Family

ID=76979435

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/087784 WO2022228220A1 (en) 2021-04-27 2022-04-20 Method and device for processing chorus audio, and storage medium

Country Status (2)

Country Link
CN (1) CN113192486B (en)
WO (1) WO2022228220A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113192486B (en) * 2021-04-27 2024-01-09 腾讯音乐娱乐科技(深圳)有限公司 Chorus audio processing method, chorus audio processing equipment and storage medium
CN114242025A (en) * 2021-12-14 2022-03-25 腾讯音乐娱乐科技(深圳)有限公司 Method and device for generating accompaniment and storage medium
CN114363793B (en) * 2022-01-12 2024-06-11 厦门市思芯微科技有限公司 System and method for converting double-channel audio into virtual surrounding 5.1-channel audio
CN114630145A (en) * 2022-03-17 2022-06-14 腾讯音乐娱乐科技(深圳)有限公司 Multimedia data synthesis method, equipment and storage medium
CN116170613B (en) * 2022-09-08 2024-09-24 腾讯音乐娱乐科技(深圳)有限公司 Audio stream processing method, computer device and computer program product

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000333297A (en) * 1999-05-14 2000-11-30 Sound Vision:Kk Stereophonic sound generator, method for generating stereophonic sound, and medium storing stereophonic sound
CN105208039A (en) * 2015-10-10 2015-12-30 广州华多网络科技有限公司 Chorusing method and system for online vocal concert
CN106331977A (en) * 2016-08-22 2017-01-11 北京时代拓灵科技有限公司 Virtual reality panoramic sound processing method for network karaoke
CN107422862A (en) * 2017-08-03 2017-12-01 嗨皮乐镜(北京)科技有限公司 A kind of method that virtual image interacts in virtual reality scenario
CN109785820A (en) * 2019-03-01 2019-05-21 腾讯音乐娱乐科技(深圳)有限公司 A kind of processing method, device and equipment
CN110379401A (en) * 2019-08-12 2019-10-25 黑盒子科技(北京)有限公司 A kind of music is virtually chorused system and method
CN110992970A (en) * 2019-12-13 2020-04-10 腾讯音乐娱乐科技(深圳)有限公司 Audio synthesis method and related device
CN113192486A (en) * 2021-04-27 2021-07-30 腾讯音乐娱乐科技(深圳)有限公司 Method, equipment and storage medium for processing chorus audio

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4780057B2 (en) * 2007-08-06 2011-09-28 ヤマハ株式会社 Sound field generator
CN108269560A (en) * 2017-01-04 2018-07-10 北京酷我科技有限公司 A kind of speech synthesizing method and system
CN111028818B (en) * 2019-11-14 2022-11-22 北京达佳互联信息技术有限公司 Chorus method, apparatus, electronic device and storage medium


Also Published As

Publication number Publication date
CN113192486A (en) 2021-07-30
CN113192486B (en) 2024-01-09


Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 22794691; Country of ref document: EP; Kind code of ref document: A1)
WWE WIPO information: entry into national phase (Ref document number: 2301007017; Country of ref document: TH)
NENP Non-entry into the national phase (Ref country code: DE)
WWE WIPO information: entry into national phase (Ref document number: 11202308147P; Country of ref document: SG)
32PN EP: public notification in the EP bulletin as the address of the addressee cannot be established (Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 26.03.2024))
122 EP: PCT application non-entry in the European phase (Ref document number: 22794691; Country of ref document: EP; Kind code of ref document: A1)