WO2022228220A1 - Method and device for processing chorus audio, and storage medium - Google Patents
Method and device for processing chorus audio, and storage medium
- Publication number
- WO2022228220A1 (PCT/CN2022/087784)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- audio
- dry
- chorus
- processing
- virtual sound
- Prior art date
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
- G10L13/033—Voice editing, e.g. manipulating the voice of the synthesiser
- G10L13/0335—Pitch control
- G10L13/04—Details of speech synthesis systems, e.g. synthesiser structure or memory management
- G10L13/047—Architecture of speech synthesisers
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S7/00—Indicating arrangements; Control arrangements, e.g. balance control
- H04S7/30—Control circuits for electronic adaptation of the sound field
- H04S7/305—Electronic adaptation of stereophonic audio signals to reverberation of the listening space
- H04S2400/00—Details of stereophonic systems covered by H04S but not provided for in its groups
- H04S2400/13—Aspects of volume control, not necessarily automatic, in stereophonic sound systems
Definitions
- the present application relates to the technical field of computer applications, and in particular, to a method, device and storage medium for processing chorus audio.
- the purpose of the present application is to provide a chorus audio processing method, device and storage medium, so as to avoid the in-head effect caused by the sound field gathering at the center of the listener's head, thereby widening the sound field and improving the listening experience.
- a method for processing chorus audio comprising:
- the virtual sound image coordinate system is centered on the human head, with the midpoint of the straight line where the left and right ears are located as the coordinate origin; the positive direction of the first coordinate axis represents the front of the human head, the positive direction of the second coordinate axis represents the side of the human head from the left ear to the right ear, and the positive direction of the third coordinate axis represents directly above the human head; the distance between each virtual sound image and the coordinate origin is within a set distance range, and the pitch angle of each virtual sound image relative to the plane formed by the first coordinate axis and the second coordinate axis is within a set angle range;
- performing time alignment processing on the plurality of obtained dry vocal audios includes:
- performing time alignment processing on the current dry vocal audio based on the audio alignment time.
- band-pass filtering is performed on each of the obtained dry vocal audios to obtain multiple pieces of bass data;
- generating the chorus audio based on the plurality of dry vocal audios after virtual sound image localization includes:
- generating the chorus audio based on the plurality of dry vocal audios after virtual sound image localization and the plurality of pieces of bass data.
- generating the chorus audio based on the plurality of dry vocal audios after virtual sound image localization includes:
- generating the chorus audio based on the plurality of dry vocal audios after virtual sound image localization and the plurality of dry vocal audios after reverberation simulation processing.
- performing reverberation simulation processing on each of the obtained dry vocal audios includes:
- performing reverberation simulation processing on each of the obtained dry vocal audios using a cascade of comb filters and all-pass filters.
- the method further includes:
- generating the chorus audio based on the plurality of dry vocal audios after virtual sound image localization includes:
- generating the chorus audio based on the plurality of dry vocal audios after virtual sound image localization and reverberation simulation processing.
- generating the chorus audio based on the plurality of dry vocal audios after virtual sound image localization includes:
- generating the chorus audio based on the plurality of dry vocal audios after virtual sound image localization and the plurality of dry vocal audios after two-channel simulation processing.
- the method further includes:
- generating the chorus audio based on the plurality of dry vocal audios after virtual sound image localization and the plurality of dry vocal audios after two-channel simulation processing includes:
- generating the chorus audio based on the plurality of dry vocal audios after virtual sound image localization and the plurality of dry vocal audios after two-channel simulation processing and reverberation simulation processing.
- performing virtual sound image localization on the plurality of dry vocal audios after time alignment processing includes:
- grouping the obtained dry vocal audios after time alignment processing, the number of groups being the same as the number of virtual sound images;
- locating each group of dry vocal audios on the corresponding virtual sound image, different groups of dry vocal audios corresponding to different virtual sound images.
- the elevation angle of a virtual sound image located behind the human head relative to the plane formed by the first coordinate axis and the second coordinate axis is greater than the elevation angle of a virtual sound image located in front of the human head relative to the same plane; or,
- the virtual sound images are evenly distributed on a circumference in the plane formed by the first coordinate axis and the second coordinate axis.
- synthesizing the lead vocal audio, the chorus audio and the corresponding accompaniment includes:
- synthesizing the lead vocal audio, the chorus audio and the corresponding accompaniment after volume adjustment and/or reverberation simulation processing.
- a processing device for chorus audio, comprising:
- a dry vocal audio obtaining module, configured to obtain dry vocal audios of the same target song performed respectively by multiple singers;
- an alignment processing module, configured to perform time alignment processing on the plurality of obtained dry vocal audios;
- a virtual sound image localization module, configured to perform virtual sound image localization on the plurality of dry vocal audios after time alignment processing, so as to locate the plurality of dry vocal audios on a plurality of virtual sound images;
- wherein the virtual sound images are located in a pre-established virtual sound image coordinate system centered on the human head, with the midpoint of the straight line where the left and right ears are located as the coordinate origin;
- the positive direction of the first coordinate axis represents the front of the human head, the positive direction of the second coordinate axis represents the side of the human head from the left ear to the right ear, and the positive direction of the third coordinate axis represents directly above the human head;
- the distance between each virtual sound image and the coordinate origin is within a set distance range, and the pitch angle of each virtual sound image relative to the plane formed by the first coordinate axis and the second coordinate axis is within a set angle range;
- a chorus audio generation module, configured to generate chorus audio based on the plurality of dry vocal audios after virtual sound image localization;
- a grand chorus effect audio output module, configured to, in the case of acquiring lead vocal audio sung based on the target song, synthesize the lead vocal audio, the chorus audio and the corresponding accompaniment and output the grand chorus effect audio.
- a chorus audio processing device, comprising:
- a processor configured to implement the steps of any one of the chorus audio processing methods described above when executing a computer program;
- a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of any one of the chorus audio processing methods described above.
- time alignment processing is performed on the plurality of obtained dry vocal audios, and virtual sound image localization is performed on the aligned dry vocal audios to locate them on multiple virtual sound images;
- the multiple virtual sound images are located in a virtual sound image coordinate system centered on the human head, with their distance from the coordinate origin within the set distance range, so that they surround the human ear; chorus audio is generated based on the multiple dry vocal audios after virtual sound image localization, and when lead vocal audio sung based on the target song is obtained, the lead vocal audio, the chorus audio and the corresponding accompaniment are synthesized to obtain and output the grand chorus effect audio.
- Locating multiple dry vocal audios on multiple virtual sound images surrounding the human ear gives the generated chorus audio a surround sound field effect; in terms of listening experience, it effectively prevents the in-head effect caused by the sound field of the final output grand chorus effect audio gathering at the center of the listener's head, making the sound field wider.
- FIG. 1 is an implementation flowchart of a method for processing chorus audio in an embodiment of the present application;
- FIG. 2 is a schematic diagram of the virtual sound image coordinate system showing sound image orientation in an embodiment of the present application;
- FIG. 3 is a schematic diagram of virtual sound image localization in an embodiment of the present application;
- FIG. 4 is a schematic diagram of virtual sound images after localization in an embodiment of the present application;
- FIG. 5 is a schematic diagram of the composition of a spatial sound field process in an embodiment of the present application;
- FIG. 6 is a schematic diagram of a cascaded form of comb filters and all-pass filters in an embodiment of the present application;
- FIG. 7 is a schematic diagram of a reverberation impulse response in an embodiment of the present application;
- FIG. 8 is a schematic diagram of a two-channel simulation process in an embodiment of the present application;
- FIG. 9 is a schematic framework diagram of a chorus audio processing system in an embodiment of the present application;
- FIG. 10 is a schematic diagram of a specific structure of a chorus audio processing system in an embodiment of the present application;
- FIG. 11 is a schematic structural diagram of an apparatus for processing chorus audio in an embodiment of the present application;
- FIG. 12 is a schematic structural diagram of a chorus audio processing device in an embodiment of the present application.
- the core of the present application is to provide a method for processing chorus audio: after the dry vocal audios of the same target song performed by multiple singers are obtained, time alignment processing is performed on them, and virtual sound image localization is performed on the aligned dry vocal audios to locate them on multiple virtual sound images; the multiple virtual sound images are located in a virtual sound image coordinate system centered on the human head, with their distance from the coordinate origin within the set distance range, surrounding the human ear; chorus audio is generated based on the localized dry vocal audios, and when lead vocal audio sung based on the target song is obtained, the lead vocal audio, the chorus audio and the corresponding accompaniment are synthesized to obtain and output the grand chorus effect audio.
- Locating multiple dry vocal audios on multiple virtual sound images surrounding the human ear gives the generated chorus audio a surround sound field effect; in terms of listening experience, it effectively prevents the in-head effect caused by the sound field of the final output grand chorus effect audio gathering at the center of the listener's head, making the sound field wider.
- the methods provided in the embodiments of the present application can be applied in various scenarios where a grand chorus sound effect is desired, and specific solutions can be implemented through interaction between a server and a client.
- the server may obtain in advance the dry vocal audios of the same target song performed by multiple singers, such as singers 1, 2, 3, 4..., perform time alignment processing on the obtained dry vocal audios, perform virtual sound image localization on the aligned dry vocal audios, and locate them on multiple virtual sound images surrounding the human ear;
- chorus audio is generated from the multiple localized dry vocal audios; when user X wants the song he sings to achieve a grand chorus sound effect, he can sing the target song through the client;
- after the server synthesizes the lead vocal audio with the chorus audio and the corresponding accompaniment, the grand chorus effect audio can be obtained and output through the client, so that user X can experience the grand chorus sound effect.
- for example, the server can obtain the dry vocal audios of the target song performed respectively by users 2, 3, 4 and 5, perform time alignment processing on them, and locate the aligned dry vocal audios on multiple virtual sound images;
- the multiple virtual sound images surround the human ear, and the chorus audio is generated based on the multiple dry vocal audios after virtual sound image localization;
- when the server obtains, through the client, the lead vocal audio sung by user 1 based on the target song, it synthesizes the lead vocal audio, the chorus audio and the corresponding accompaniment to obtain the grand chorus effect audio, which is output to user 1 through the client, so that user 1 can experience the grand chorus sound effect.
- the method may include the following steps:
- multiple dry vocal audios may be obtained according to actual needs;
- the multiple dry vocal audios may be audio data obtained by different singers singing the same target song, and the different singers may be in the same or different environments.
- because the dry vocal audios of the same target song are obtained from multiple singers separately and may have been sung by different singers at different times, misalignment phenomena such as delay may exist among them;
- an alignment tool can be used to align the starting positions of the obtained dry vocal audios in time;
- the obtained dry vocal audios may also be preliminarily screened, for example by using tools such as sound quality detection, to eliminate audios of poor quality, such as those containing noise or accompaniment leakage, or whose length is too short, whose energy is too small, or which contain popping sounds; time alignment processing and the subsequent steps are then performed on the dry vocal audios retained after screening.
- S130: Perform virtual sound image localization on the multiple dry vocal audios after time alignment processing, so as to locate the multiple dry vocal audios on multiple virtual sound images.
- the plurality of virtual sound images are located in a pre-established virtual sound image coordinate system;
- the virtual sound image coordinate system is centered on the human head;
- the midpoint of the straight line where the left and right ears are located is the coordinate origin;
- the positive direction of the first coordinate axis represents the front of the human head;
- the positive direction of the second coordinate axis represents the side of the human head from the left ear to the right ear;
- the positive direction of the third coordinate axis represents directly above the human head;
- the distance between each virtual sound image and the coordinate origin is within the set distance range;
- the pitch angle of each virtual sound image relative to the plane formed by the first coordinate axis and the second coordinate axis is within a set angle range.
- a virtual sound image coordinate system may be established in advance to describe sound image orientation;
- the virtual sound image coordinate system may specifically be a Cartesian coordinate system;
- the virtual sound image coordinate system can be centered on the human head, with the midpoint of the straight line where the left and right ears are located as the coordinate origin;
- the positive direction of the first coordinate axis, that is, the x-axis, represents the front of the human head; the positive direction of the second coordinate axis, that is, the y-axis, represents the side of the human head from the left ear to the right ear;
- the positive direction of the third coordinate axis, that is, the z-axis, represents directly above the human head, that is, the direction of the top of the head;
- a sound image has a certain azimuth and elevation in space, and its position can be denoted by a triple of distance, azimuth and elevation, where rad denotes the distance between the current sound image and the coordinate origin.
- when the sound signal is a single-channel (mono) signal, it can be regarded as a sound image without a definite spatial position; in order to obtain a desired virtual sound image, HRTF (Head Related Transfer Function) data convolution can be used to realize the localization operation.
- a schematic diagram of virtual sound image localization is shown in Figure 3, where X represents a real sound source (single-channel signal), Y_L and Y_R represent the sound signals heard by the left ear and the right ear respectively, and HRTF represents the transfer function of the transmission path from the sound source position to the two ears;
- the real sound source can be filtered with the left-ear and right-ear HRTFs for a given position to obtain a two-channel acoustic signal;
- the acoustic signal heard by the human ear is the result of HRTF filtering of the sound source X; therefore, when performing virtual sound image localization, the sound signal can be filtered through the HRTFs of the corresponding position. An illustrative sketch follows.
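- as an illustration of this localization step, the following minimal Python sketch (not part of the patent) convolves a mono dry vocal with a pair of head-related impulse responses (HRIRs, the time-domain form of the HRTF); how the HRIR pair for a given position is obtained, for example from a public HRTF database, is an assumption outside the text:

```python
import numpy as np

def localize_mono_source(x, hrir_left, hrir_right):
    """Place a mono signal x at a virtual sound image by HRIR convolution.

    hrir_left / hrir_right: time-domain head-related impulse responses for
    the desired (distance, azimuth, elevation); their source (e.g. a public
    HRTF database) is assumed, not specified by the patent.
    """
    y_left = np.convolve(x, hrir_left)    # signal reaching the left ear
    y_right = np.convolve(x, hrir_right)  # signal reaching the right ear
    return np.stack([y_left, y_right])    # 2 x N two-channel output
```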
- multiple virtual sound images can be set; the distance between each virtual sound image and the coordinate origin can be within the set distance range, such as within 1 meter, and the pitch angle of each virtual sound image relative to the plane formed by the first coordinate axis and the second coordinate axis of the virtual sound image coordinate system may be within a set angle range, such as within 10°, so that the multiple virtual sound images surround the human ear.
- each of the virtual sound images may be uniformly distributed on a circumference in the plane formed by the first coordinate axis and the second coordinate axis, that is, surrounding the horizontal plane of the human ear at equal interval angles;
- the interval angle can be set according to the actual situation or analysis of historical data, for example to 30°; if the interval angle is set to 30°, 12 virtual sound images can be located around the horizontal plane of the human ear at 30° intervals, the elevation angle of these 12 virtual sound images being 0° and their azimuth angles being 0°, 30°, 60°, ..., 330°; of course, the interval angle can also be set to other values, such as 15° or 60°. A sketch of this layout follows.
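- a minimal sketch of the ring layout described above; the 30° interval, 0° elevation and 1 m radius are the example values from the text, and the helper name is hypothetical:

```python
import numpy as np

def ring_of_virtual_images(n_images=12, elevation_deg=0.0, radius_m=1.0):
    """Evenly spaced virtual sound images on the horizontal circle around
    the listener; the interval angle is 360 / n_images degrees."""
    azimuths = np.arange(n_images) * (360.0 / n_images)  # 0, 30, ..., 330
    return [(radius_m, float(az), elevation_deg) for az in azimuths]

# ring_of_virtual_images() -> [(1.0, 0.0, 0.0), (1.0, 30.0, 0.0), ...]
```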
- the elevation angle of a virtual sound image located behind the human head relative to the plane formed by the first coordinate axis and the second coordinate axis may be greater than that of a virtual sound image located in front of the human head relative to the same plane.
- the positions of the multiple virtual sound images in the virtual sound image coordinate system are not limited to those mentioned above and can be set according to actual needs; it is only necessary that the distance of each virtual sound image from the coordinate origin is within the set distance range and that the pitch angle of each virtual sound image relative to the plane formed by the first coordinate axis and the second coordinate axis is within the set angle range.
- for example, some of the virtual sound images may surround the horizontal plane of the human ear at 30° intervals with an elevation angle of 0°;
- the distances between the virtual sound images and the coordinate origin can be the same or different, as long as they are all within the set distance range, which enhances the surround effect of the subsequently generated chorus audio.
- after virtual sound image localization is performed on the multiple dry vocal audios after time alignment processing and the multiple dry vocal audios are located on the multiple virtual sound images, the subsequent steps can be carried out.
- S140: Generate chorus audio based on the multiple dry vocal audios after virtual sound image localization.
- each of the multiple dry vocal audios can be subjected to HRTF filtering for its corresponding virtual sound image position, and corresponding audio data is obtained at each virtual sound image;
- chorus audio may be generated based on the multiple dry vocal audios after virtual sound image localization; specifically, the audio data obtained after HRTF filtering at the multiple virtual sound image positions may be superimposed, or weighted and superimposed, to obtain the chorus audio, whose sound effect has a three-dimensional sound field quality.
- the chorus audio can be stored in a database and used when needed; for example, if a user wants to sing a song with a grand chorus effect, the chorus audio can be used to achieve the corresponding effect.
- the synthesis of the lead vocal audio, the chorus audio and the corresponding accompaniment can be realized in various ways: the lead vocal audio can first be synthesized with the corresponding accompaniment and then with the chorus audio; or the chorus audio can first be synthesized with the corresponding accompaniment and then with the lead vocal audio; or the lead vocal audio can first be synthesized with the chorus audio and then with the corresponding accompaniment.
- the grand chorus sound effects obtained by different implementations will differ, and the specific implementation can be selected according to the actual situation.
- applying the method provided by the embodiments of the present application, after the dry vocal audios of the same target song performed respectively by multiple singers are obtained, time alignment processing is performed on them, and virtual sound image localization is performed on the aligned dry vocal audios to locate them on multiple virtual sound images;
- the multiple virtual sound images are located in a virtual sound image coordinate system centered on the human head, with their distance from the coordinate origin within the set distance range;
- the virtual sound images surround the human ear; chorus audio is generated based on the multiple dry vocal audios after virtual sound image localization, and when lead vocal audio sung based on the target song is obtained, the lead vocal audio, the chorus audio and the corresponding accompaniment are synthesized to obtain and output the grand chorus effect audio.
- Locating multiple dry vocal audios on multiple virtual sound images surrounding the human ear gives the generated chorus audio a surround sound field effect; in terms of listening experience, it effectively prevents the in-head effect caused by the sound field of the final output grand chorus effect audio gathering at the center of the listener's head, making the sound field wider.
- step S120, performing time alignment processing on the plurality of obtained dry vocal audios, may include the following steps:
- step one: determine the reference audio corresponding to the target song;
- step two: for each obtained dry vocal audio, extract the audio features of the current dry vocal audio and the reference audio respectively, the audio features being fingerprint features or fundamental frequency features;
- step three: determine the time corresponding to the maximum audio feature similarity between the current dry vocal audio and the reference audio as the audio alignment time;
- step four: perform time alignment processing on the current dry vocal audio based on the audio alignment time.
- the reference audio corresponding to the target song may be determined first;
- specifically, a dry vocal audio with better sound quality can be selected from the plurality of obtained dry vocal audios as the reference audio.
- the original dry vocal audio of the target song may also be determined as the reference audio.
- the audio features of the current dry vocal audio and the reference audio can then be extracted respectively; the audio features are fingerprint features or fundamental frequency features.
- for fingerprint features, Mel frequency band information, Bark frequency band information, ERB frequency band power and the like can be extracted through multi-band filtering, and the fingerprint features can then be obtained through half-wave rectification, binary judgment and the like;
- fundamental frequency features can be extracted by fundamental frequency extraction tools such as pYIN, CREPE and Harvest.
- the audio features of the reference audio can be saved after being extracted once, and can be called directly when necessary.
- the audio features of the current dry vocal audio and the reference audio are compared, which can be characterized by a similarity curve or the like; the time corresponding to the maximum similarity value is determined as the audio alignment time, and time alignment processing is then performed on the current dry vocal audio based on this audio alignment time.
- for each dry vocal audio, the corresponding audio alignment time is obtained by comparison with the audio features of the reference audio, and after time alignment processing, multiple time-aligned dry vocal audios are obtained. A cross-correlation sketch of this step follows.
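- a minimal sketch of locating the similarity maximum, assuming per-frame feature sequences (for example a fundamental frequency track) and plain cross-correlation as the similarity measure; the function and parameter names are hypothetical:

```python
import numpy as np

def alignment_offset(feat_dry, feat_ref, hop_seconds):
    """Return the lag (in seconds) at which the dry vocal's feature
    sequence best matches the reference audio's feature sequence."""
    # mean-removed full cross-correlation; the peak marks the best alignment
    corr = np.correlate(feat_dry - feat_dry.mean(),
                        feat_ref - feat_ref.mean(), mode="full")
    lag_frames = int(corr.argmax()) - (len(feat_ref) - 1)
    return lag_frames * hop_seconds

# usage sketch: shift the dry vocal by -offset so it lines up with the
# reference before the subsequent processing steps
```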
- the method may further include the following steps:
- band-pass filtering is performed on each of the obtained dry vocal audios to obtain multiple pieces of bass data;
- generating chorus audio based on the plurality of dry vocal audios after virtual sound image localization then includes:
- generating the chorus audio based on the plurality of dry vocal audios after virtual sound image localization and the plurality of pieces of bass data.
- band-pass filtering may be performed on each of the obtained dry vocal audios to extract their low-frequency content as bass data;
- specifically, the chorus audio may be generated by superimposing, or weighted-superimposing, the obtained pieces of bass data and the multiple dry vocal audios after virtual sound image localization; superimposing the bass signal enhances the fullness of the sound. A band-pass sketch follows.
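- a minimal band-pass sketch for the bass-extraction step; the 40-150 Hz pass band and the 4th-order Butterworth design are illustrative assumptions, since the text only specifies band-pass filtering:

```python
from scipy.signal import butter, sosfilt

def extract_bass(x, sample_rate, low_hz=40.0, high_hz=150.0):
    """Band-pass filter a dry vocal to obtain its bass component.

    The 40-150 Hz band is an assumption; the patent does not state
    the cut-off frequencies.
    """
    sos = butter(4, [low_hz, high_hz], btype="bandpass",
                 fs=sample_rate, output="sos")
    return sosfilt(sos, x)
```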
- the method may further include the following steps:
- generating chorus audio based on the plurality of dry vocal audios after virtual sound image localization includes:
- generating the chorus audio based on the plurality of dry vocal audios after virtual sound image localization and the plurality of dry vocal audios after reverberation simulation processing.
- Figure 5 shows a schematic diagram of a typical spatial sound field composition:
- the acoustic signal with the largest amplitude is the direct sound;
- the acoustic signals that follow are early reflections, obtained from sound waves reflected off the objects closest to the listener, and have obvious directionality;
- the dense acoustic signals after that constitute the reverberation signal, obtained from the superposition of sound waves after multiple reflections off surrounding objects; it is the superposition of a large number of reflected sounds from different directions and has no directionality.
- reverberation sound is thus the superposition of many late reflections, characterized by weak energy, no directionality and a high echo density, so reverberation can be used to create a sound with a sense of surround.
- reverberation simulation processing may be performed on each of the obtained dry vocal audios;
- specifically, a cascade of comb filters and all-pass filters can be used to perform the reverberation simulation processing on each of the obtained dry vocal audios;
- Figure 6 shows one cascaded form, in which four comb filters connected in parallel are followed by two all-pass filters in series;
- the impulse response of the reverberation actually simulated in this way is shown in Figure 7;
- Figure 6 is only one specific form; in practical applications there can be other forms, and the number of comb filters and all-pass filters and the cascading manner can be adjusted according to actual needs. A sketch of this structure follows.
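- the four-parallel-comb plus two-series-all-pass structure of Figure 6 matches the classic Schroeder reverberator; the sketch below implements that topology, with delay times and feedback gains that are common textbook values rather than parameters stated in the text:

```python
import numpy as np

def comb(x, delay, g):
    """Feedback comb filter: y[n] = x[n] + g * y[n - delay]."""
    y = np.array(x, dtype=float)
    for n in range(delay, len(y)):
        y[n] += g * y[n - delay]
    return y

def allpass(x, delay, g):
    """Schroeder all-pass: y[n] = -g*x[n] + x[n-delay] + g*y[n-delay]."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        xd = x[n - delay] if n >= delay else 0.0
        yd = y[n - delay] if n >= delay else 0.0
        y[n] = -g * x[n] + xd + g * yd
    return y

def schroeder_reverb(x, sr):
    # four comb filters in parallel (delays in ms are illustrative values)
    wet = sum(comb(x, int(sr * d / 1000.0), 0.75)
              for d in (29.7, 37.1, 41.1, 43.7))
    # two all-pass filters in series
    for d_ms, g in ((5.0, 0.7), (1.7, 0.7)):
        wet = allpass(wet, int(sr * d_ms / 1000.0), g)
    return wet
```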
- the method may further include the following steps:
- reverberation simulation processing may be performed on the multiple dry vocal audios after virtual sound image localization;
- generating chorus audio based on the plurality of dry vocal audios after virtual sound image localization then includes:
- generating the chorus audio based on the plurality of dry vocal audios that have undergone virtual sound image localization and reverberation simulation processing.
- specifically, the chorus audio may be generated by superimposing, or weighted-superimposing, the multiple dry vocal audios after virtual sound image localization and reverberation simulation processing.
- performing reverberation simulation processing on the dry vocal audios after virtual sound image localization can enhance the spatial effect of the sound signal, further suppress the in-head effect and expand the sound field.
- the method may further include the following steps:
- generating chorus audio based on the plurality of dry vocal audios after virtual sound image localization includes:
- generating the chorus audio based on the plurality of dry vocal audios after virtual sound image localization and the plurality of dry vocal audios after two-channel simulation processing.
- after the dry vocal audios of the same target song performed by multiple singers are obtained and time alignment processing is performed on them, two-channel simulation processing may be performed on each of the aligned dry vocal audios;
- the correlation between the two channel signals is reduced by delays, expanding the sound field as much as possible to obtain a two-channel output;
- each dry vocal audio can be simulated with 8 groups of different delays and weights for the left and right channels, where d represents the delay and g represents the weight;
- the delay parameter can be chosen from 16 values ranging from 21 ms to 79 ms;
- amplitude attenuation is used to represent the energy loss of the sound wave due to reflection, thereby reducing the correlation between the two channels: the dry vocal audio is copied to obtain two signals carrying the same information and therefore completely correlated, which are then given different delays and amplitude attenuations to reduce their correlation and obtain a pseudo-stereo signal. A decorrelation sketch follows.
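- a minimal sketch of the delay-and-attenuate decorrelation idea; two delay/weight pairs are shown instead of the 8 groups per channel mentioned in the text, and the specific values are illustrative:

```python
import numpy as np

def pseudo_stereo(x, sr, delays_ms=(21.0, 35.0), gains=(0.7, 0.6)):
    """Duplicate a mono dry vocal into two channels and decorrelate them
    with different delays and amplitude attenuations."""
    def with_reflection(sig, d_ms, g):
        d = int(sr * d_ms / 1000.0)
        out = np.array(sig, dtype=float)
        out[d:] += g * sig[:-d]          # attenuated, delayed reflection
        return out

    left = with_reflection(x, delays_ms[0], gains[0])
    right = with_reflection(x, delays_ms[1], gains[1])
    return np.stack([left, right])       # decorrelated two-channel output
```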
- chorus audio may be generated based on the plurality of dry vocal audios after virtual sound image localization and the plurality of dry vocal audios after two-channel simulation processing; specifically, these may be superimposed or weighted-superimposed to generate the chorus audio.
- the method may further include the following steps:
- generating chorus audio based on the plurality of dry vocal audios after virtual sound image localization and the plurality of dry vocal audios after two-channel simulation processing includes:
- generating the chorus audio based on the plurality of dry vocal audios after virtual sound image localization and the plurality of dry vocal audios after two-channel simulation processing and reverberation simulation processing.
- after time alignment processing is performed on the plurality of dry vocal audios, two-channel simulation processing is performed on each of them;
- reverberation simulation processing may then be further performed on the plurality of dry vocal audios after two-channel simulation processing, so as to enhance the spatial effect of the sound signal, suppress the in-head effect and expand the sound field;
- the chorus audio is generated from the processed dry vocal audios; specifically, the multiple dry vocal audios after virtual sound image localization and the multiple dry vocal audios after two-channel simulation and reverberation simulation processing may be superimposed or weighted-superimposed to generate the chorus audio.
- in summary, time alignment processing can be performed on the obtained dry vocal audios, and the aligned dry vocal audios can then be processed by virtual sound image localization, bass enhancement, reverberation simulation, two-channel simulation and the like;
- the specific processing can combine the above embodiments, so that the finally generated chorus audio has a surround sound effect and is highly robust to a wide range of vocal misalignment; even when the delay between the chorus audio and the lead vocal audio to be superimposed is large, a harmonious listening experience can still be ensured.
- FIG. 9 is a schematic diagram of a system framework for processing the multiple time-aligned dry vocal audios, including a bass enhancement unit, a virtual sound image localization unit, a two-channel simulation unit and a reverberation simulation unit:
- the bass enhancement unit is used to perform band-pass filtering on the multiple dry vocal audios to obtain bass data;
- the virtual sound image localization unit is used to perform virtual sound image localization on the multiple dry vocal audios, so as to locate them on multiple virtual sound images;
- the two-channel simulation unit is used to perform two-channel simulation processing on the multiple dry vocal audios;
- the reverberation simulation unit is used to perform reverberation simulation processing on the multiple dry vocal audios.
- both the virtual sound image localization unit and the two-channel simulation unit can be connected to the reverberation simulation unit;
- after virtual sound image localization, the reverberation simulation unit can further perform reverberation simulation processing;
- likewise, after two-channel simulation processing is performed on the dry vocal audios by the two-channel simulation unit, reverberation simulation processing may be further performed by the reverberation simulation unit;
- weighted superposition can then be performed on the audio data processed by these units to obtain the chorus audio.
- Figure 10 shows a specific example of processing multiple dry vocal audios:
- H represents the transfer function of the HRTF filtering; through this transfer function, virtual sound image localization is performed so that the dry vocal audios are positioned on 12 virtual sound images around the horizontal plane of the human ear;
- REV represents the reverberation simulation unit;
- BASS represents the bass enhancement unit;
- REF represents the two-channel simulation unit.
- the reverberation simulation units here can use the same parameters, or different parameters can be configured for different reverberation simulation units according to actual needs, so as to obtain flexible reverberation modulation.
- the grand chorus effect of the chorus audio finally generated in the embodiments of the present application is closer to the sound of a real concert chorus;
- adding the accompaniment on the basis of the lead vocal audio while mixing in the chorus audio gives users an immersive concert experience and a more impactful sound field surround experience.
- performing virtual sound image localization on the multiple dry vocal audios after time alignment processing may include the following steps:
- step one: according to the number of virtual sound images, group the multiple time-aligned dry vocal audios, the number of groups being the same as the number of virtual sound images;
- step two: locate each group of dry vocal audios on the corresponding virtual sound image, different groups of dry vocal audios corresponding to different virtual sound images.
- after the dry vocal audios of the same target song performed respectively by multiple singers are obtained and time alignment processing is performed on them, the aligned dry vocal audios can be grouped according to the number of virtual sound images;
- the number of groups is the same as the number of virtual sound images, and each group contains several dry vocal audios; if the number of obtained dry vocal audios is large, each dry vocal audio can belong to only one group; if the number is small, the same dry vocal audio can belong to multiple groups, to better achieve the grand chorus sound effect;
- each group of dry vocal audios can then be located on its corresponding virtual sound image, different groups corresponding to different virtual sound images; this realizes virtual sound image localization of the multiple dry vocal audios and enhances the chorus sound effect. A grouping sketch follows.
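- a minimal grouping sketch; round-robin assignment is one simple choice, since the text does not fix the grouping rule, and it reuses vocals across groups when there are fewer vocals than virtual sound images:

```python
def group_dry_vocals(dry_vocals, n_images):
    """Assign aligned dry vocals to virtual sound images, one group per
    image; a vocal may appear in several groups when vocals are few."""
    groups = [[] for _ in range(n_images)]
    for i in range(max(len(dry_vocals), n_images)):
        groups[i % n_images].append(dry_vocals[i % len(dry_vocals)])
    return groups
```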
- synthesizing the lead vocal audio, the chorus audio and the corresponding accompaniment may include the following steps:
- the volumes of the lead vocal audio and the chorus audio can be adjusted respectively, so that they are equal, or so that the volume of the lead vocal audio is greater than that of the chorus audio;
- reverberation simulation processing can also be performed on the lead vocal audio and the chorus audio to obtain a sound with a sense of surround;
- the lead vocal audio, the chorus audio and the corresponding accompaniment after volume adjustment and/or reverberation simulation processing are then synthesized, so that the final output grand chorus effect audio brings the user a better listening experience. A mixing sketch follows.
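- a minimal mixing sketch for the final synthesis; the gain values are illustrative, the only constraint taken from the text being that the lead vocal should not be quieter than the chorus, and all inputs are assumed to be equal-length two-channel arrays:

```python
import numpy as np

def synthesize_output(lead, chorus, accompaniment,
                      lead_gain=1.0, chorus_gain=0.8, acc_gain=0.9):
    """Volume-adjust and mix the lead vocal, chorus audio and accompaniment,
    then apply simple peak normalization to avoid clipping."""
    mix = lead_gain * lead + chorus_gain * chorus + acc_gain * accompaniment
    peak = np.max(np.abs(mix))
    return mix / peak if peak > 1.0 else mix
```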
- embodiments of the present application further provide a chorus audio processing apparatus, and the chorus audio processing apparatus described below and the chorus audio processing method described above may refer to each other correspondingly.
- the device may include the following modules:
- a dry vocal audio obtaining module 1110, configured to obtain dry vocal audios of the same target song performed respectively by multiple singers;
- a time alignment processing module 1120, configured to perform time alignment processing on the plurality of obtained dry vocal audios;
- a virtual sound image localization module 1130, configured to perform virtual sound image localization on the plurality of dry vocal audios after time alignment processing, so as to locate the plurality of dry vocal audios on a plurality of virtual sound images, the virtual sound images being located in a pre-established virtual sound image coordinate system;
- the virtual sound image coordinate system is centered on the human head, with the midpoint of the straight line where the left and right ears are located as the coordinate origin;
- the positive direction of the first coordinate axis represents the front of the human head, the positive direction of the second coordinate axis represents the side of the human head from the left ear to the right ear, and the positive direction of the third coordinate axis represents directly above the human head;
- the distance between each virtual sound image and the coordinate origin is within a set distance range, and the pitch angle of each virtual sound image relative to the plane formed by the first coordinate axis and the second coordinate axis is within a set angle range;
- a chorus audio generation module 1140, configured to generate chorus audio based on the plurality of dry vocal audios after virtual sound image localization;
- a grand chorus effect audio obtaining module 1150, configured to, in the case of acquiring lead vocal audio sung based on the target song, synthesize the lead vocal audio, the chorus audio and the corresponding accompaniment and output the grand chorus effect audio.
- after the dry vocal audios of the same target song performed respectively by multiple singers are obtained, time alignment processing is performed on them, and virtual sound image localization is performed on the aligned dry vocal audios to locate them on multiple virtual sound images;
- the multiple virtual sound images are located in a virtual sound image coordinate system centered on the human head, with their distance from the coordinate origin within the set distance range, surrounding the human ear; chorus audio is generated based on the multiple dry vocal audios after virtual sound image localization, and when lead vocal audio sung based on the target song is obtained, the lead vocal audio, the chorus audio and the corresponding accompaniment are synthesized to obtain and output the grand chorus effect audio.
- Locating multiple dry vocal audios on multiple virtual sound images surrounding the human ear gives the generated chorus audio a surround sound field effect; in terms of listening experience, it effectively prevents the in-head effect caused by the sound field of the final output grand chorus effect audio gathering at the center of the listener's head, making the sound field wider.
- the time alignment processing module 1120 is configured to:
- determine the reference audio corresponding to the target song; for each obtained dry vocal audio, extract the audio features of the current dry vocal audio and the reference audio respectively, the audio features being fingerprint features or fundamental frequency features;
- determine the time corresponding to the maximum audio feature similarity as the audio alignment time, and perform time alignment processing on the current dry vocal audio accordingly.
- the device may further include a bass data acquisition module, configured to:
- perform band-pass filtering on each of the obtained dry vocal audios to obtain multiple pieces of bass data;
- accordingly, the chorus audio generation module 1140 is configured to:
- generate the chorus audio based on the plurality of dry vocal audios after virtual sound image localization and the plurality of pieces of bass data.
- a reverberation simulation processing module is also included, configured to perform reverberation simulation processing on each of the obtained dry vocal audios;
- accordingly, the chorus audio generation module 1140 is configured to:
- generate the chorus audio based on the plurality of dry vocal audios after virtual sound image localization and the plurality of dry vocal audios after reverberation simulation processing.
- the reverberation simulation processing module is configured to:
- perform reverberation simulation processing on each of the obtained dry vocal audios using a cascade of comb filters and all-pass filters.
- the reverberation simulation processing module is also configured to:
- perform reverberation simulation processing on the plurality of dry vocal audios after virtual sound image localization;
- accordingly, the chorus audio generation module 1140 is configured to:
- generate the chorus audio based on the plurality of dry vocal audios that have undergone virtual sound image localization and reverberation simulation processing.
- the device may further include a two-channel simulation processing module, configured to perform two-channel simulation processing on the plurality of dry vocal audios;
- accordingly, the chorus audio generation module 1140 is configured to:
- generate the chorus audio based on the plurality of dry vocal audios after virtual sound image localization and the plurality of dry vocal audios after two-channel simulation processing.
- the reverberation simulation processing module is also configured to perform reverberation simulation processing on the dry vocal audios after two-channel simulation processing;
- accordingly, the chorus audio generation module 1140 is configured to:
- generate the chorus audio based on the plurality of dry vocal audios after virtual sound image localization and the plurality of dry vocal audios after two-channel simulation processing and reverberation simulation processing.
- the virtual sound image localization module 1130 is configured to:
- group the obtained time-aligned dry vocal audios, the number of groups being the same as the number of virtual sound images;
- locate each group of dry vocal audios on the corresponding virtual sound image, different groups of dry vocal audios corresponding to different virtual sound images.
- the elevation angle of a virtual sound image located behind the human head relative to the plane formed by the first coordinate axis and the second coordinate axis is greater than that of a virtual sound image located in front of the human head relative to the same plane; or, the virtual sound images are uniformly distributed on a circumference in the plane formed by the first coordinate axis and the second coordinate axis.
- the grand chorus effect audio obtaining module 1150 is configured to synthesize the lead vocal audio, the chorus audio and the corresponding accompaniment after volume adjustment and/or reverberation simulation processing.
- the embodiments of the present application also provide a chorus audio processing device, including:
- the processor is configured to implement the steps of the above-mentioned chorus audio processing method when the computer program is executed.
- the chorus audio processing device may include: a processor 10, a memory 11, a communication interface 12 and a communication bus 13;
- the processor 10, the memory 11 and the communication interface 12 all communicate with each other through the communication bus 13.
- the processor 10 may be a central processing unit (CPU), an application-specific integrated circuit, a digital signal processor, a field-programmable gate array, or another programmable logic device.
- the processor 10 may call the program stored in the memory 11, and specifically, the processor 10 may execute the operations in the embodiments of the method for processing chorus audio.
- the memory 11 is used to store one or more programs, and the programs may include program codes, and the program codes include computer operation instructions.
- the memory 11 at least stores a program for realizing the following functions:
- the plurality of virtual sound images are located in a pre-established virtual sound image coordinate system centered on the human head, with the midpoint of the straight line where the left and right ears are located as the coordinate origin; the positive direction of the first coordinate axis represents the front of the human head, the positive direction of the second coordinate axis represents the side of the human head from the left ear to the right ear, and the positive direction of the third coordinate axis represents directly above the human head;
- the distance between each virtual sound image and the coordinate origin is within the set distance range, and the pitch angle of each virtual sound image relative to the plane formed by the first coordinate axis and the second coordinate axis is within the set angle range;
- the grand chorus effect audio is output.
- the memory 11 may include a program storage area and a data storage area: the program storage area may store an operating system and an application program required for at least one function (such as an audio playback function and an audio synthesis function); the data storage area can store data created during use, such as sound image localization data and audio synthesis data.
- the memory 11 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device or other non-volatile solid-state storage device.
- the communication interface 12 may be an interface of a communication module for connecting with other devices or systems.
- the structure shown in FIG. 12 does not constitute a limitation on the chorus audio processing device in the embodiments of the present application;
- in practical applications, the chorus audio processing device may include more or fewer components than those shown in FIG. 12, or a combination of certain components.
- the embodiments of the present application also provide a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the above chorus audio processing method are implemented.
- the steps of a method or algorithm described in connection with the embodiments disclosed herein may be directly implemented in hardware, a software module executed by a processor, or a combination of the two.
- the software module can be placed in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.
Claims (13)
- 一种合唱音频的处理方法,其特征在于,包括:A method for processing chorus audio, comprising:分别获得多个演唱者对同一目标歌曲进行演唱的干声音频;Obtain the dry audio of the same target song performed by multiple singers respectively;对获得的多个所述干声音频进行时间对齐处理;performing time alignment processing on the obtained plurality of the dry audio frequencies;对进行时间对齐处理后的多个所述干声音频进行虚拟声像定位,以将多个所述干声音频定位到多个虚拟声像上;其中,多个所述虚拟声像位于预先建立的虚拟声像坐标系中,所述虚拟声像坐标系以人头为中心,以左右耳所在直线中点为坐标原点,第一坐标轴的正方向表示人头正前方,第二坐标轴的正方向表示人头从左耳到右耳的侧方,第三坐标轴的正方向表示人头正上方,每个所述虚拟声像与所述坐标原点的距离在设定距离范围内,每个所述虚拟声像相对于所述第一坐标轴和所述第二坐标轴构成的平面的俯仰角在设定角度范围内;Perform virtual sound image localization on a plurality of the dry sound audios after time alignment processing, so as to locate the plurality of dry sound audios on a plurality of virtual sound images; In the virtual audio-visual coordinate system, the virtual audio-visual coordinate system is centered on the human head, with the midpoint of the straight line where the left and right ears are located as the coordinate origin, the positive direction of the first coordinate axis represents the front of the human head, and the positive direction of the second coordinate axis Represents the side of the human head from the left ear to the right ear, the positive direction of the third coordinate axis represents directly above the human head, the distance between each virtual sound image and the coordinate origin is within the set distance range, and each virtual sound image is within the set distance range. The pitch angle of the sound image relative to the plane formed by the first coordinate axis and the second coordinate axis is within a set angle range;基于进行虚拟声像定位后的多个所述干声音频,生成合唱音频;generating chorus audio based on a plurality of the dry audio audio after virtual sound image localization;在获取到基于所述目标歌曲演唱的主唱音频的情况下,将所述主唱音频、所述合唱音频和相应的伴奏进行合成后,输出大合唱效果音频。In the case of acquiring the lead vocal audio sung based on the target song, after synthesizing the lead vocal audio, the chorus audio and the corresponding accompaniment, a large chorus effect audio is output.
- 2. The method for processing chorus audio according to claim 1, wherein performing time alignment processing on the obtained plurality of dry vocal audios comprises: determining reference audio corresponding to the target song; for each obtained dry vocal audio, separately extracting audio features of the current dry vocal audio and of the reference audio, the audio features being fingerprint features or fundamental frequency features; determining the time corresponding to the maximum audio feature similarity between the current dry vocal audio and the reference audio as the audio alignment time; and performing time alignment processing on the current dry vocal audio based on the audio alignment time (a minimal alignment sketch follows the claims).
- 3. The method for processing chorus audio according to claim 1, further comprising: performing band-pass filtering on each of the obtained plurality of dry vocal audios to obtain a plurality of bass data; correspondingly, generating the chorus audio based on the plurality of dry vocal audios after virtual sound image localization comprises: generating the chorus audio based on the plurality of dry vocal audios after virtual sound image localization and the plurality of bass data (a band-pass sketch follows the claims).
- 4. The method for processing chorus audio according to claim 1, further comprising: performing reverberation simulation processing on each of the obtained plurality of dry vocal audios; correspondingly, generating the chorus audio based on the plurality of dry vocal audios after virtual sound image localization comprises: generating the chorus audio based on the plurality of dry vocal audios after virtual sound image localization and the plurality of dry vocal audios after reverberation simulation processing.
- 5. The method for processing chorus audio according to claim 4, wherein performing reverberation simulation processing on each of the obtained plurality of dry vocal audios comprises: performing reverberation simulation processing on each of the obtained plurality of dry vocal audios by using a cascade of comb filters and all-pass filters (a reverberation sketch follows the claims).
- 6. The method for processing chorus audio according to claim 1, further comprising, after performing virtual sound image localization on the time-aligned plurality of dry vocal audios: performing reverberation simulation processing on each of the plurality of dry vocal audios after virtual sound image localization; correspondingly, generating the chorus audio based on the plurality of dry vocal audios after virtual sound image localization comprises: generating the chorus audio based on the plurality of dry vocal audios after virtual sound image localization and reverberation simulation processing.
- 7. The method for processing chorus audio according to claim 1, further comprising: performing binaural simulation processing on each of the obtained plurality of dry vocal audios; correspondingly, generating the chorus audio based on the plurality of dry vocal audios after virtual sound image localization comprises: generating the chorus audio based on the plurality of dry vocal audios after virtual sound image localization and the plurality of dry vocal audios after binaural simulation processing (a binaural rendering sketch follows the claims).
- 8. The method for processing chorus audio according to claim 7, further comprising, after performing binaural simulation processing on the obtained plurality of dry vocal audios: performing reverberation simulation processing on the plurality of dry vocal audios after binaural simulation processing; correspondingly, generating the chorus audio based on the plurality of dry vocal audios after virtual sound image localization and the plurality of dry vocal audios after binaural simulation processing comprises: generating the chorus audio based on the plurality of dry vocal audios after virtual sound image localization and the plurality of dry vocal audios after both binaural simulation processing and reverberation simulation processing.
- 9. The method for processing chorus audio according to claim 1, wherein performing virtual sound image localization on the time-aligned plurality of dry vocal audios comprises: grouping the time-aligned plurality of dry vocal audios according to the number of virtual sound images, the number of groups being the same as the number of virtual sound images; and localizing each group of dry vocal audios onto its corresponding virtual sound image, different groups corresponding to different virtual sound images (a grouping and placement sketch follows the claims).
- 10. The method for processing chorus audio according to claim 1, wherein, among the plurality of virtual sound images, the elevation angle of a virtual sound image located behind the head relative to the plane formed by the first coordinate axis and the second coordinate axis is greater than the elevation angle of a virtual sound image located in front of the head relative to that plane; or, the virtual sound images are evenly distributed around a full circle in the plane formed by the first coordinate axis and the second coordinate axis.
- 11. The method for processing chorus audio according to any one of claims 1 to 10, wherein synthesizing the lead vocal audio, the chorus audio and the corresponding accompaniment comprises: performing volume adjustment on the lead vocal audio and the chorus audio respectively, and/or performing reverberation simulation processing on the lead vocal audio and the chorus audio; and synthesizing the lead vocal audio and the chorus audio after volume adjustment and/or reverberation simulation processing with the corresponding accompaniment (a mixing sketch follows the claims).
- 12. A device for processing chorus audio, comprising: a memory for storing a computer program; and a processor configured to implement, when executing the computer program, the steps of the method for processing chorus audio according to any one of claims 1 to 11.
- 13. A computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method for processing chorus audio according to any one of claims 1 to 11 are implemented.
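Claim 2 aligns each dry vocal track by finding the time of maximum audio feature similarity against a reference. Below is a minimal Python sketch of that idea; the per-frame RMS-energy feature, the frame and hop sizes, and the function names are illustrative assumptions (the claim itself specifies fingerprint or fundamental-frequency features), not the patented implementation.

```python
import numpy as np

def frame_energy(audio, frame=1024, hop=512):
    """Illustrative stand-in feature: per-frame RMS energy."""
    n = 1 + max(0, (len(audio) - frame) // hop)
    return np.array([np.sqrt(np.mean(audio[i * hop:i * hop + frame] ** 2))
                     for i in range(n)])

def alignment_offset(dry, reference, hop=512):
    """Lag (in samples) of maximum feature similarity between the dry
    vocal and the reference; positive means the dry track starts late."""
    a, b = frame_energy(dry), frame_energy(reference)
    a = (a - a.mean()) / (a.std() + 1e-12)
    b = (b - b.mean()) / (b.std() + 1e-12)
    corr = np.correlate(a, b, mode="full")      # similarity at every lag
    lag = int(np.argmax(corr)) - (len(b) - 1)   # lag measured in frames
    return lag * hop                            # convert frames to samples

def align(dry, reference):
    """Shift the dry vocal so it lines up with the reference."""
    off = alignment_offset(dry, reference)
    return dry[off:] if off > 0 else np.concatenate([np.zeros(-off), dry])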
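Claim 3 obtains bass data by band-pass filtering each dry vocal. A sketch using SciPy; the 60–250 Hz band, the filter order, and the sample rate are assumed values chosen for illustration, since the claim does not fix them.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def extract_bass(dry, sr=44100, lo=60.0, hi=250.0, order=4):
    """Keep only an assumed bass band of the dry vocal."""
    sos = butter(order, [lo, hi], btype="bandpass", fs=sr, output="sos")
    return sosfiltfilt(sos, dry)  # zero-phase, so no extra delay is added
```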
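Claim 5 builds reverberation from a cascade of comb filters and all-pass filters. The sketch below follows the classic Schroeder arrangement (parallel feedback combs feeding series all-passes); the delay times and gains are conventional illustrative values, not parameters disclosed in the patent.

```python
import numpy as np

def comb(x, delay, g):
    """Feedback comb filter: y[n] = x[n] + g * y[n - delay]."""
    y = x.astype(float).copy()
    for n in range(delay, len(y)):
        y[n] += g * y[n - delay]
    return y

def allpass(x, delay, g):
    """Schroeder all-pass: y[n] = -g*x[n] + x[n-delay] + g*y[n-delay]."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        xd = x[n - delay] if n >= delay else 0.0
        yd = y[n - delay] if n >= delay else 0.0
        y[n] = -g * x[n] + xd + g * yd
    return y

def reverb_simulation(dry, sr=44100, wet=0.3):
    """Mix the dry vocal with a comb + all-pass reverberation tail."""
    combs = [(int(sr * t), g) for t, g in
             [(0.0297, 0.805), (0.0371, 0.827), (0.0411, 0.783), (0.0437, 0.764)]]
    tail = sum(comb(dry, d, g) for d, g in combs) / len(combs)
    for d, g in [(int(sr * 0.005), 0.7), (int(sr * 0.0017), 0.7)]:
        tail = allpass(tail, d, g)
    return (1.0 - wet) * dry + wet * tail
```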
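Claim 7 adds binaural simulation processing. Practical systems typically convolve with head-related transfer functions (HRTFs); the sketch below substitutes a crude interaural time and level difference model so the example stays self-contained. The delay constant and panning law are assumptions, not the patented processing.

```python
import numpy as np

def binaural_render(dry, azimuth, sr=44100):
    """Place a mono dry vocal at an azimuth (radians, 0 = straight ahead,
    positive toward the right ear) using interaural time/level differences."""
    pan = np.sin(azimuth)                       # -1 (left) .. +1 (right)
    itd = int(round(6.6e-4 * pan * sr))         # max ~0.66 ms ear-to-ear delay
    theta = (pan + 1.0) * np.pi / 4.0           # constant-power panning angle
    left, right = np.cos(theta) * dry, np.sin(theta) * dry
    if itd > 0:    # source on the right: the left ear hears it later
        left = np.concatenate([np.zeros(itd), left])[:len(dry)]
    elif itd < 0:  # source on the left: the right ear hears it later
        right = np.concatenate([np.zeros(-itd), right])[:len(dry)]
    return np.stack([left, right], axis=1)      # (samples, 2) stereo
```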
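Claims 9 and 10 group the aligned tracks, one group per virtual sound image, and in one alternative spread the images evenly around a circle in the plane of the first two coordinate axes. A sketch of both steps; the round-robin grouping rule and the radius value are illustrative assumptions within the claimed "set distance range".

```python
import numpy as np

def group_tracks(num_tracks, num_images):
    """Round-robin grouping: exactly as many groups as virtual sound images."""
    groups = [[] for _ in range(num_images)]
    for t in range(num_tracks):
        groups[t % num_images].append(t)
    return groups

def even_circle_positions(num_images, radius=1.5):
    """Evenly spaced positions on a circle in the plane of the first two axes.
    First axis: straight ahead; second axis: left ear toward right ear."""
    azimuths = np.linspace(0.0, 2.0 * np.pi, num_images, endpoint=False)
    return [(radius * np.cos(a), radius * np.sin(a), 0.0) for a in azimuths]
```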
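Claim 11 volume-adjusts the lead vocal and chorus before synthesizing them with the accompaniment. A sketch of that final mixing stage; the gain defaults and the peak-normalization guard are assumptions, and all stems are assumed to share the same channel layout.

```python
import numpy as np

def synthesize_grand_chorus(lead, chorus, accompaniment,
                            lead_gain=1.0, chorus_gain=0.6, acc_gain=0.8):
    """Volume-adjust the stems, pad them to a common length, and sum them."""
    n = max(len(lead), len(chorus), len(accompaniment))
    def pad(x):
        width = ((0, n - len(x)),) + ((0, 0),) * (x.ndim - 1)
        return np.pad(x, width)
    mix = (lead_gain * pad(lead) + chorus_gain * pad(chorus)
           + acc_gain * pad(accompaniment))
    peak = np.max(np.abs(mix))
    return mix / peak if peak > 1.0 else mix  # keep the output within [-1, 1]
```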
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110460280.4 | 2021-04-27 | ||
CN202110460280.4A CN113192486B (en) | 2021-04-27 | 2021-04-27 | Chorus audio processing method, chorus audio processing equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022228220A1 true WO2022228220A1 (en) | 2022-11-03 |
Family
ID=76979435
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2022/087784 WO2022228220A1 (en) | 2021-04-27 | 2022-04-20 | Method and device for processing chorus audio, and storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN113192486B (en) |
WO (1) | WO2022228220A1 (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113192486B (en) * | 2021-04-27 | 2024-01-09 | 腾讯音乐娱乐科技(深圳)有限公司 | Chorus audio processing method, chorus audio processing equipment and storage medium |
CN114242025A (en) * | 2021-12-14 | 2022-03-25 | 腾讯音乐娱乐科技(深圳)有限公司 | Method and device for generating accompaniment and storage medium |
CN114363793B (en) * | 2022-01-12 | 2024-06-11 | 厦门市思芯微科技有限公司 | System and method for converting double-channel audio into virtual surrounding 5.1-channel audio |
CN114630145A (en) * | 2022-03-17 | 2022-06-14 | 腾讯音乐娱乐科技(深圳)有限公司 | Multimedia data synthesis method, equipment and storage medium |
CN116170613B (en) * | 2022-09-08 | 2024-09-24 | 腾讯音乐娱乐科技(深圳)有限公司 | Audio stream processing method, computer device and computer program product |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2000333297A (en) * | 1999-05-14 | 2000-11-30 | Sound Vision:Kk | Stereophonic sound generator, method for generating stereophonic sound, and medium storing stereophonic sound |
CN105208039A (en) * | 2015-10-10 | 2015-12-30 | 广州华多网络科技有限公司 | Chorusing method and system for online vocal concert |
CN106331977A (en) * | 2016-08-22 | 2017-01-11 | 北京时代拓灵科技有限公司 | Virtual reality panoramic sound processing method for network karaoke |
CN107422862A (en) * | 2017-08-03 | 2017-12-01 | 嗨皮乐镜(北京)科技有限公司 | A kind of method that virtual image interacts in virtual reality scenario |
CN109785820A (en) * | 2019-03-01 | 2019-05-21 | 腾讯音乐娱乐科技(深圳)有限公司 | A kind of processing method, device and equipment |
CN110379401A (en) * | 2019-08-12 | 2019-10-25 | 黑盒子科技(北京)有限公司 | A kind of music is virtually chorused system and method |
CN110992970A (en) * | 2019-12-13 | 2020-04-10 | 腾讯音乐娱乐科技(深圳)有限公司 | Audio synthesis method and related device |
CN113192486A (en) * | 2021-04-27 | 2021-07-30 | 腾讯音乐娱乐科技(深圳)有限公司 | Method, equipment and storage medium for processing chorus audio |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP4780057B2 (en) * | 2007-08-06 | 2011-09-28 | ヤマハ株式会社 | Sound field generator |
CN108269560A (en) * | 2017-01-04 | 2018-07-10 | 北京酷我科技有限公司 | A kind of speech synthesizing method and system |
CN111028818B (en) * | 2019-11-14 | 2022-11-22 | 北京达佳互联信息技术有限公司 | Chorus method, apparatus, electronic device and storage medium |
- 2021-04-27: CN CN202110460280.4A patent/CN113192486B/en active Active
- 2022-04-20: WO PCT/CN2022/087784 patent/WO2022228220A1/en active Application Filing
Also Published As
Publication number | Publication date |
---|---|
CN113192486A (en) | 2021-07-30 |
CN113192486B (en) | 2024-01-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2022228220A1 (en) | Method and device for processing chorus audio, and storage medium | |
Hacihabiboglu et al. | Perceptual spatial audio recording, simulation, and rendering: An overview of spatial-audio techniques based on psychoacoustics | |
JP5533248B2 (en) | Audio signal processing apparatus and audio signal processing method | |
US5371799A (en) | Stereo headphone sound source localization system | |
Brown et al. | A structural model for binaural sound synthesis | |
CN105264915B (en) | Mixing console, audio signal generator, the method for providing audio signal | |
CN105874820B (en) | Binaural audio is produced by using at least one feedback delay network in response to multi-channel audio | |
US9769589B2 (en) | Method of improving externalization of virtual surround sound | |
US9215544B2 (en) | Optimization of binaural sound spatialization based on multichannel encoding | |
CA2744429C (en) | Converter and method for converting an audio signal | |
CN113170271B (en) | Method and apparatus for processing stereo signals | |
JP2009508442A (en) | System and method for audio processing | |
WO2019229199A1 (en) | Adaptive remixing of audio content | |
JP2023517720A (en) | Reverb rendering | |
JP2014090470A (en) | Apparatus and method for stereophonizing mono signal | |
WO2023109278A1 (en) | Accompaniment generation method, device, and storage medium | |
Lee et al. | A real-time audio system for adjusting the sweet spot to the listener's position | |
US10440495B2 (en) | Virtual localization of sound | |
WO2022196073A1 (en) | Information processing system, information processing method, and program | |
Yuan et al. | Externalization improvement in a real-time binaural sound image rendering system | |
De Sena | Analysis, design and implementation of multichannel audio systems | |
JP2004509544A (en) | Audio signal processing method for speaker placed close to ear | |
GB2369976A (en) | A method of synthesising an averaged diffuse-field head-related transfer function | |
US20240292171A1 (en) | Systems and methods for efficient and accurate virtual accoustic rendering | |
Glasgal | Improving 5.1 and Stereophonic Mastering/Monitoring by Using Ambiophonic Techniques |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 22794691 Country of ref document: EP Kind code of ref document: A1 |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2301007017 Country of ref document: TH |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
WWE | Wipo information: entry into national phase |
Ref document number: 11202308147P Country of ref document: SG |
|
32PN | Ep: public notification in the ep bulletin as address of the addressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 26.03.2024) |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 22794691 Country of ref document: EP Kind code of ref document: A1 |