WO2022179110A1 - Mixed song generation method, apparatus, device and storage medium - Google Patents
Mixed song generation method, apparatus, device and storage medium
- Publication number
- WO2022179110A1 (PCT/CN2021/122573)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- vocal
- song
- audio
- signal
- accompaniment
- Prior art date
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H1/00—Details of electrophonic musical instruments
- G10H1/0008—Associated control or indicating means
- G10H1/0025—Automatic or semi-automatic music composition, e.g. producing random music, applying rules from music theory or modifying a musical piece
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H1/00—Details of electrophonic musical instruments
- G10H1/36—Accompaniment arrangements
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2210/00—Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
- G10H2210/005—Musical accompaniment, i.e. complete instrumental rhythm synthesis added to a performed melody, e.g. as output by drum machines
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2210/00—Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
- G10H2210/101—Music Composition or musical creation; Tools or processes therefor
- G10H2210/125—Medley, i.e. linking parts of different musical pieces in one single piece, e.g. sound collage, DJ mix
Definitions
- the present application relates to the technical field of computer signal processing, and in particular, to a method, apparatus, device and storage medium for generating a mixed song.
- The current way to make a remixed song is to mix the left-channel audio of one song with the right-channel audio of another song, creating a stereo effect.
- these two songs are two different sung versions of the same song.
- The purpose of the present application is to provide a method, apparatus, device and storage medium for generating a mixed song, so as to cover more songs and generate mixed songs with a good mixing effect. The specific scheme is as follows:
- the application provides a method for generating a remixed song, comprising:
- the accompaniment signal aligned with the audio tracks of the vocal audio is determined as the accompaniment audio to be mixed;
- the vocal audio and the accompaniment audio are mixed to obtain a mixed song.
- the present application also provides an apparatus for generating a mixed song, comprising:
- an acquisition module for acquiring at least two song audios; the at least two song audios are different singing versions of the same song;
- an extraction module for extracting the vocal signal and the accompaniment signal in each song audio, obtaining a vocal set comprising at least two vocal signals and an accompaniment set comprising at least two accompaniment signals;
- an alignment module for selecting reference rhythm information from the rhythm information corresponding to each song audio, track-aligning all the vocal signals in the vocal set based on the reference rhythm information, and using all the track-aligned vocal signals as the vocal audio to be mixed;
- a selection module for determining the accompaniment signal aligned with the track of the vocal audio in the accompaniment set as the accompaniment audio to be mixed
- a mixing module configured to mix the vocal audio and the accompaniment audio to obtain a mixed song.
- The present application also provides an electronic device including a processor and a memory, wherein the memory stores a computer program that is loaded and executed by the processor to implement the foregoing mixed song generation method.
- the present application also provides a storage medium, where computer-executable instructions are stored in the storage medium, and when the computer-executable instructions are loaded and executed by a processor, the foregoing method for generating a mixed song is implemented.
- The present application extracts the vocal signal and the accompaniment signal from each song audio, selects reference rhythm information from the rhythm information corresponding to each song audio, track-aligns all vocal signals based on the reference rhythm information, uses all the track-aligned vocal signals as the vocal audio to be mixed, selects the accompaniment signal aligned with the vocal audio tracks as the accompaniment audio to be mixed, and finally mixes the vocal audio and the accompaniment audio to obtain a mixed song.
- The application can mix at least two singing versions of the same song and can therefore cover more songs for mixing. During mixing, all vocal signals of each singing version are track-aligned and an accompaniment signal aligned with the vocal tracks is selected, so when mixing vocals and accompaniment, elements such as lyrics and beats stay coordinated and synchronized, and a mixed song with a good mixing effect is obtained.
- The apparatus, device and storage medium for generating a mixed song provided by the present application also have the above technical effects.
- FIG. 1 is a schematic diagram of a physical architecture applicable to the present application provided by the present application.
- FIG. 2 is a flowchart of a method for generating a mixed song provided by the present application.
- FIG. 4 is a schematic diagram of beat points provided by the present application.
- FIG. 5 is a schematic diagram of a data segment corresponding to an adjacent beat group provided by the present application.
- FIG. 6 is a schematic diagram of a data segment corresponding to another adjacent beat group provided by the present application.
- FIG. 8 is a flowchart of a method for producing a remixed song provided by the application.
- FIG. 9 is a schematic diagram of a mixing song generation device provided by the application.
- FIG. 10 is a structural diagram of a server provided by the present application.
- FIG. 11 is a structural diagram of a terminal provided by this application.
- The present application proposes a mixed song generation scheme that can cover more songs for mixing. During mixing, all vocal signals of each singing version are track-aligned and the accompaniment signal aligned with the vocal tracks is selected, so when mixing vocals and accompaniment, elements such as lyrics and beats stay coordinated and synchronized, and a mixed song with a good mixing effect is obtained.
- the method for generating a mixed song provided by the present application can be applied to a system or program with a sound mixing function, such as a music game.
- a system or program with a sound mixing function may run in a server, a personal computer, or other devices.
- FIG. 1 is a schematic diagram of a physical architecture to which this application applies.
- A system or program with a sound mixing function can run on a server. The server obtains the song audio of at least two singing versions of the same song from other terminal devices through the network; extracts the vocal signal and accompaniment signal from each song audio, obtaining a vocal set including at least two vocal signals and an accompaniment set including at least two accompaniment signals; selects reference rhythm information from the rhythm information corresponding to each song audio, track-aligns all vocal signals in the vocal set based on it, and uses all the track-aligned vocal signals as the vocal audio to be mixed; determines the accompaniment signal in the accompaniment set aligned with the vocal audio tracks as the accompaniment audio to be mixed; and mixes the vocal audio and accompaniment audio to obtain the mixed song.
- the server can establish communication connections with multiple devices, and the server obtains song audio for mixing from these devices.
- song audio for mixing can also be stored in a database.
- the server can obtain the corresponding mixed songs by collecting the audio of songs uploaded by these devices and mixing them.
- Figure 1 shows a variety of terminal devices. In an actual scenario, more or fewer types of terminal devices may participate in the mixing process; the specific number and type depend on the actual scenario and are not limited here. Figure 1 shows one server, but in an actual scenario multiple servers can also participate; the specific number of servers depends on the actual scenario.
- the method for generating a mixed song provided in this embodiment can be performed offline, that is, the server locally stores song audio for mixing, which can directly use the solution provided in this application to mix to obtain a desired mixed song.
- FIG. 2 is a flowchart of a first method for generating a mixed song provided by an embodiment of the present application.
- the method for generating a mixed song may include the following steps:
- the different singing versions of the same song are: the original singing version, the cover version, the adapted version, etc. of the song.
- Song audio is a song in formats such as MP3.
- Method 2: Extract the left-channel vocals and right-channel vocals in each song audio, and determine the amplitude average or spectral-feature average of the left-channel vocals and right-channel vocals as the vocal signal of each song audio.
- The amplitude average corresponds to the time domain, and the spectral-feature average corresponds to the frequency domain; that is, the left-channel and right-channel vocals can be processed in both the time-domain and frequency-domain dimensions.
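The time-domain variant of Method 2 can be sketched as a per-sample mean of the two channels. This is a minimal illustration, assuming "amplitude average" means averaging corresponding samples; the patent does not spell out the exact operation.

```python
# Sketch of Method 2 (time domain): derive one vocal signal per song by
# averaging the left- and right-channel vocal samples.
# Plain Python lists stand in for audio sample arrays.
def average_vocals(left, right):
    """Average left/right channel vocal samples into one mono signal."""
    return [(l + r) / 2.0 for l, r in zip(left, right)]

left = [0.2, 0.4, -0.1]
right = [0.0, 0.2, 0.1]
mono = average_vocals(left, right)
```

A frequency-domain variant would average spectral features (e.g. STFT magnitudes) of the two channels instead of raw samples.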
- Extracting the accompaniment signal in each song audio includes: extracting the left-channel accompaniment or the right-channel accompaniment of each song audio, and determining the left-channel accompaniment or the right-channel accompaniment as the accompaniment signal of that song audio.
- Assume the left- and right-channel audio of a given song audio are dataLeft and dataRight, respectively.
- the left channel accompaniment can be extracted from dataLeft as the accompaniment signal of the song audio
- the right channel accompaniment can also be extracted from dataRight as the accompaniment signal of the song audio.
- Extracting the vocal signal and accompaniment signal in the audio of each song can also be achieved using a vocal accompaniment separation tool (such as spleeter, etc.). Assuming that two different versions of the same song are song1 and song2, respectively, after the vocal accompaniment is separated, two vocal signals can be obtained: vocal1 and vocal2, and two accompaniment signals: surround1 and surround2.
- Determining the accompaniment signal aligned with the vocal tracks in the accompaniment set as the accompaniment audio to be mixed includes: selecting, in the accompaniment set, the accompaniment signal aligned with the reference rhythm information as the accompaniment audio to be mixed; or track-aligning any accompaniment signal in the accompaniment set with the reference rhythm information and then using it as the accompaniment audio to be mixed.
- Mixing the vocal audio and the accompaniment audio to obtain a mixed song includes: calculating the gain value of the left channel and the gain value of the right channel; applying the gains to each vocal signal to obtain the stereo signal of that vocal signal (the signals on the left and right channels together form the stereo signal); and mixing the individual stereo signals with the accompaniment audio to obtain the mixed song.
- vocalALeft = vocalA × gainLeft
- vocalARight = vocalA × gainRight
- When alpha is adjusted below 0.5, the final mixing effect enhances the background (i.e., accompaniment) sound, increasing the surround feeling and immersion of the music; when alpha is adjusted above 0.5, the final mixing effect raises the clarity of the vocals, producing a clear-vocal effect.
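The role of alpha described above can be sketched as a linear balance between vocals and accompaniment. This is an assumption for illustration: the patent states only that alpha below 0.5 favors the accompaniment and above 0.5 favors the vocals, not the exact formula.

```python
# Hedged sketch of the alpha balance: alpha weights the vocal signal
# against the accompaniment per sample (assumed linear crossfade).
def mix(vocal, accomp, alpha=0.5):
    """alpha > 0.5 lifts vocal clarity; alpha < 0.5 lifts accompaniment."""
    return [alpha * v + (1.0 - alpha) * a for v, a in zip(vocal, accomp)]

# Slightly vocal-forward mix (alpha > 0.5)
song = mix([0.4, 0.2], [0.1, 0.3], alpha=0.6)
```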
- Before mixing each stereo signal with the accompaniment audio, software such as an equalizer can also be used to enhance the low-frequency components of the accompaniment (surround) to strengthen the rhythm of the whole piece. Alternatively, before mixing, each stereo signal can be processed in a pitch-preserving way to obtain more singing styles.
- Method 1: Calculate the left-channel and right-channel gain values from a preset sound-image angle and the preset position of the vocal signal within that angle. Let the preset sound-image angle be thetaBase and the position of the vocal signal within it be theta; then the gain values are:
- Method 2 Calculate the gain value of the left channel and the gain value of the right channel by assigning a linear gain. Assuming that the human voice is positioned to the left of the center, then
- Mode 1 performs sound-image modulation by setting a modulation angle, and Mode 2 performs sound-image modulation by distributing a linear gain. Both can place the human voice at any position within 90 degrees to the left or right, thereby forming a sound image.
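The two gain-calculation modes can be sketched as follows. The patent's exact formulas are not reproduced in this text, so a standard constant-power pan stands in for Mode 1 and a linear crossfade for Mode 2; both are labeled assumptions.

```python
import math

def pan_gains_angle(theta, theta_base):
    """Mode 1 (sketch): map theta in [-theta_base, +theta_base] onto
    constant-power left/right gains (assumed pan law)."""
    x = (theta + theta_base) / (2.0 * theta_base) * (math.pi / 2.0)
    return math.cos(x), math.sin(x)  # (gainLeft, gainRight)

def pan_gains_linear(p):
    """Mode 2 (sketch): linear gain split, p in [0, 1];
    p < 0.5 places the voice left of center."""
    return 1.0 - p, p

# A centered voice gets equal left/right gains under Mode 1.
gl, gr = pan_gains_angle(0.0, math.radians(90))
```

With the constant-power law, gl² + gr² stays at 1 for every position, so the perceived loudness does not change as the voice is panned.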
- The simultaneous chorus effect creates a more three-dimensional sound image, and the chorus effect can be controlled, so the user can easily and conveniently adjust the position of the sound image without changing the spectral components of the vocal signal, truly combining two voices from different times and spaces into the same song.
- Whether each vocal signal appears in the vocal audio can also be varied over time. For example, in a certain period only one or a few vocal signals appear, achieving a duet effect.
- This embodiment can mix at least two singing versions of the same song and can cover more songs for mixing. During mixing, reference rhythm information is selected from the rhythm information corresponding to each song audio, all vocal signals of each singing version are track-aligned based on it, and the accompaniment signal aligned with the vocal signals is selected. Therefore, when mixing vocals and accompaniment, elements such as lyrics and beats stay coordinated and synchronized, yielding a mixed song with a good mixing effect and improving the mixing effect.
- the alignment method includes:
- the beat information in the audio of each song can be extracted using beattracker or drum beat extraction algorithms.
- the beat information in the beat set and the vocal signal in the vocal set have a one-to-one mapping relationship.
- Assume there are 3 vocal signals (i.e., the vocal set): vocalA, vocalB and vocalC; 3 accompaniment signals (i.e., the accompaniment set): surroundA, surroundB and surroundC; and 3 pieces of beat information (i.e., the beat set): BeatA, BeatB and BeatC.
- The elements of the above three sets have a one-to-one mapping relationship, namely: vocalA-surroundA-BeatA, vocalB-surroundB-BeatB, and vocalC-surroundC-BeatC.
- S302 Determine whether the number of elements included in each beat information in the beat set is the same; if so, execute S303; if not, execute S308.
- Each piece of beat information in the beat set includes multiple elements (beats, i.e., beat points). If different pieces of beat information include the same number of elements, the rhythms of the corresponding song audios are similar and belong to the same arrangement, and the beat points do not differ much, so the rough alignment of steps S303-S307 can be used. Conversely, if the numbers of elements differ, the rhythms of the corresponding song audios differ considerably and do not belong to the same arrangement; the beat points may differ greatly and need frame-by-frame adjustment, so the segment-wise, finer alignment of steps S309-S313 is required.
- For the beat points included in the beat information, reference may be made to FIG. 4. "1, 2, 3...n, n+1..." in FIG. 4 represents each data frame in the song audio.
- the arrows indicate the time stamp positions corresponding to the beat points, and the positions corresponding to these beat points are also applicable to human voice signals.
- the second beat information is other beat information other than the first beat information in the beat set. For example, suppose that BeatA is selected from the above beat set as the first beat information, then BeatB and BeatC are the second beat information.
- the second vocal signal is another vocal signal in the vocal set except the first vocal signal
- the first vocal signal is a vocal signal in the vocal set having a mapping relationship with the first beat information.
- S306: Determine the corresponding difference value required to adjust each second vocal signal according to the first correspondence, and determine the redundant end and the end to be filled of each second vocal signal based on the corresponding difference value.
- Steps S303-S307 align the vocal signals by translating each vocal signal as a whole, following the principle of Euclidean distance minimization.
- If M is a positive number, the singer of song audio A starts singing later than the singer of song audio B. Then, using vocalA as the reference, vocalB is moved backward (to the right) by M data points, and the head and tail of vocalA are used as reference points to determine the redundant end and the end to be filled of vocalB.
- At the redundant end, the part of the translated vocalB that extends beyond vocalA is cut off; at the end to be filled, the part where vocalB falls short of vocalA is filled with zeros, so that vocalB and vocalA are aligned.
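The coarse alignment of S303-S307 can be sketched as a shift followed by crop and zero-fill. This illustration takes the offset M as given; per the text, the patent derives it via Euclidean distance minimization between the beat information.

```python
# Sketch of S303-S307: shift vocal_b by m samples relative to vocal_a,
# cut off the redundant end, and zero-fill the end to be filled so that
# both signals share vocal_a's length.
def shift_align(vocal_a, vocal_b, m):
    """Shift vocal_b right by m samples (m > 0: B's singer starts later
    relative to A), then crop/zero-pad to len(vocal_a)."""
    if m >= 0:
        shifted = [0.0] * m + list(vocal_b)   # delay B by m samples
    else:
        shifted = list(vocal_b[-m:])          # advance B by |m| samples
    shifted = shifted[:len(vocal_a)]          # cut the redundant end
    shifted += [0.0] * (len(vocal_a) - len(shifted))  # fill with zeros
    return shifted

aligned = shift_align([1, 2, 3, 4, 5], [9, 8, 7, 6, 5], 2)
```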
- S308: Determine whether the number of currently acquired song audios is exactly two; if so, execute S309; if not, exit the process.
- The reference rhythm information is the third beat information; the third beat information is the beat information with the smallest number of elements in the beat set.
- The fourth beat information is the beat information in the beat set other than the third beat information. Assuming the beat set includes BeatA and BeatB, where BeatA includes 3 elements (aA, bA, cA) and BeatB includes 4 elements (aB, bB, cB, dB), then BeatA is the third beat information and BeatB is the fourth beat information.
- Reducing the number of elements in the fourth beat information to equal the number in the third beat information includes: arranging the elements of the third beat information into a target sequence by timestamp; determining the current iteration count and taking the element of the target sequence whose position equals the current iteration count as the target element; and calculating the timestamp distance between the target element and each comparison element, where a comparison element is an element of the fourth beat information not yet matched to any element of the target sequence.
- The current iteration count is then incremented by one, and the steps of determining the target element, calculating the timestamp distance between the target element and each comparison element, and determining the comparison element with the minimum timestamp distance as the element matching the target element are repeated until the current iteration count is not less than the maximum iteration count.
- the maximum number of iterations is the number of elements in the third beat information.
- The specific process is as follows: assuming the elements in BeatA are arranged in ascending timestamp order, the maximum number of iterations is 3. In the first iteration, the current iteration count is 1, so the target element is aA. The timestamp distances between aA and aB, aA and bB, aA and cB, and aA and dB are calculated, giving 4 distances: 0.1, 0.2, 0.3, 0.4. The minimum timestamp distance is 0.1, whose comparison element is aB, so aA is determined to match aB.
- Since the iteration count is less than the maximum of 3, it increases from 1 to 2, and the target element of the second round is bA. Because aA matches aB, aB is no longer a comparison element, so the timestamp distances between bA and bB, bA and cB, and bA and dB are calculated, giving three distances: 0.5, 0.6 and 0.7. The minimum is 0.5, whose comparison element is bB, so bA matches bB.
- The iteration count is still less than the maximum of 3 and increases from 2 to 3, and the target element of the third round is cA. Since aA matches aB and bA matches bB, aB and bB are no longer comparison elements; calculating the timestamp distances between cA and cB and between cA and dB gives 0.7 and 0.8. The minimum is 0.7, whose comparison element is cB, so cA matches cB.
- After the unmatched element dB is deleted, BeatA includes 3 elements (aA, bA, cA) and BeatB includes 3 elements (aB, bB, cB).
- S311: Determine a plurality of adjacent beat groups based on the third beat information or the fourth beat information.
- Assuming BeatA includes 3 elements (aA, bA, cA) and BeatB includes 3 elements (aB, bB, cB), two adjacent beat groups can be determined: a and b, and b and c.
- The first data segment corresponding to a and b is the segment of vocalA corresponding to aA-bA, and the second data segment is the segment of vocalB corresponding to aB-bB; the first data segment corresponding to b and c is the segment of vocalA corresponding to bA-cA, and the second data segment is the segment of vocalB corresponding to bB-cB.
- FIG. 5 illustrates an adjacent beat group a and b.
- the first data segment (segment in vocalA) corresponding to the adjacent beat group includes 4 data frames (data frames 2, 3, 4, 5), the second data segment (segment in vocalB) includes 3 data frames (data frames 2, 3, and 4).
- The third vocal signal is the vocal signal in the vocal set that has a mapping relationship with the third beat information, and the fourth vocal signals are the other vocal signals in the vocal set. If BeatA is the third beat information and BeatB is the fourth beat information, then the third vocal signal is vocalA and the fourth vocal signal is vocalB.
- the first data segment is a segment in the third vocal signal, and the second data segment is a segment in the fourth vocal signal.
- If the number of first data frames in the first data segment equals the number of second data frames in the second data segment, no deletion is needed. If the two numbers are not equal, the data segment corresponding to the larger of the two frame counts is determined as the segment to be deleted; the number of deletions is calculated for the data frames in the segment to be deleted, and data frames are deleted from it according to that number.
- FIG. 6 illustrates an adjacent beat group b and c.
- the first data segment (segment in vocalA) corresponding to the adjacent beat group includes 3 data frames (data frames 2, 3, and 4).
- The second data segment (the segment of vocalB) includes 4 data frames (data frames 2, 3, 4 and 5). It can be seen that when performing data deletion for each adjacent beat group in this embodiment, sometimes vocalA needs frames deleted and sometimes vocalB does. Steps S309-S313 therefore mix only two song audios: aligning each data segment of vocalA and vocalB according to steps S309-S313 realizes the alignment of vocalA and vocalB.
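The per-group trimming in S311-S313 can be sketched as follows. The patent does not specify which frame positions are deleted; this sketch assumes the dropped frames are spread evenly across the longer segment.

```python
# Sketch of S311-S313: within an adjacent beat group, the longer of the
# two data segments is trimmed to the shorter one's frame count so both
# versions keep the same number of frames per beat interval.
def trim_segment(segment, target_len):
    """Keep target_len frames of segment; dropped frames spread evenly.
    Assumes target_len >= 2 when trimming is actually needed."""
    n = len(segment)
    if n <= target_len:
        return list(segment)
    # choose target_len indices spread evenly across [0, n-1]
    keep = [round(i * (n - 1) / (target_len - 1)) for i in range(target_len)]
    return [segment[i] for i in keep]

# A 4-frame segment (as in FIG. 6) trimmed to match a 3-frame one.
trimmed = trim_segment([10, 20, 30, 40], 3)
```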
- vocal1 and vocal2 can be aligned according to S309-S313 to obtain mutually aligned vocal1' and vocal2'. The numbers of data frames in vocal1' and vocal2' are equal, so vocal1' and vocal2' can be considered the same (in the sense of having the same number of data frames). Then vocal1' and vocal3, and vocal2' and vocal3, are aligned respectively to complete the alignment of the three vocal signals. Since vocal1' and vocal2' can be considered the same, the data deleted when aligning each of them with vocal3 is exactly the same, so aligning vocal1' with vocal3 and vocal2' with vocal3 yields the same vocal3'. Finally, vocal1'', vocal2'' and vocal3', which are aligned with each other, are obtained.
- The corresponding accompaniment signals also need to be aligned in the same way as the vocal signals, finally outputting accompaniment signals aligned with all the aligned vocal signals.
- tracks of different versions of human voices are aligned according to the beat information of the song audio.
- At least two singing versions of the same song can be mixed, covering more songs for mixing. During mixing, the reference rhythm information is selected from the rhythm information corresponding to each song audio, all vocal signals of each singing version are track-aligned based on it, and the accompaniment signal aligned with the vocal tracks is selected, so elements such as lyrics and beats stay coordinated and synchronized when mixing vocals and accompaniment, yielding a mixed song with a good mixing effect and improving the mixing effect.
- the alignment method provided by this embodiment includes:
- the BPM value corresponding to the audio of each song can be counted by using the BPM detection algorithm.
- BPM is the abbreviation of Beats Per Minute, i.e., the number of beats per minute.
- the BPM values in the BPM value set and the vocal signals in the vocal set have a one-to-one mapping relationship.
- Assume there are 3 vocal signals (i.e., the vocal set): vocalA, vocalB and vocalC, and 3 BPM values (i.e., the BPM value set): BPMA, BPMB and BPMC.
- the reference BPM value is the reference tempo information. At this time, one BPM value may be randomly selected from the BPM value set as the reference BPM value.
- the target BPM value is other BPM values other than the reference BPM value in the BPM value set. Assuming that BPMA is selected from the BPM value set as the reference BPM value, then BPMB and BPMC are the target BPM values. From this, the ratios can be obtained: BPMA/BPMB, BPMA/BPMC.
- The target vocal signals are the vocal signals in the vocal set other than the reference vocal signal, and the reference vocal signal is the vocal signal in the vocal set that has a mapping relationship with the reference BPM value. If BPMA is selected as the reference BPM value, the reference vocal signal is vocalA and the target vocal signals are vocalB and vocalC.
- S705: Determine the corresponding ratio required to adjust each target vocal signal according to the second correspondence, and perform speed-changing, pitch-preserving processing on each target vocal signal based on that ratio.
- BPMA/BPMB corresponds to vocalB
- BPMA/BPMC corresponds to vocalC
- This embodiment can be implemented with a variable-speed, constant-pitch processor.
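The ratio-based tempo alignment above can be sketched in Python. This is an illustrative fragment, not part of the original disclosure: the version names and BPM numbers are hypothetical, and the actual variable-speed, constant-pitch step would be performed by a separate phase-vocoder-style processor.

```python
def stretch_ratios(bpm_values, reference_key):
    """Ratio of the reference BPM to each target BPM (hypothetical data).

    A ratio > 1 means the target vocal is slower than the reference and
    must be sped up; a ratio < 1 means it must be slowed down.  Pitch
    preservation is left to a variable-speed, constant-pitch processor.
    """
    ref = bpm_values[reference_key]
    return {name: ref / bpm
            for name, bpm in bpm_values.items()
            if name != reference_key}
```

For example, with BPMA = 120, BPMB = 100 and BPMC = 150, the ratios are BPMA/BPMB = 1.2 and BPMA/BPMC = 0.8.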
- tracks of different versions of human voices are aligned according to the beat information of the song audio.
- at least two singing versions of the same song can be mixed, so more songs can be covered for mixing; during mixing, the reference rhythm information is selected from the rhythm information corresponding to each song audio, all vocal signals in each singing version are track-aligned based on the reference rhythm information, and the accompaniment signal that is track-aligned with the vocal signals is selected, so that when the vocals and accompaniment are mixed, elements such as lyrics and beats stay synchronized, yielding a remixed song with a good mixing effect.
- vocalA is randomly selected as the standard vocal signal
- adjusted vocalB = vocalB × (RMSA/RMSB)
- adjusted vocalC = vocalC × (RMSA/RMSC)
- This embodiment utilizes the principle of the energy difference between the left and right ears, which reduces the loudness differences between vocal signals and yields a vocal chorus effect with a stereo image.
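A minimal Python sketch of the loudness matching described above (illustrative only; signals are plain sample lists rather than real audio buffers):

```python
import math

def rms(signal):
    """Root mean square of a sample sequence."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))

def match_loudness(standard, target):
    """Scale `target` so its RMS loudness matches the randomly chosen
    standard vocal signal: adjusted = target × (RMS_standard / RMS_target)."""
    gain = rms(standard) / rms(target)
    return [x * gain for x in target]
```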
- the remix song generation scheme can create remix songs based on existing songs.
- a corresponding mixed song production tool can be designed, and the production of the mixed song can be completed by using the tool.
- The mixed song production tool can be installed on any computer device.
- the mixed song production tool executes the mixed song generation method provided by this application.
- the process of making a remixed song may include the following steps:
- the client uploads the song audio of at least two singing versions of the same song to the server;
- the server feeds each song audio into its built-in mixed song production tool, which outputs the mixed song;
- the mixed song production tool can cover all the songs in the music library; users can upload any songs they want to remix. If only one singing version of a song exists in the library, the user can sing along with the separated accompaniment, creating a mixing effect in which the user and the professional singer appear in the same song. Moreover, the different singing versions used for mixing only need to share the same score, even if they are sung in different languages.
- the human voice is aligned based on the Beat point and BPM value of the song.
- the vocals can be clarified or the background sound enhanced to widen the sound field, and the ratio between the spectral energy of the vocals and that of the background sound can be adjusted.
- the user can adapt not only the vocals (producing a multi-position dual-vocal effect, or pitch-shifting the song's vocals) but also the background sound (producing vocal clarification, sound-field widening, rhythm enhancement, and so on).
- This production method greatly expands the range of songs the dual-vocal effect can cover, and also provides more adaptable content and methods for producing mixing effects.
- FIG. 9 is a schematic diagram of an apparatus for generating a mixed song provided by an embodiment of the present application, including:
- Obtaining module 901 is used to obtain at least two song audios; at least two song audios are different singing versions of the same song;
- the extraction module 902 is used to extract the vocal signal and the accompaniment signal in the audio of each song to obtain a vocal set including at least two vocal signals and an accompaniment set including at least two accompaniment signals;
- the alignment module 903 is used to select reference rhythm information from the rhythm information corresponding to each song audio, track-align all vocal signals in the vocal set based on the reference rhythm information, and use all the track-aligned vocal signals as the vocal audio to be mixed;
- the selection module 904 is used to determine the accompaniment signal aligned with the track of the vocal audio in the accompaniment set as the accompaniment audio to be mixed;
- the mixing module 905 is used for mixing vocal audio and accompaniment audio to obtain a mixed song.
- the extraction module includes:
- the first extraction unit is used to calculate the corresponding mid-position signal of each song audio, and extract the vocal signal in each song audio from the mid-position signal;
- the second extraction unit is used to extract the left-channel vocal and the right-channel vocal from each song audio, and determine the amplitude average or spectral-feature average of the left-channel and right-channel vocals as the vocal signal of each song audio.
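The two vocal-extraction options above (computing a mid signal, or averaging the two channels) reduce to the same per-sample arithmetic in the amplitude-average case. A hedged Python sketch, assuming signals are plain sample lists:

```python
def mid_signal(left, right):
    """Mid (centre) signal of a stereo pair: (L + R) / 2.

    Lead vocals are typically mixed to the centre, so this also serves
    as the amplitude-average estimate of the vocal signal."""
    return [(l + r) / 2.0 for l, r in zip(left, right)]
```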
- the extraction module includes:
- the third extraction unit is configured to extract the left channel accompaniment or the right channel accompaniment in the audio of each song, and determine the left channel accompaniment or the right channel accompaniment as the accompaniment signal in the audio of each song.
- the alignment module includes:
- the beat extraction unit is used to extract the beat information from each song audio and obtain a beat set including at least two pieces of beat information; the beat information in the beat set and the vocal signals in the vocal set have a one-to-one mapping relationship;
- the first selection unit is configured to determine, if every piece of beat information in the beat set contains the same number of elements, that the reference rhythm information is the first beat information, the first beat information being any piece of beat information in the beat set;
- the first calculation unit is used to calculate the difference value between the first beat information and each second beat information respectively; the second beat information is other beat information except the first beat information in the beat set;
- the first determination unit is used to determine, according to the one-to-one mapping relationship, the first correspondence between each difference value and each second vocal signal;
- the second vocal signals are the vocal signals in the vocal set other than the first vocal signal, and the first vocal signal is the vocal signal in the vocal set mapped to the first beat information;
- the second determination unit is configured to determine, according to the first correspondence, the difference value needed to adjust each second vocal signal, and to determine the redundant end and the end to be padded of each second vocal signal based on that difference value;
- the first alignment unit is used to delete, from the redundant end of each second vocal signal, redundant data equal in amount to the difference value, and to add, at the end to be padded of each second vocal signal, all-zero data equal in amount to the difference value.
- the first calculation unit is specifically configured to apply the first formula M = [sum(Beat0 − BeatX) / numBeats] × L, where M is the difference value between Beat0 and BeatX; Beat0 is the vector representation of the first beat information; BeatX is the vector representation of any piece of second beat information; sum(Beat0 − BeatX) is the accumulated sum of all differences obtained by element-wise subtraction of the elements of Beat0 and BeatX; numBeats is the number of elements in each piece of beat information; and L is the unit data frame length.
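An illustrative Python rendering of the first formula (not part of the original disclosure; the beat vectors and frame length below are hypothetical values):

```python
def beat_difference(beat0, beat_x, frame_len):
    """First formula: M = [sum(Beat0 − BeatX) / numBeats] × L.

    beat0, beat_x: equal-length vectors of beat positions;
    frame_len: the unit data frame length L.
    """
    num_beats = len(beat0)
    total = sum(a - b for a, b in zip(beat0, beat_x))  # element-wise diffs
    return (total / num_beats) * frame_len
```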
- the alignment module further includes:
- the second selection unit is configured to determine, if two song audios are acquired and the pieces of beat information in the beat set contain different numbers of elements, that the reference rhythm information is the third beat information; the third beat information is the piece of beat information with the fewest elements in the beat set;
- the pruning unit is used to reduce the number of elements in the fourth beat information to equal the number of elements in the third beat information; the fourth beat information is the beat information in the beat set other than the third beat information;
- a third determining unit configured to determine a plurality of adjacent beat groups based on the third beat information or the fourth beat information
- the dividing unit is used to divide the third vocal signal and the fourth vocal signal according to each adjacent beat group, and obtain the first data segment and the second data segment corresponding to each adjacent beat group;
- the third vocal signal is the vocal signal in the vocal set that has a mapping relationship with the third beat information, and the fourth vocal signals are the vocal signals in the vocal set other than the third vocal signal;
- the second alignment unit is configured to make the data length of the first data segment equal to the data length of the second data segment for each adjacent beat group.
- the second alignment unit includes:
- the first determination subunit is used to determine, if the number of first data frames in the first data segment is not equal to the number of second data frames in the second data segment, the data segment corresponding to the larger of the two frame counts as the segment to be deleted;
- the first calculation subunit is used to calculate the deletion count for each data frame in the segment to be deleted, and to trim each data frame in that segment according to the deletion count.
- the first calculation subunit is specifically configured to apply the second formula P = [(m − n) × L] / m, where P is the deletion count for each data frame, m is the larger of the first and second data frame counts, n is the smaller, and L is the unit data frame length.
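A sketch of the second formula in Python (illustrative; `m` and `n` are the two frame counts and `frame_len` the unit frame length, all hypothetical):

```python
def deletion_count(m, n, frame_len):
    """Second formula: P = [(m − n) × L] / m — the number of samples to
    delete from each data frame of the longer segment.

    Deleting P samples from each of the m frames removes (m − n) × L
    samples in total, so both segments end up the same length."""
    return ((m - n) * frame_len) / m
```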
- the pruning unit includes:
- the arranging subunit is used to arrange the elements of the third beat information into a target sequence by timestamp;
- the second determination subunit is used to determine the current iteration number and take the element at the position in the target sequence equal to the current iteration number as the target element;
- the second calculation subunit is used to calculate the timestamp distance between the target element and each contrast element;
- a contrast element is an element of the fourth beat information not yet matched to any element of the target sequence;
- the third determination subunit is used to determine the contrast element with the smallest timestamp distance as the element matching the target element;
- the deletion subunit is used to delete, if the current iteration number is not less than the maximum iteration number, the remaining contrast elements from the fourth beat information, retaining the elements matched to each target element.
- the pruning unit also includes:
- the iteration subunit is used to increment the current iteration number by one if it is less than the maximum iteration number, and to re-execute the steps of the second determination subunit, the second calculation subunit and the third determination subunit, until the current iteration number is not less than the maximum iteration number.
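The nearest-timestamp matching performed by the pruning subunits above can be sketched as a loop in Python (illustrative only; timestamps are hypothetical seconds):

```python
def prune_beats(third_beats, fourth_beats):
    """For each element of the third beat information (sorted by
    timestamp), keep the closest still-unmatched element of the fourth
    beat information; the unmatched remainder is deleted."""
    unmatched = list(fourth_beats)
    kept = []
    for target in sorted(third_beats):
        # contrast element with the smallest timestamp distance
        best = min(unmatched, key=lambda t: abs(t - target))
        kept.append(best)
        unmatched.remove(best)  # each element can match only once
    return sorted(kept)
```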
- the alignment module includes:
- the statistical unit is used to count the BPM value corresponding to each song audio, and obtains a BPM value set including at least two BPM values; the BPM value in the BPM value set and the vocal signal in the vocal set have a one-to-one mapping relationship;
- the third selection unit is used to select a BPM value from the BPM value set as the reference BPM value; the reference BPM value is the reference rhythm information;
- the second calculation unit is used to calculate the ratio of the reference BPM value to each target BPM value; the target BPM value is other BPM values except the reference BPM value in the BPM value set;
- the fourth determination unit is used to determine, according to the one-to-one mapping relationship, the second correspondence between each ratio and each target vocal signal;
- the target vocal signals are the vocal signals in the vocal set other than the reference vocal signal, and the reference vocal signal is the vocal signal in the vocal set mapped to the reference BPM value;
- the third alignment unit is configured to determine, according to the second correspondence, the ratio needed to adjust each target vocal signal, and to perform variable-speed, constant-pitch processing on each target vocal signal based on that ratio.
- it also includes:
- the standard vocal selection module is used to randomly select a vocal signal from all the aligned vocal signals as the standard vocal signal;
- the adjustment module is used to adjust the loudness of each vocal signal to be tuned according to the third formula; the vocal signal to be tuned is the vocal signal other than the standard vocal signal among all the vocal signals after track alignment;
- the third formula is B = vocalX × (RMS0 / RMSX), where B is the vocal signal to be tuned after the loudness adjustment, vocalX is the vocal signal to be tuned before the adjustment, RMS0 is the root mean square of the standard vocal signal, and RMSX is the root mean square of vocalX.
- the mixing module includes:
- the third calculation unit is used to calculate the left channel gain value and the right channel gain value
- the fifth determination unit is used to determine the stereo signal of each vocal signal in the vocal audio based on the left-channel gain value and the right-channel gain value;
- the mixing unit is used to mix the individual stereo signals and the accompaniment audio to obtain a mixed song.
- the mixing unit is specifically used for:
- the fourth formula is SongComb = alpha × (vocal1 + … + vocalN) + (1 − alpha) × surround, where SongComb is the mixed song, vocal1, …, vocalN are the stereo signals, alpha is a preset adjustment factor, and surround is the accompaniment audio.
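The fourth formula can be illustrated per sample in Python (signals as plain lists; the alpha value is a hypothetical choice):

```python
def mix_song(vocals, surround, alpha=0.5):
    """Fourth formula, applied sample by sample:
    SongComb = alpha × (vocal1 + … + vocalN) + (1 − alpha) × surround."""
    return [alpha * sum(v[i] for v in vocals) + (1.0 - alpha) * s
            for i, s in enumerate(surround)]
```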
- the third calculation unit is specifically configured to calculate the left-channel and right-channel gain values according to a preset pan angle and the vocal signal's preset position within that pan angle, or by distributing linear gain.
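The patent leaves the gain computation abstract (a preset pan angle and position, or distributed linear gain). Constant-power panning is one plausible realisation; the sketch below is an assumption for illustration, not the original disclosure:

```python
import math

def pan_gains(position, width=90.0):
    """Constant-power pan gains (assumed realisation).

    `position` is the vocal's preset angle within [0, width]:
    0 = hard left, width = hard right.
    Returns (left_gain, right_gain) with left² + right² = 1,
    keeping perceived power constant across the stereo image."""
    theta = (position / width) * (math.pi / 2.0)
    return math.cos(theta), math.sin(theta)
```

A vocal panned to the centre (position = 45 with width = 90) receives equal gains of about 0.707 in each channel.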
- the selection module includes:
- a fourth selection unit configured to select, in the accompaniment set, an accompaniment signal aligned with the reference rhythm information as the accompaniment audio to be mixed;
- the fourth alignment unit is used to track-align any accompaniment signal in the accompaniment set with the reference rhythm information and use it as the accompaniment audio to be mixed.
- this embodiment provides an apparatus for generating a mixed song, which aligns the tracks of different vocal versions according to the beat information of the song audio.
- at least two singing versions of the same song can thus be mixed, covering more songs; during mixing, all vocal signals in each singing version are track-aligned and the accompaniment signal that is track-aligned with them is selected, so that when the vocals and accompaniment are mixed, elements such as lyrics and beats stay synchronized, yielding a remixed song with a good mixing effect.
- FIG. 10 and FIG. 11 are both structural diagrams of an electronic device according to an exemplary embodiment, and the contents in the figures should not be considered as any limitation on the scope of use of the present application.
- FIG. 10 is a schematic structural diagram of a server according to an embodiment of the present application.
- the server 50 may specifically include: at least one processor 51 , at least one memory 52 , a power supply 53 , a communication interface 54 , an input and output interface 55 and a communication bus 56 .
- the memory 52 is used to store a computer program, and the computer program is loaded and executed by the processor 51 to implement the relevant steps in the generation of the mixed song disclosed in any of the foregoing embodiments.
- the power supply 53 is used to provide working voltage for each hardware device on the server 50;
- the communication interface 54 can create a data transmission channel between the server 50 and external devices; the communication protocol it follows may be any communication protocol applicable to the technical solution of this application and is not specifically limited here;
- the input and output interface 55 is used to obtain external input data or output data to the outside world; its specific interface type can be selected according to application needs and is not specifically limited here.
- the memory 52, as a carrier for resource storage, can be a read-only memory, random access memory, magnetic disk, optical disk, etc.
- the resources stored in the memory 52 include the operating system 521, the computer program 522 and the data 523, and the storage may be short-term or permanent.
- the operating system 521 is used to manage and control the hardware devices and the computer program 522 on the server 50, so that the processor 51 can operate on and process the data 523 in the memory 52; it can be Windows Server, Netware, Unix, Linux, etc.
- the computer program 522 may further include a computer program that can be used to complete other specific tasks in addition to the computer program that can be used to complete the method for generating a remixed song disclosed in any of the foregoing embodiments.
- the data 523 may include, in addition to data such as song audio for mixing, data such as developer information of the application.
- FIG. 11 is a schematic structural diagram of a terminal provided by an embodiment of the present application.
- the terminal 60 may specifically include, but is not limited to, a smart phone, a tablet computer, a notebook computer, or a desktop computer.
- the terminal 60 in this embodiment includes: a processor 61 and a memory 62 .
- the processor 61 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and the like.
- the processor 61 can be implemented in at least one hardware form among DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array) and PLA (Programmable Logic Array).
- the processor 61 may also include a main processor and a coprocessor.
- the main processor, also called the CPU (Central Processing Unit), is the processor that handles data in the awake state; the coprocessor is a low-power processor that handles data in the standby state.
- the processor 61 may be integrated with a GPU (Graphics Processing Unit), which renders and draws the content to be displayed on the display screen.
- the processor 61 may further include an AI (Artificial Intelligence) processor for handling machine-learning computing operations.
- Memory 62 may include one or more computer-readable storage media, which may be non-transitory, as well as high-speed random access memory and non-volatile memory such as one or more magnetic disk or flash storage devices. In this embodiment, the memory 62 at least stores the computer program 621, which, once loaded and executed by the processor 61, implements the terminal-side steps of the mixed song generation method disclosed in any of the foregoing embodiments. The resources stored in the memory 62 may also include an operating system 622 (e.g. Windows, Unix, Linux) and data 623 (including, but not limited to, the song audio to be mixed), stored short-term or permanently.
- the terminal 60 may further include a display screen 63 , an input/output interface 64 , a communication interface 65 , a sensor 66 , a power supply 67 and a communication bus 68 .
- FIG. 11 does not constitute a limitation on the terminal 60, which may include more or fewer components than shown.
- an embodiment of the present application further discloses a storage medium storing computer-executable instructions which, when loaded and executed by a processor, implement the mixed song generation method disclosed in any of the foregoing embodiments. For its specific steps, reference may be made to the corresponding content disclosed in the foregoing embodiments, which will not be repeated here.
Claims (19)
- A mixed song generation method, comprising: acquiring at least two song audios, the at least two song audios being different singing versions of the same song; extracting the vocal signal and the accompaniment signal from each song audio to obtain a vocal set comprising at least two vocal signals and an accompaniment set comprising at least two accompaniment signals; selecting reference rhythm information from the rhythm information corresponding to each song audio, track-aligning all vocal signals in the vocal set based on the reference rhythm information, and using all the track-aligned vocal signals as the vocal audio to be mixed; determining the accompaniment signal in the accompaniment set that is track-aligned with the vocal audio as the accompaniment audio to be mixed; and mixing the vocal audio and the accompaniment audio to obtain the mixed song.
- The mixed song generation method according to claim 1, wherein extracting the vocal signal from each song audio comprises: calculating the mid signal corresponding to each song audio and extracting the vocal signal of each song audio from the mid signal; or extracting the left-channel vocal and the right-channel vocal from each song audio and determining the amplitude average or spectral-feature average of the left-channel vocal and the right-channel vocal as the vocal signal of each song audio.
- The mixed song generation method according to claim 1, wherein extracting the accompaniment signal from each song audio comprises: extracting the left-channel accompaniment or the right-channel accompaniment from each song audio and determining the left-channel accompaniment or the right-channel accompaniment as the accompaniment signal of each song audio.
- The mixed song generation method according to claim 1, wherein, if the rhythm information is beat information, selecting reference rhythm information from the rhythm information corresponding to each song audio and track-aligning all vocal signals in the vocal set based on the reference rhythm information comprises: extracting the beat information from each song audio to obtain a beat set comprising at least two pieces of beat information, the beat information in the beat set and the vocal signals in the vocal set having a one-to-one mapping relationship; if every piece of beat information in the beat set contains the same number of elements, determining the reference rhythm information to be first beat information, the first beat information being any piece of beat information in the beat set; calculating the difference value between the first beat information and each piece of second beat information, the second beat information being the beat information in the beat set other than the first beat information; determining, according to the one-to-one mapping relationship, a first correspondence between each difference value and each second vocal signal, the second vocal signals being the vocal signals in the vocal set other than the first vocal signal, and the first vocal signal being the vocal signal in the vocal set mapped to the first beat information; determining, according to the first correspondence, the difference value needed to adjust each second vocal signal, and determining the redundant end and the end to be padded of each second vocal signal based on that difference value; and deleting, from the redundant end of each second vocal signal, redundant data equal in amount to the difference value, and adding, at the end to be padded of each second vocal signal, all-zero data equal in amount to the difference value.
- The mixed song generation method according to claim 4, wherein calculating the difference value between the first beat information and each piece of second beat information comprises: calculating the difference values according to a first formula, M = [sum(Beat0 − BeatX) / numBeats] × L, where M is the difference value between Beat0 and BeatX; Beat0 is the vector representation of the first beat information; BeatX is the vector representation of any piece of second beat information; sum(Beat0 − BeatX) is the accumulated sum of all differences obtained by element-wise subtraction of the elements of Beat0 and BeatX; numBeats is the number of elements in each piece of beat information; and L is the unit data frame length.
- The mixed song generation method according to claim 4, further comprising: if two song audios are acquired and the pieces of beat information in the beat set contain different numbers of elements, determining the reference rhythm information to be third beat information, the third beat information being the piece of beat information with the fewest elements in the beat set; pruning the number of elements in the fourth beat information to equal the number of elements in the third beat information, the fourth beat information being the beat information in the beat set other than the third beat information; determining a plurality of adjacent beat groups based on the third beat information or the fourth beat information; dividing the third vocal signal and the fourth vocal signal according to each adjacent beat group to obtain the first data segment and the second data segment corresponding to each adjacent beat group, the third vocal signal being the vocal signal in the vocal set mapped to the third beat information, and the fourth vocal signals being the vocal signals in the vocal set other than the third vocal signal; and making, for each adjacent beat group, the data length of the first data segment equal to the data length of the second data segment.
- The mixed song generation method according to claim 6, wherein making the data length of the first data segment equal to the data length of the second data segment comprises: if the number of first data frames in the first data segment is not equal to the number of second data frames in the second data segment, determining the data segment corresponding to the larger of the two frame counts as the segment to be deleted; and calculating the deletion count for each data frame in the segment to be deleted and trimming each data frame in that segment according to the deletion count.
- The mixed song generation method according to claim 7, wherein calculating the deletion count for each data frame in the segment to be deleted comprises: calculating the deletion count according to a second formula, P = [(m − n) × L] / m, where P is the deletion count for each data frame, m is the larger of the first and second data frame counts, n is the smaller, and L is the unit data frame length.
- The mixed song generation method according to claim 6, wherein pruning the number of elements in the fourth beat information to equal the number of elements in the third beat information comprises: arranging the elements of the third beat information into a target sequence by timestamp; determining the current iteration number and taking the element at the position in the target sequence equal to the current iteration number as the target element; calculating the timestamp distance between the target element and each contrast element, a contrast element being an element of the fourth beat information not yet matched to any element of the target sequence; determining the contrast element with the smallest timestamp distance as the element matching the target element; and, if the current iteration number is not less than the maximum iteration number, deleting the remaining contrast elements from the fourth beat information and retaining the elements of the fourth beat information matched to each target element.
- The mixed song generation method according to claim 9, wherein, if the current iteration number is less than the maximum iteration number, the current iteration number is incremented by one, and the steps of determining the current iteration number and taking the element at the corresponding position of the target sequence as the target element, calculating the timestamp distance between the target element and each contrast element, and determining the contrast element with the smallest timestamp distance as the element matching the target element are executed, until the current iteration number is not less than the maximum iteration number.
- The mixed song generation method according to claim 1, wherein, if the rhythm information is a BPM value, selecting reference rhythm information from the rhythm information corresponding to each song audio and track-aligning all vocal signals in the vocal set based on the reference rhythm information comprises: counting the BPM value corresponding to each song audio to obtain a BPM value set comprising at least two BPM values, the BPM values in the BPM value set and the vocal signals in the vocal set having a one-to-one mapping relationship; selecting one BPM value from the BPM value set as the reference BPM value, the reference BPM value being the reference rhythm information; calculating the ratio of the reference BPM value to each target BPM value, the target BPM values being the BPM values in the BPM value set other than the reference BPM value; determining, according to the one-to-one mapping relationship, a second correspondence between each ratio and each target vocal signal, the target vocal signals being the vocal signals in the vocal set other than the reference vocal signal, and the reference vocal signal being the vocal signal in the vocal set mapped to the reference BPM value; and determining, according to the second correspondence, the ratio needed to adjust each target vocal signal, and performing variable-speed, constant-pitch processing on each target vocal signal based on that ratio.
- The mixed song generation method according to claim 1, wherein, before using all the track-aligned vocal signals as the vocal audio to be mixed, the method further comprises: randomly selecting one vocal signal from all the track-aligned vocal signals as the standard vocal signal; and adjusting the loudness of each vocal signal to be tuned according to a third formula, the vocal signals to be tuned being the track-aligned vocal signals other than the standard vocal signal; wherein the third formula is B = vocalX × (RMS0 / RMSX), where B is the vocal signal to be tuned after the loudness adjustment, vocalX is the vocal signal to be tuned before the adjustment, RMS0 is the root mean square of the standard vocal signal, and RMSX is the root mean square of vocalX.
- The mixed song generation method according to claim 1, wherein mixing the vocal audio and the accompaniment audio to obtain the mixed song comprises: calculating a left-channel gain value and a right-channel gain value; determining a stereo signal for each vocal signal in the vocal audio based on the left-channel gain value and the right-channel gain value; and mixing the stereo signals and the accompaniment audio to obtain the mixed song.
- The mixed song generation method according to claim 13, wherein mixing the stereo signals and the accompaniment audio to obtain the mixed song comprises: mixing the stereo signals and the accompaniment audio according to a fourth formula, SongComb = alpha × (vocal1 + … + vocalN) + (1 − alpha) × surround, where SongComb is the mixed song, vocal1, …, vocalN are the stereo signals, alpha is a preset adjustment factor, and surround is the accompaniment audio.
- The mixed song generation method according to claim 13, wherein calculating the left-channel gain value and the right-channel gain value comprises: calculating the left-channel gain value and the right-channel gain value according to a preset pan angle and the preset position of the vocal signal within that pan angle; or calculating the left-channel gain value and the right-channel gain value by distributing linear gain.
- The mixed song generation method according to claim 1, wherein determining the accompaniment signal in the accompaniment set that is track-aligned with the vocal audio as the accompaniment audio to be mixed comprises: selecting, from the accompaniment set, an accompaniment signal aligned with the reference rhythm information as the accompaniment audio to be mixed; or track-aligning any accompaniment signal in the accompaniment set with the reference rhythm information and using it as the accompaniment audio to be mixed.
- A mixed song generation apparatus, comprising: an acquisition module for acquiring at least two song audios, the at least two song audios being different singing versions of the same song; an extraction module for extracting the vocal signal and the accompaniment signal from each song audio to obtain a vocal set comprising at least two vocal signals and an accompaniment set comprising at least two accompaniment signals; an alignment module for selecting reference rhythm information from the rhythm information corresponding to each song audio, track-aligning all vocal signals in the vocal set based on the reference rhythm information, and using all the track-aligned vocal signals as the vocal audio to be mixed; a selection module for determining the accompaniment signal in the accompaniment set that is track-aligned with the vocal audio as the accompaniment audio to be mixed; and a mixing module for mixing the vocal audio and the accompaniment audio to obtain the mixed song.
- An electronic device, comprising a processor and a memory, wherein the memory stores a computer program that is loaded and executed by the processor to implement the mixed song generation method according to any one of claims 1 to 16.
- A storage medium storing computer-executable instructions which, when loaded and executed by a processor, implement the mixed song generation method according to any one of claims 1 to 16.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110205483.9A CN112967705B (zh) | 2021-02-24 | 2021-02-24 | 一种混音歌曲生成方法、装置、设备及存储介质 |
CN202110205483.9 | 2021-02-24 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022179110A1 true WO2022179110A1 (zh) | 2022-09-01 |
Family
ID=76285886
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/122573 WO2022179110A1 (zh) | 2021-02-24 | 2021-10-08 | 一种混音歌曲生成方法、装置、设备及存储介质 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN112967705B (zh) |
WO (1) | WO2022179110A1 (zh) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112967705B (zh) * | 2021-02-24 | 2023-11-28 | 腾讯音乐娱乐科技(深圳)有限公司 | 一种混音歌曲生成方法、装置、设备及存储介质 |
CN114203163A (zh) * | 2022-02-16 | 2022-03-18 | 荣耀终端有限公司 | 音频信号处理方法及装置 |
CN117059055A (zh) * | 2022-05-07 | 2023-11-14 | 北京字跳网络技术有限公司 | 音频处理方法、装置、设备及存储介质 |
CN116524883B (zh) * | 2023-07-03 | 2024-01-05 | 腾讯科技(深圳)有限公司 | 音频合成方法、装置、电子设备和计算机可读存储介质 |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070218444A1 (en) * | 2006-03-02 | 2007-09-20 | David Konetski | System and method for presenting karaoke audio features from an optical medium |
CN110534078A (zh) * | 2019-07-30 | 2019-12-03 | 黑盒子科技(北京)有限公司 | 一种基于音频特征的细粒度音乐节奏提取系统及方法 |
CN111326132A (zh) * | 2020-01-22 | 2020-06-23 | 北京达佳互联信息技术有限公司 | 音频处理方法、装置、存储介质及电子设备 |
CN111916039A (zh) * | 2019-05-08 | 2020-11-10 | 北京字节跳动网络技术有限公司 | 音乐文件的处理方法、装置、终端及存储介质 |
CN112216294A (zh) * | 2020-08-31 | 2021-01-12 | 北京达佳互联信息技术有限公司 | 音频处理方法、装置、电子设备及存储介质 |
CN112967705A (zh) * | 2021-02-24 | 2021-06-15 | 腾讯音乐娱乐科技(深圳)有限公司 | 一种混音歌曲生成方法、装置、设备及存储介质 |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8005666B2 (en) * | 2006-10-24 | 2011-08-23 | National Institute Of Advanced Industrial Science And Technology | Automatic system for temporal alignment of music audio signal with lyrics |
CN106686431B (zh) * | 2016-12-08 | 2019-12-10 | 杭州网易云音乐科技有限公司 | 一种音频文件的合成方法和设备 |
CN111345010B (zh) * | 2018-08-17 | 2021-12-28 | 华为技术有限公司 | 一种多媒体内容同步方法、电子设备及存储介质 |
CN110992970B (zh) * | 2019-12-13 | 2022-05-31 | 腾讯音乐娱乐科技(深圳)有限公司 | 音频合成方法及相关装置 |
- 2021-02-24: CN application CN202110205483.9 filed (granted as CN112967705B, active)
- 2021-10-08: PCT application PCT/CN2021/122573 filed (published as WO2022179110A1)
Also Published As
Publication number | Publication date |
---|---|
CN112967705A (zh) | 2021-06-15 |
CN112967705B (zh) | 2023-11-28 |
Legal Events
Code | Description
---|---
121 | EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 21927535; Country: EP; Kind code: A1)
WWE | WIPO information: entry into national phase (Ref document number: 18278602; Country: US)
NENP | Non-entry into the national phase (Ref country code: DE)
122 | EP: PCT application non-entry in European phase (Ref document number: 21927535; Country: EP; Kind code: A1)
32PN | EP: public notification in the EP bulletin as address of the addressee cannot be established (Noting of loss of rights pursuant to Rule 112(1) EPC; EPO Form 1205A dated 11/12/2023)