WO2014133331A1

WO2014133331A1 - Apparatus and method for generating karaoke contents

Info

Publication number: WO2014133331A1
Application number: PCT/KR2014/001610
Authority: WO
Inventors: 이인호; 금종룡
Original assignee: 넥스트리밍(주) (NexStreaming Corp.)
Priority date: 2013-02-27
Filing date: 2014-02-27
Publication date: 2014-09-04

Abstract

An apparatus for generating karaoke contents according to the present invention comprises: a vocal processing unit for generating a processed music source by removing a mono component from music contents including a first music source transmitted via a left channel and a second music source transmitted via a right channel; a low sound extraction unit for extracting a low sound component from the music contents; a drum processing unit for detecting a drum component from the music contents, and selecting a drum sound among a plurality of drum sound samples; and a mixing unit for generating karaoke contents by composing the processed music source, the low sound component and the drum sound.

Description

Karaoke content generation device and method

The present invention relates to an apparatus and method for generating karaoke content, and more particularly, to an apparatus and method for generating karaoke content by removing vocal components from music content.

In general, music content contains the singer's vocal component and the accompaniment components of various musical instruments, and it is typically a stereo signal consisting of a left channel and a right channel.

Recently, technologies have been developed for generating karaoke content that contains only the accompaniment components by removing the singer's vocal component from music content. However, when the singer's vocal component is removed, the accompaniment components of instruments located in the center of the stereo field are removed as well.

In particular, the kick drum, snare drum, and bass guitar, which form the foundation of popular music, are located in the center of the stereo field, so they disappear together with the singer's vocal component when it is removed.

As a result, the generated karaoke content loses the basic feel of the music.

The technical problem to be solved by the present invention is to provide an apparatus and method for generating karaoke content that restore the sounds of the kick drum, snare drum, bass guitar, and the like, which disappear when karaoke content is created from music content, so that the karaoke content preserves the feel of the original song as fully as possible.

An apparatus for generating karaoke content according to an embodiment of the present invention includes: a vocal processing unit that generates a processed sound source by removing the mono component of music content including a first sound source transmitted through a left channel and a second sound source transmitted through a right channel; a bass extractor that extracts a bass component of the music content; a drum processor that detects a drum component in the music content and selects a drum sound from a plurality of drum sound samples; and a mixing unit that synthesizes the processed sound source, the bass component, and the drum sound to generate the karaoke content.

The vocal processing unit may obtain a difference signal between the first sound source and the second sound source and remove a mono component of the music content.

The vocal processing unit may generate a first processed sound source from which the mono component is removed by subtracting the second sound source from the first sound source.

The bass extractor may obtain a sum signal of the first sound source and the second sound source, and extract the bass component of the music content by passing the sum signal through a low pass filter.

The low pass filter may be a filter that passes bass components whose frequencies are at or below the frequency band of the human voice.

The drum processor may include a drum detector that generates the drum component, which includes the temporal position and volume characteristics of the drum in the music content, and a drum sample unit that stores the plurality of drum sound samples and selects a drum sound from among them.

The drum detector may extract a fundamental frequency and a signal waveform of the music content for each unit time, and analyze an envelope characteristic of the signal waveform to determine whether the signal waveform is a signal waveform of a drum.

If it is determined that the signal waveform is a signal waveform of a drum, the drum detector may detect a unit time from which the signal waveform is extracted as a drum point.

The drum detector may calculate a volume area, that is, the area enclosed between the signal waveform and the zero-crossing line, and select a volume coefficient that determines the volume level based on the volume area.

The drum detector may determine whether the signal waveform is a signal waveform of a kick drum or a snare drum by comparing the period of the signal waveform with a threshold value.

The drum sample unit may select any one of a kick drum sound and a snare drum sound from the plurality of drum sound samples, and convert the volume of the selected drum sound according to the volume coefficient.

According to another embodiment of the present invention, a method of generating karaoke content includes: generating a processed sound source by removing the mono component of music content including a first sound source transmitted through a left channel and a second sound source transmitted through a right channel; extracting a bass component of the music content; detecting a drum component in the music content; selecting a drum sound from a plurality of drum sound samples; and synthesizing the processed sound source, the bass component, and the drum sound to generate karaoke content.

The generating of the processed sound source may include generating a first processed sound source from which the mono component is removed by subtracting the second sound source from the first sound source.

Extracting the bass component of the music content may include obtaining a sum signal of the first sound source and the second sound source, and extracting the bass component of the music content by passing the sum signal through a low pass filter.

Detecting the drum component in the music content may include extracting a fundamental frequency and a signal waveform of the music content for each unit time, and determining whether the signal waveform is a signal waveform of a drum by analyzing an envelope characteristic of the signal waveform.

The detecting of the drum component in the music content may further include detecting, as a drum point, a unit time from which the signal waveform is extracted when it is determined that the signal waveform is a signal waveform of a drum.

Detecting the drum component in the music content may further include calculating the volume area between the signal waveform and the zero-crossing line, and selecting a volume coefficient that determines the volume level based on the volume area.

The detecting of a drum component in the music content may further include determining whether the signal waveform is a signal waveform of a kick drum or a snare drum by comparing the period of the signal waveform with a threshold value.

Selecting the drum sound from the plurality of drum sound samples may include selecting a kick drum sound from the plurality of drum sound samples when the signal waveform is a signal waveform of the kick drum, and converting the volume of the kick drum sound according to the volume coefficient.

Selecting the drum sound from the plurality of drum sound samples may include selecting a snare drum sound from the plurality of drum sound samples when the signal waveform is a signal waveform of the snare drum, and converting the volume of the snare drum sound according to the volume coefficient.

According to the present invention, the sounds of the kick drum, snare drum, bass guitar, and the like can be restored in karaoke content generated from music content, so that the karaoke content preserves the feel of the original song as fully as possible.

FIG. 1 is a block diagram illustrating an apparatus for generating karaoke content according to an embodiment of the present invention.

FIG. 2 is a flowchart illustrating a method of generating karaoke content according to an embodiment of the present invention.

Hereinafter, exemplary embodiments of the present invention will be described in detail with reference to the accompanying drawings so that those skilled in the art may easily implement the present invention. As those skilled in the art would realize, the described embodiments may be modified in various different ways, all without departing from the spirit or scope of the present invention.

In addition, in the various embodiments, components having the same configuration are denoted by the same reference numerals and described representatively in the first embodiment; in the other embodiments, only configurations that differ from the first embodiment are described.

In order to clearly describe the present invention, parts irrelevant to the description are omitted, and like reference numerals designate like elements throughout the specification.

Throughout the specification, when a part is said to "include" a certain component, it means that it can further include other components, without excluding other components unless specifically stated otherwise.

Referring to FIG. 1, the karaoke content generating apparatus 100 includes a content input unit 110, a vocal processing unit 120, a bass extraction unit 130, a drum processing unit 140, and a mixing unit 150.

The content input unit 110 receives music content (MC) from an external device through the left channel and the right channel. The external device may be an internet site that provides sound sources, a storage medium in which sound sources are stored, or the like. The left channel carries the first vocal component and the first accompaniment component, and the right channel carries the second vocal component and the second accompaniment component. That is, the music content MC is a stereo sound source in which the singer's vocal component and the accompaniment components of various musical instruments are transmitted through the left and right channels.

Hereinafter, the music content transmitted through the left channel, which includes the first vocal component and the first accompaniment component, is called the first sound source ML, and the music content transmitted through the right channel, which includes the second vocal component and the second accompaniment component, is called the second sound source MR.

In general, in a stereo sound source, the first vocal component and the second vocal component are recorded identically, while the first accompaniment component and the second accompaniment component are recorded differently. That is, accompaniment components of different instruments are transmitted through the left channel and the right channel to obtain a stereo sound effect.

The music content MC is a digital signal obtained by pulse code modulation (PCM). PCM is a method of converting an analog audio signal into a digital signal and is used in most digital audio, including compact discs (CDs).

When the music content MC is an analog audio signal received through a microphone rather than a digital signal, the content input unit 110 may convert the analog audio signal into a digital signal by PCM. That is, the content input unit 110 may digitize the analog audio signal by performing sampling, which measures the instantaneous voltage of the sound waveform at very short intervals; quantization, which maps each measured voltage to one of a finite set of levels; and encoding, which represents each quantized value as a binary number of 1s and 0s.
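The three PCM steps can be illustrated with a minimal Python sketch. It assumes the analog input is available as a function of time returning a voltage normalized to [-1, 1] and uses a two's-complement binary code; the function name `pcm_encode`, the 8 kHz rate, and the 8-bit depth are illustrative choices, not part of the described apparatus (CD audio, for comparison, uses 44.1 kHz sampling with 16-bit samples).

```python
import math


def pcm_encode(analog, n_samples, sample_rate=8000, bits=8):
    """Sample, quantize, and encode an analog signal given as a function of time.

    Assumes `analog(t)` returns a voltage normalized to [-1.0, 1.0].
    Returns one two's-complement bit string per sample.
    """
    levels = 2 ** bits
    max_level = levels // 2 - 1               # e.g. 127 for 8 bits
    codes = []
    for n in range(n_samples):
        t = n / sample_rate                   # sampling: instant at which the voltage is measured
        v = max(-1.0, min(1.0, analog(t)))    # clamp to the assumed full-scale range
        q = round(v * max_level)              # quantization: nearest integer level
        codes.append(format(q & (levels - 1), f"0{bits}b"))  # encoding: binary 1s and 0s
    return codes


# Usage: three samples of a 1 kHz sine at 8 kHz.
print(pcm_encode(lambda t: math.sin(2 * math.pi * 1000 * t), 3))
```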

The content input unit 110 transmits the first sound source ML and the second sound source MR to the vocal processing unit 120, the bass extraction unit 130, and the drum processing unit 140. In doing so, the content input unit 110 synchronizes the timing at which ML and MR are delivered to these three units, so that the sounds synthesized in the mixing unit 150 are aligned in time.

For example, the content input unit 110 may divide the playback time of the first sound source ML and the second sound source MR into a plurality of unit times and simultaneously transmit the portions of ML and MR corresponding to the same unit time to the vocal processing unit 120, the bass extraction unit 130, and the drum processing unit 140, so that the sound synthesis in the mixing unit 150 is aligned in time.
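A minimal sketch of this framing step, assuming both channels are already PCM sample lists of equal length and that a unit time is a fixed number of samples. The name `split_unit_times` and the `unit_len` parameter are hypothetical; the description does not specify a unit-time length.

```python
def split_unit_times(ml, mr, unit_len):
    """Yield (index, left_chunk, right_chunk) for each unit time.

    The same sample range is taken from both channels, so every downstream
    unit receives time-aligned portions of ML and MR.
    """
    if len(ml) != len(mr):
        raise ValueError("left and right channels must have the same length")
    if unit_len <= 0:
        raise ValueError("unit_len must be positive")
    for index, start in enumerate(range(0, len(ml), unit_len)):
        yield index, ml[start:start + unit_len], mr[start:start + unit_len]


# Usage: 5 samples per channel, 2 samples per unit time.
for index, left, right in split_unit_times([1, 2, 3, 4, 5], [6, 7, 8, 9, 10], 2):
    print(index, left, right)
```

The last unit time may be shorter than `unit_len`; a real-time implementation would typically hold it until more input arrives.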

The vocal processing unit 120 removes the mono component located in the center of the stereo field by obtaining a difference signal between the first sound source ML and the second sound source MR. The vocal processing unit 120 may generate a first processed sound source ML-V, from which the mono component is removed, by subtracting the second sound source MR from the first sound source ML, and may generate a second processed sound source MR-V, from which the mono component is removed, by subtracting the first sound source ML from the second sound source MR. The vocal processing unit 120 transmits ML-V and MR-V to the mixing unit 150. Alternatively, the vocal processing unit 120 may generate only one of ML-V and MR-V and transmit it to the mixing unit 150.
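The difference-signal step can be sketched in a few lines of Python. This is a minimal illustration assuming ML and MR are equal-length lists of PCM samples; the function name `remove_mono` is hypothetical. Any component recorded identically in both channels, such as the lead vocal, cancels exactly, while components that differ between the channels survive.

```python
def remove_mono(ml, mr):
    """Return (ML-V, MR-V), the two difference signals of a stereo pair.

    A component present identically in both channels (the mono/center part)
    cancels in L - R and R - L; only the channel-specific parts remain.
    """
    if len(ml) != len(mr):
        raise ValueError("left and right channels must have the same length")
    ml_v = [left - right for left, right in zip(ml, mr)]
    mr_v = [right - left for left, right in zip(ml, mr)]
    return ml_v, mr_v


# Usage: a centered "vocal" plus different accompaniment in each channel.
vocal = [5, -2, 1]
left_accomp, right_accomp = [1, 0, 3], [0, 2, -1]
ml = [v + a for v, a in zip(vocal, left_accomp)]
mr = [v + a for v, a in zip(vocal, right_accomp)]
ml_v, mr_v = remove_mono(ml, mr)
print(ml_v, mr_v)  # [1, -2, 4] [-1, 2, -4]
```

MR-V is ML-V with inverted polarity, and every centered instrument cancels along with the vocal, which is why the bass and drum paths are needed.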

The bass extractor 130 extracts the bass component S-low of the music content MC. The bass extractor 130 obtains a sum signal of the first sound source ML and the second sound source MR and passes the sum signal through a low pass filter to extract the bass component S-low of the music content MC. The low pass filter passes components whose frequencies are at or below the frequency band of the human voice, so the extracted bass component S-low does not include the vocal component. The bass extraction unit 130 transmits the bass component S-low to the mixing unit 150.
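A minimal Python sketch of the bass path, assuming a first-order (one-pole) low-pass filter and a hypothetical 150 Hz cutoff; the description specifies neither the filter type nor the cutoff, only that the passband lies at or below the vocal band. A one-pole filter rolls off gently, so in practice a steeper filter (for example a higher-order Butterworth design) would attenuate the vocal band more sharply.

```python
import math


def extract_bass(ml, mr, sample_rate=44100, cutoff_hz=150.0):
    """Return S-low: the L+R sum signal passed through a one-pole low-pass filter.

    cutoff_hz is an illustrative assumption, chosen to sit below most of the
    vocal band.
    """
    # Smoothing factor of the discrete one-pole filter y[n] = y[n-1] + a*(x[n] - y[n-1]).
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    s_low = []
    y = 0.0
    for left, right in zip(ml, mr):
        x = left + right          # sum signal: centered (mono) content is reinforced
        y += alpha * (x - y)      # low-pass: high-frequency content is attenuated
        s_low.append(y)
    return s_low
```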

Because the kick drum, snare drum, and bass guitar form the foundation of the music, they are included in the music content MC as mono components. Therefore, most of the kick drum, snare drum, and bass guitar sound is removed when the vocal processing unit 120 removes the mono component.

By extracting the bass component S-low below the frequency band of the human voice, the bass extractor 130 can recover the low-frequency part of the kick drum, snare drum, bass guitar, and similar sounds removed by the vocal processing unit 120. Most of the bass guitar sound can be recovered this way, but the kick drum and snare drum sounds may not be sufficiently recovered. In particular, the snare drum sound is not only mostly removed by the vocal processing unit 120 but also hardly recovered by the bass extraction unit 130.

The drum processor 140 detects the kick drum component KD and the snare drum component SD in the music content MC and generates the kick drum sound KDS and the snare drum sound SDS according to the detected components. The kick drum component KD may include the temporal position and volume characteristics of the kick drum within the playback time of the music content MC, and the snare drum component SD may include the temporal position and volume characteristics of the snare drum within the playback time of the music content MC.

The drum processor 140 includes a drum detector 141 and a drum sample unit 142.

The drum detector 141 analyzes the music content MC in real time to detect the position and volume characteristics of the drum in the music content MC. To this end, the drum detector 141 may divide the playback time of the music content MC into a plurality of unit times and extract a fundamental frequency and a signal waveform of the music content MC for each unit time. The drum detector 141 analyzes the envelope characteristic of the extracted signal waveform to determine whether it is a signal waveform of a drum. If so, the drum detector 141 may detect the unit time from which the signal waveform was extracted as a drum point; that is, it detects the temporal position of the drum in the music content MC. The drum detector 141 may also calculate the volume area enclosed between the signal waveform and the zero-crossing line and select a volume coefficient that determines the volume level based on the volume area; that is, it detects the volume characteristic of the drum in the music content MC.

The drum detector 141 compares the period of the signal waveform with a threshold value to determine whether the signal waveform is that of a kick drum or a snare drum. If it is a kick drum waveform, the drum detector 141 generates a kick drum component KD including the temporal position and volume characteristics of the kick drum; if it is a snare drum waveform, the drum detector 141 generates a snare drum component SD including the temporal position and volume characteristics of the snare drum. The drum detector 141 transfers the kick drum component KD and the snare drum component SD to the drum sample unit 142.

The drum sample unit 142 stores various drum sound samples of the kick drum and the snare drum and selects a drum sound from among them. The user may check the kick drum component KD and the snare drum component SD and select the kick drum sound KDS or the snare drum sound SDS from the plurality of drum sound samples through the drum sample unit 142; that is, the drum sample unit 142 may select a drum sound according to the user's selection. Alternatively, the drum sample unit 142 may itself select, from the drum sound samples, the kick drum sound KDS corresponding to the kick drum component KD and the snare drum sound SDS corresponding to the snare drum component SD.

The drum sample unit 142 may convert the volume of the selected kick drum sound KDS according to the volume coefficient included in the kick drum component KD, and convert the volume of the snare drum sound SDS according to the volume coefficient included in the snare drum component SD.

The drum sample unit 142 delivers the kick drum sound KDS and the snare drum sound SDS to the mixing unit 150.

The mixing unit 150 generates karaoke content KC by synthesizing at least one of the first processed sound source ML-V and the second processed sound source MR-V with the bass component S-low, the kick drum sound KDS, and the snare drum sound SDS. The karaoke content KC may include at least one of a first karaoke sound source for the left channel and a second karaoke sound source for the right channel. The first karaoke sound source may be generated by combining ML-V, S-low, KDS, and SDS, and the second karaoke sound source may be generated by combining MR-V, S-low, KDS, and SDS. The karaoke content KC may thus be generated as a stereo sound source in which the accompaniment components of the musical instruments, excluding the singer's vocal component, are transmitted through the left and right channels.
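The synthesis for one output channel can be sketched as a sample-wise sum. This is a minimal illustration under stated assumptions: all signals are float sample lists at the same rate in [-1, 1], each drum hit is given as a start sample (its drum point) plus an already volume-converted drum sound, and the result is hard-clipped to full scale. The function name `mix_karaoke_channel` and the clipping choice are hypothetical.

```python
def mix_karaoke_channel(processed, s_low, drum_hits):
    """Sum a processed source (e.g. ML-V) with S-low and overlay drum sounds.

    drum_hits: list of (start_sample, drum_sound) pairs, where drum_sound is a
    KDS or SDS sample list whose volume has already been converted.
    """
    out = [p + b for p, b in zip(processed, s_low)]
    for start, drum_sound in drum_hits:
        for offset, value in enumerate(drum_sound):
            position = start + offset
            if 0 <= position < len(out):      # drop any tail past the end of the buffer
                out[position] += value
    return [max(-1.0, min(1.0, v)) for v in out]  # hard clip to full scale
```

For the second karaoke sound source, MR-V would be passed in place of ML-V with the same S-low and drum hits.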

The karaoke content generating apparatus 100 performs the operations of the vocal processing unit 120, the bass extraction unit 130, and the drum processing unit 140 in synchronization with the playback time of the music content and synthesizes their outputs in the mixing unit 150, so that the music content MC can be converted into karaoke content KC in real time.

As described above, the proposed karaoke content generating apparatus 100 may restore the sound of the kick drum, the snare drum, and the bass guitar that disappear in the process of removing the singer's vocal component. By restoring the sound of kick drums, snare drums, and bass guitars that are the basis of music, karaoke content (KC) can maximize the feel of the original song.

Hereinafter, the process in which the drum processor 140 detects the kick drum component KD and the snare drum component SD in the first sound source ML and the second sound source MR and generates the kick drum sound KDS and the snare drum sound SDS will be described in more detail.

Referring to FIG. 2, the playback time of the music content MC is divided into a plurality of unit times, and the fundamental frequency of the music content MC is extracted for each unit time (S110). Since the drum sound is generally included in the music content MC as a mono component, the fundamental frequency and the signal waveform may be extracted from either the first sound source ML or the second sound source MR. In some cases, they may be extracted from each of ML and MR.
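The description does not say how the fundamental frequency is extracted. One common technique is autocorrelation; the sketch below, an assumption rather than the patent's stated method, returns the period (the reciprocal of the fundamental frequency) of one unit time, searching only lags between hypothetical 1 ms and 20 ms bounds. It uses the raw (unnormalized) autocorrelation, which mildly favors shorter lags and so helps avoid picking a multiple of the true period.

```python
import math


def estimate_period(frame, sample_rate, min_period_s=0.001, max_period_s=0.02):
    """Return the period in seconds of the strongest autocorrelation peak, or None.

    Lags are searched between min_period_s and max_period_s (illustrative bounds).
    """
    min_lag = max(1, int(min_period_s * sample_rate))
    max_lag = min(len(frame) - 1, int(max_period_s * sample_rate))
    best_lag, best_r = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Raw autocorrelation at this lag.
        r = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return None if best_lag is None else best_lag / sample_rate


# Usage: a 100 Hz tone (period 10 ms) sampled at 8 kHz for 100 ms.
fs = 8000
tone = [math.sin(2 * math.pi * 100 * n / fs) for n in range(800)]
period = estimate_period(tone, fs)
print(period, 1 / period)
```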

The envelope characteristic of the extracted signal waveform is analyzed (S120). An envelope is the curve traced around a waveform by connecting its successive peaks. Since each instrument has its own characteristic envelope, drum sounds can be identified through envelope analysis. In particular, drum sounds in either the first sound source ML or the second sound source MR can be identified by analyzing the attack time of the envelope. The attack time is the time from the start of a sound until it reaches its maximum volume, and it differs by instrument type.
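Envelope extraction and attack-time measurement can be sketched as follows. This minimal illustration rests on assumptions not stated in the description: the envelope is a sliding-window peak of |x| (window length hypothetical), and the attack is measured from the first sample whose envelope reaches 10% of the peak to the envelope peak itself.

```python
def peak_envelope(frame, window=32):
    """Envelope as the maximum |x| over a sliding window ending at each sample."""
    return [max(abs(x) for x in frame[max(0, n - window + 1):n + 1])
            for n in range(len(frame))]


def attack_time(frame, sample_rate, window=32, onset_ratio=0.1):
    """Seconds from envelope onset (onset_ratio of peak) to envelope peak, or None if silent."""
    env = peak_envelope(frame, window)
    if not env:
        return None
    peak = max(env)
    if peak == 0:
        return None
    onset = next(i for i, e in enumerate(env) if e >= onset_ratio * peak)
    return (env.index(peak) - onset) / sample_rate


# Usage: a linear rise to the peak followed by a decay, sampled at 1 kHz.
ramp = [i / 100 for i in range(101)] + [1 - i / 100 for i in range(1, 101)]
print(attack_time(ramp, 1000))
```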

It is determined whether the signal waveform is the signal waveform of the drum through the envelope characteristic analysis (S130). By determining whether the attack time of the envelope coincides with the attack time of the drum sound, it may be determined whether the signal waveform is the signal waveform of the drum.

If the signal waveform is determined not to be a drum signal waveform, the process returns to step S110 to extract the fundamental frequency of the portion corresponding to the next unit time.

If it is determined that the signal waveform is a signal waveform of the drum sound, the unit time from which the signal waveform is extracted is detected as a drum point (S140). The drum point means a time at which a drum sound exists among the time when the music content MC is played.
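Steps S130 and S140 can be sketched together. This minimal Python illustration assumes the attack time of each unit time has already been measured (None for silent unit times) and treats an attack no longer than a hypothetical 10 ms as drum-like; the description only says the attack time is compared with that of a drum sound, so both the rule and the threshold are assumptions.

```python
def detect_drum_points(attack_times, max_drum_attack_s=0.010):
    """Return indices of unit times judged to contain a drum hit (drum points).

    attack_times: one attack time in seconds per unit time, or None if silent.
    """
    drum_points = []
    for index, attack in enumerate(attack_times):
        if attack is None:
            continue                      # nothing to analyze; move to the next unit time
        if attack <= max_drum_attack_s:   # S130: attack is short enough to be a drum
            drum_points.append(index)     # S140: record this unit time as a drum point
    return drum_points


# Usage: with 50 ms unit times, a drum point's start time is index * 0.05 s.
points = detect_drum_points([0.002, 0.045, None, 0.008])
print(points, [i * 0.05 for i in points])
```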

The drum volume indicated by the signal waveform is detected at the drum point (S150). Drum volume commonly differs depending on the character of the music, and even within a single piece it varies from section to section. If a drum sound of constant volume were synthesized at every detected drum point, the musical feel of the original song could be impaired. It is therefore preferable to detect the drum volume of the original music and adjust the volume of the synthesized drum sound accordingly. The drum volume may be calculated as the volume area enclosed between the signal waveform and the zero-crossing line. Since the volume area is proportional to the power of the signal waveform per unit time, the power per unit time of the signal waveform can be obtained from the volume area.
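For sampled audio, the area between the waveform and the zero line can be approximated with a rectangle rule over |x[n]|. A minimal sketch, with the function name `volume_area` hypothetical; for waveforms of similar shape, a larger area corresponds to a higher power per unit time, which is what makes the area usable for setting the drum volume.

```python
def volume_area(frame, sample_rate):
    """Approximate area between the waveform and the zero line, in amplitude-seconds.

    Rectangle rule: each sample contributes |x[n]| times the sample period.
    """
    return sum(abs(x) for x in frame) / sample_rate


# Usage: the same drum hit at full and half level.
hit = [0.0, 0.9, -0.7, 0.5, -0.3, 0.1]
print(volume_area(hit, 44100), volume_area([0.5 * x for x in hit], 44100))
```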

When the volume area is calculated, a volume coefficient for determining the volume of the drum sound may be selected based on the volume area. The volume of the drum sound selected from the plurality of drum sound samples may be converted according to the selected volume coefficient. Accordingly, the synthesized drum sound may have an intensity pattern similar to that of the original song, and the volume level pattern of the original song may be restored as it is.
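Selecting the volume coefficient and converting the sample volume might look like this. The description does not specify how the coefficient is derived from the volume area; this sketch assumes a hypothetical lookup table that maps ranges of volume area to gain coefficients and then scales the chosen drum sample by that gain.

```python
import bisect


def select_volume_coefficient(area, area_bounds, coefficients):
    """Map a volume area to a gain coefficient using a banded lookup table.

    area_bounds must be sorted ascending; coefficients needs one more entry
    than area_bounds (coefficients[k] covers the k-th band).
    """
    if len(coefficients) != len(area_bounds) + 1:
        raise ValueError("need exactly one more coefficient than bounds")
    return coefficients[bisect.bisect_right(area_bounds, area)]


def convert_volume(drum_sound, coefficient):
    """Scale every sample of the selected drum sound by the volume coefficient."""
    return [coefficient * s for s in drum_sound]


# Usage with an illustrative four-band table.
bounds = [0.01, 0.05, 0.10]
coeffs = [0.25, 0.5, 0.75, 1.0]
coefficient = select_volume_coefficient(0.07, bounds, coeffs)
print(coefficient, convert_volume([1.0, -0.5, 0.25], coefficient))
```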

Next, it is determined whether the period of the signal waveform at the fundamental frequency is greater than a threshold value F (S160). The fundamental frequency of most kick drums has a period of approximately 10 ms, whereas that of a snare drum has a period of 5 ms or less. Therefore, when the period of the signal waveform is greater than the threshold value F, the signal waveform may be determined to be that of a kick drum, and when the period is less than or equal to F, it may be determined to be that of a snare drum. The threshold value F may be set to an appropriate value by analyzing the periods of the signal waveforms of many kick drum and snare drum sounds.
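Step S160 reduces to a single comparison. In this minimal sketch the threshold F is set to a hypothetical 7.5 ms, midway between the roughly 10 ms kick drum period and the 5 ms snare drum period cited in the description; a real value would be tuned on many recorded kick and snare sounds.

```python
def classify_drum(period_s, threshold_f_s=0.0075):
    """S160: a period greater than F means kick drum; otherwise snare drum."""
    if period_s is None or period_s <= 0:
        raise ValueError("period must be a positive number of seconds")
    return "kick" if period_s > threshold_f_s else "snare"


# Usage: periods of 10 ms (about 100 Hz) and 4 ms (250 Hz).
print(classify_drum(0.010), classify_drum(0.004))
```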

If the period of the signal waveform is greater than the threshold value F, the kick drum sound KDS is generated (S170). At this time, a kick drum component KD is generated that includes the detected drum point and the volume coefficient of the drum. The user can identify the kick drum component KD and select the kick drum sound KDS from a plurality of drum sound samples. The volume of the selected kick drum sound KDS may be converted according to the volume coefficient included in the kick drum component KD.

Alternatively, the kick drum sound KDS whose envelope is most similar to the envelope characteristic of the signal waveform may be selected from the plurality of drum sound samples, and its volume may be converted according to the volume coefficient included in the kick drum component KD.

When the period of the fundamental frequency is less than or equal to the threshold value F, the snare drum sound SDS is generated (S180). At this time, the snare drum component SD including the detected drum point and the volume coefficient of the drum is generated. The user can identify the snare drum component SD and select the snare drum sound SDS from the plurality of drum sound samples. The volume of the selected snare drum sound SDS may be converted according to the volume coefficient included in the snare drum component SD.

Alternatively, the snare drum sound SDS whose envelope is most similar to the envelope characteristic of the signal waveform may be selected from the plurality of drum sound samples, and its volume may be converted according to the volume coefficient included in the snare drum component SD.
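The alternative selection in S170 and S180, choosing the sample whose envelope best matches the detected waveform, can be sketched as a nearest-neighbor search. The similarity measure is not specified in the description; this sketch assumes mean squared error between peak-normalized envelopes over their common length, and the sample names in the usage example are hypothetical.

```python
def most_similar_sample(target_env, sample_envs):
    """Return the key of the sample whose envelope is closest to target_env.

    Envelopes are normalized to a peak of 1 so that the comparison reflects
    shape (attack and decay), not loudness; volume is handled separately.
    """
    def normalize(env):
        peak = max(env, default=0.0)
        return [e / peak for e in env] if peak > 0 else list(env)

    target = normalize(target_env)
    best_key, best_err = None, float("inf")
    for key, env in sample_envs.items():
        candidate = normalize(env)
        n = min(len(target), len(candidate))
        if n == 0:
            continue
        err = sum((a - b) ** 2 for a, b in zip(target[:n], candidate[:n])) / n
        if err < best_err:
            best_key, best_err = key, err
    return best_key


# Usage: pick between two hypothetical kick drum samples.
samples = {"tight_kick": [0.0, 2.0, 1.0, 0.5], "boomy_kick": [0.0, 0.5, 1.0, 1.0]}
print(most_similar_sample([0.0, 1.0, 0.5, 0.25], samples))
```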

It is determined whether the input of the music content MC has ended (S190). If it has ended, the process of generating the karaoke content KC is finished. If the music content MC is still being input, steps S110 to S190 are performed again. That is, the kick drum sound KDS and the snare drum sound SDS may be generated for each unit time repeatedly until the playback of the music content MC ends.

The karaoke content generating apparatus 100 described above may be implemented in software on various electronic devices such as computers, mobile phones, and MP3 players, and may be produced as a smartphone application and stored in an application service server or a storage medium.

The drawings and the detailed description referred to above are merely exemplary of the present invention; they are used only to describe the invention and are not intended to limit its meaning or the scope of the invention as defined in the claims. Therefore, those skilled in the art will understand that various modifications and other equivalent embodiments are possible. Accordingly, the true technical scope of protection of the present invention should be defined by the technical spirit of the appended claims.

Claims

1. A karaoke content generating apparatus, comprising:

a vocal processing unit that generates a processed sound source by removing a mono component of music content including a first sound source transmitted through a left channel and a second sound source transmitted through a right channel;

a bass extraction unit that extracts a bass component of the music content;

a drum processor that detects a drum component in the music content and selects a drum sound from a plurality of drum sound samples; and

a mixing unit that synthesizes the processed sound source, the bass component, and the drum sound to generate karaoke content,

wherein the drum processor extracts a fundamental frequency and a signal waveform of the music content for each unit time and analyzes an envelope characteristic of the signal waveform to determine whether the signal waveform is a signal waveform of a drum.

2. The karaoke content generating apparatus of claim 1, wherein the vocal processing unit obtains a difference signal between the first sound source and the second sound source to remove the mono component of the music content.

3. The karaoke content generating apparatus of claim 2, wherein the vocal processing unit subtracts the second sound source from the first sound source to generate a first processed sound source from which the mono component is removed.

4. The karaoke content generating apparatus of claim 1, wherein the bass extraction unit obtains a sum signal of the first sound source and the second sound source and extracts the bass component of the music content by passing the sum signal through a low pass filter.

5. The karaoke content generating apparatus of claim 4, wherein the low pass filter is a filter that passes a bass component having a frequency at or below the frequency band of the human voice.

6. The karaoke content generating apparatus of claim 1, wherein, when the signal waveform is determined to be a signal waveform of a drum, the drum processor detects the unit time from which the signal waveform was extracted as a drum point.

7. The karaoke content generating apparatus of claim 1, wherein the drum processor calculates a volume area between the signal waveform and the zero-crossing line and selects a volume coefficient that determines a volume level based on the volume area.

8. The karaoke content generating apparatus of claim 7, wherein the drum processor compares the period of the signal waveform with a threshold value to determine whether the signal waveform is a signal waveform of a kick drum or a snare drum.

9. The karaoke content generating apparatus of claim 8, wherein the drum processor selects one of a kick drum sound and a snare drum sound from the plurality of drum sound samples and converts the volume of the selected drum sound according to the volume coefficient.

10. A method of generating karaoke content, comprising:

generating a processed sound source by removing a mono component of music content including a first sound source transmitted through a left channel and a second sound source transmitted through a right channel;

extracting a bass component of the music content;

detecting a drum component in the music content;

selecting a drum sound from a plurality of drum sound samples; and

synthesizing the processed sound source, the bass component, and the drum sound to generate karaoke content,

wherein detecting the drum component in the music content comprises:

extracting a fundamental frequency and a signal waveform of the music content for each unit time; and

analyzing an envelope characteristic of the signal waveform to determine whether the signal waveform is a signal waveform of a drum.

11. The method of claim 10, wherein generating the processed sound source comprises subtracting the second sound source from the first sound source to generate a first processed sound source from which the mono component is removed.

12. The method of claim 10, wherein extracting the bass component of the music content comprises:

obtaining a sum signal of the first sound source and the second sound source; and

passing the sum signal through a low pass filter to extract the bass component of the music content.

13. The method of claim 10, wherein detecting the drum component in the music content further comprises detecting, when the signal waveform is determined to be a signal waveform of a drum, the unit time from which the signal waveform was extracted as a drum point.

14. The method of claim 10, wherein detecting the drum component in the music content further comprises:

calculating a volume area between the signal waveform and the zero-crossing line; and

selecting a volume coefficient that determines a volume level based on the volume area.

15. The method of claim 14, wherein detecting the drum component in the music content further comprises comparing the period of the signal waveform with a threshold value to determine whether the signal waveform is a signal waveform of a kick drum or a snare drum.

16. The method of claim 15, wherein selecting the drum sound from the plurality of drum sound samples comprises:

selecting a kick drum sound from the plurality of drum sound samples when the signal waveform is a signal waveform of the kick drum; and

converting a volume of the kick drum sound according to the volume coefficient.

17. The method of claim 16, wherein selecting the drum sound from the plurality of drum sound samples comprises:

selecting a snare drum sound from the plurality of drum sound samples when the signal waveform is a signal waveform of the snare drum; and

converting a volume of the snare drum sound according to the volume coefficient.