KR20100086457A

KR20100086457A - Song searching method and device using voice recognition technology in karaoke environment

Info

Publication number: KR20100086457A
Application number: KR1020100066798A
Authority: KR
Inventors: 조정권
Original assignee: 조정권
Priority date: 2010-07-12
Filing date: 2010-07-12
Publication date: 2010-07-30
Also published as: KR101249549B1

Abstract

PURPOSE: An apparatus and a method for searching for a song title using speech recognition in a karaoke environment are provided. Noise is removed from the user's voice picked up by a microphone, a keyword for the song title is detected by speech recognition, and the corresponding song stored in the machine is played. CONSTITUTION: The method for searching for a song title using speech recognition in a karaoke environment is as follows. Voice is input through a microphone (200). The microphone signal is passed to a codec, which converts the analog signal to a digital signal (220). A noise removal algorithm and a speech recognition program are executed on a microprocessor (250) and a main memory (270). A keyword is found by comparison with the parameters of a stored song title database (280). Additional information is entered as needed using a keyboard or keypad (240). The desired information is displayed on a screen, the search information from the remote control or electronic songbook is transmitted to the accompaniment machine, and the corresponding accompaniment is played (260).

Description

Song searching method and device using voice recognition technology in karaoke environment

The present invention relates to a karaoke accompaniment machine and a remote control device with a speech recognition function. A signal processing algorithm first extracts only the target signal in a very noisy environment, and speech recognition then lets the user search for a desired song by voice in a karaoke room. This removes the inconvenience of looking through a songbook under dim lighting or typing on the keyboard of a remote control or electronic songbook.

Songs listed in a karaoke title guidebook cannot be kept in order as new songs are added, which makes them hard to find. Because karaoke rooms are usually dimly lit, searching for a song title takes considerable time and effort.

In the prior art, an electronic songbook or remote control is used to show, one after another on the screen, the songs that correspond to the opening phrase, which makes searching easier. However, this method still requires typing characters one by one. To find a title in real time when the user speaks it into a microphone, the speech recognition rate over some 20,000 to 100,000 song titles must be high. A recognition rate good enough for commercial use must also be guaranteed when a companion is singing and there is loud noise in addition to the sound of the accompaniment machine.

The present invention was devised to solve the above problems. Its object is to develop a signal processing algorithm that removes noise from the user's voice input to a microphone, detects a keyword for the song title by speech recognition, and then plays the song stored in the accompaniment machine, and to provide a method of implementing a hardware system that can run this algorithm in real time.

To achieve the above object, the digital signal processing technique according to the present invention effectively removes environmental noise, the sound of the accompaniment machine, and the noise of people singing, improves the speech recognition rate, searches for the keyword of the desired song, and selects and plays a song stored in advance in the accompaniment machine or electronic songbook. To remove noise, the microphone input signal is converted to the frequency domain and the statistical characteristics and power ratio of the signal are measured. If the signal is judged to be noise, the gain controller is activated; in target-signal sections, the previously estimated noise spectrum is subtracted. To improve the speech recognition rate, the variation of the input signal along the frequency axis and the time axis is measured to extract the start and end points of the target signal accurately, and the buffered valid speech data is passed to the speech recognition program. The result is compared with the pre-stored feature parameters of the songs in the database of the accompaniment machine or electronic songbook, and the recognized keyword is delivered to the accompaniment machine through the remote transceiver to start the accompaniment.
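
The start-point and end-point extraction with buffering mentioned above can be illustrated with a short sketch. It is not taken from the patent: the function name `extract_utterance` and the parameters `pre_roll` and `hangover` are hypothetical, and the per-frame target/noise decision is assumed to be supplied by some other detector.

```python
from collections import deque


def extract_utterance(frames, is_target, pre_roll=3, hangover=2):
    """Return the frames of the first utterance, padded on both sides.

    frames    : sequence of audio frames (any objects)
    is_target : sequence of booleans, True where a frame was judged to be
                target signal (assumed to come from a separate detector)
    pre_roll  : number of past frames kept so that weak onsets, such as
                consonants, are not lost (hypothetical parameter)
    hangover  : number of trailing non-target frames tolerated before the
                utterance is closed (hypothetical parameter)
    """
    history = deque(maxlen=pre_roll)  # ring buffer of the most recent noise frames
    utterance = []
    silence = 0
    active = False
    for frame, flag in zip(frames, is_target):
        if not active:
            if flag:
                active = True
                utterance.extend(history)  # recover frames just before the onset
                utterance.append(frame)
            else:
                history.append(frame)
        else:
            utterance.append(frame)
            silence = 0 if flag else silence + 1
            if silence > hangover:
                break  # end point reached
    return utterance
```

With `pre_roll=2` and `hangover=1`, frames 0 to 9 flagged as target only at positions 4 and 5 yield frames 2 through 7, so the frames just before the detected onset are passed on to the recognizer as well.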

According to the present invention, in a noisy karaoke environment the user can use a microphone to search for and play, by voice, the songs stored in the karaoke accompaniment machine or electronic songbook. The desired song title can thus be found without looking through a songbook under dim lighting or typing on the keyboard of a remote control.

FIG. 1 is a block diagram of the single-channel noise removal algorithm.
FIG. 2 is a flowchart of a system that searches for a song using speech recognition after noise removal.
FIG. 3 is a diagram illustrating the relationship between the accompaniment machine and the remote control device using speech recognition.
FIG. 4 is a block diagram of a hardware board for real-time processing of the exemplary program modules described in FIG. 2.

Hereinafter, the theory, configuration, and operation of the present invention are described in detail.

Signal processing method to increase the speech recognition rate

Because the recognition rate of a speech recognition program depends heavily on whether the input signal contains noise, an effective noise removal technique is essential for using speech recognition in a loud environment such as a karaoke room.

A typical single-channel noise removal system operates in the frequency domain and estimates the speech magnitude by determining the attenuation or gain applied to each frequency component. This approach removes ambient noise by exploiting the fact that, in an input signal where speech and noise are mixed, the noise varies relatively little compared with the speech within a short interval.

FIG. 1 shows a block diagram of the proposed one-microphone noise removal system. The one-microphone sound quality enhancement system of FIG. 1 estimates the power spectrum of the noise D(k,l) from the magnitude of the frequency components Y(k,l) of the input signal y(t), in which noise is added to speech. It uses this estimate to compute a gain G(k,l), multiplies the magnitude spectrum of the input by that gain (noise spectral subtraction), and then synthesizes the speech using an inverse FFT (Inverse Fast Fourier Transform).

If a section is estimated to be noise, the gain controller reduces the level of the input signal. If the gain controller is not used, the residual component that remains after subtracting the frequency components of the noise is output.
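
As a rough illustration of the chain in FIG. 1 (transform, per-bin gain, inverse transform), the following sketch applies plain power spectral subtraction to one frame. It is a simplified stand-in, not the patent's gain rule: the function names, the `floor` parameter, and the externally supplied `noise_power` estimate are all assumptions, and a naive DFT replaces the FFT to keep the example dependency-free.

```python
import cmath
import math


def dft(frame):
    """Naive O(N^2) discrete Fourier transform (stand-in for an FFT)."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def suppress_noise(frame, noise_power, floor=0.1):
    """Scale each bin Y(k) by a gain G(k) and resynthesize the frame.

    noise_power[k] is an estimate of |D(k)|^2, assumed to be provided by the
    caller. The gain max(1 - |D|^2 / |Y|^2, floor) is ordinary power spectral
    subtraction with a floor, chosen here only for illustration.
    """
    out = []
    for y, d2 in zip(dft(frame), noise_power):
        y2 = y.real * y.real + y.imag * y.imag
        gain = max(1.0 - d2 / y2, floor) if y2 > 0.0 else floor
        out.append(gain * y)
    return [v.real for v in idft(out)]
```

With a zero noise estimate the frame is returned unchanged; with a very large estimate every bin is held at the floor, which mimics the level reduction that the gain controller performs in noise sections.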

Among the statistical characteristics of the microphone input signal, the variation along the frequency axis and the variation along the time axis are calculated. The trend of these variations is then examined to distinguish noise sections from target-signal sections.

Let the power of each frequency component in the frequency domain, the average of those powers, and the total power of the frame be given. The normalized variation in the frequency domain, that is, the frequency flatness, is then expressed by Equation 1.

[Equation 1]

Here, the threshold is an experimentally obtained value.
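
The formula of Equation 1 is not reproduced in this text, so the following is only one plausible reading of a "normalized variation": the mean absolute deviation of the per-bin powers from their average, divided by the total frame power. The function names and the choice of deviation measure are assumptions.

```python
def normalized_variation(powers):
    """Sum of |P_k - mean(P)| divided by the total power (assumed form)."""
    total = sum(powers)
    if total <= 0.0:
        return 0.0  # silent frame: no variation to measure
    mean = total / len(powers)
    return sum(abs(p - mean) for p in powers) / total


def looks_like_target(powers, threshold):
    """True when the variation exceeds an experimentally chosen threshold."""
    return normalized_variation(powers) > threshold
```

A flat spectrum such as `[1, 1, 1, 1]` gives 0, while a peaky one such as `[4, 0, 0, 0]` gives 1.5, so frames dominated by a few strong components score higher. The same helper could be fed sub-frame powers along the time axis to obtain a time-domain counterpart.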

Let the power of one frame in the time domain, the average of those powers, and the total power of the frame be given. The normalized variation in the time domain is then expressed by Equation 2.

[Equation 2]

Here, the threshold is an experimentally obtained value.

If the variation calculated by Equations 1 and 2 is larger than the experimentally obtained threshold, the frame can be regarded as the target signal.

Another parameter for estimating the target signal is obtained by applying an IIR (Infinite Impulse Response) average to the input signal power twice over a long period and comparing the result with the power of the current frame.

The IIR average power of the current frame is calculated as in Equation 3.

[Equation 3]

The quantities in Equation 3 are an IIR smoothing coefficient between 0 and 1, the IIR average power of the current frame, and the power of the previous frame.
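
A first-order IIR (exponential) average of frame power is commonly written as a weighted sum of the previous average and the new power. The sketch below uses that common form as an assumption; the exact arrangement of terms in Equation 3 is not recoverable from this text, and the names `iir_average` and `alpha` are illustrative.

```python
def iir_average(previous_average, frame_power, alpha):
    """One step of a first-order IIR average with smoothing coefficient alpha.

    alpha close to 1 gives a slowly changing average; alpha close to 0 follows
    the frame power almost immediately. This is the textbook form, assumed here.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie between 0 and 1")
    return alpha * previous_average + (1.0 - alpha) * frame_power


def smooth(powers, alpha, initial=0.0):
    """Run the average over a sequence of frame powers."""
    averages = []
    current = initial
    for p in powers:
        current = iir_average(current, p, alpha)
        averages.append(current)
    return averages
```

For example, `smooth([4.0, 4.0], 0.5)` steps from 0 to 2.0 and then to 3.0, approaching the constant input power geometrically.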

The IIR (Infinite Impulse Response) average is then calculated again, as in Equation 4, using only the frames whose power is no more than a certain multiple of the reference power.

[Equation 4]

The quantities in Equation 4 are an IIR smoothing coefficient between 0 and 1, the long-term IIR average power of the current frame, and the power of the previous frame.

Frames with an abruptly large variation, that is, large input signals, do not take part in the long-term average calculated by Equation 4, so it largely reflects the frame power of the noise components.

Therefore, when the power of the current frame of the microphone input signal is larger than a certain multiple of this long-term average, the frame can be regarded as the target signal.

[Equation 5]

Here, c is an arbitrary constant, determined experimentally, that gives the multiple of the long-term average above which the current frame power is regarded as the target signal.
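
The idea behind Equations 3 to 5 (a short-term average, a long-term average that skips loud frames, and a comparison of the current power against a multiple c of the long-term average) can be sketched as follows. The class name, the gating rule, and all default values are assumptions made for illustration; the patent's actual equations are not reproduced in this text.

```python
class NoiseFloorTracker:
    """Track a noise floor with two IIR averages and flag loud frames."""

    def __init__(self, alpha=0.9, beta=0.99, gate=2.0, c=4.0, initial_power=1e-6):
        self.alpha = alpha            # short-term smoothing coefficient
        self.beta = beta              # long-term smoothing coefficient
        self.gate = gate              # frames above gate * long_term are skipped
        self.c = c                    # detection multiple (experimental constant)
        self.short_term = initial_power
        self.long_term = initial_power

    def update(self, frame_power):
        """Feed one frame power; return True if it looks like target signal."""
        self.short_term = (self.alpha * self.short_term
                           + (1.0 - self.alpha) * frame_power)
        # Assumed gating: loud frames do not contribute to the long-term average,
        # so it mostly follows the power of noise-only frames.
        if self.short_term <= self.gate * self.long_term:
            self.long_term = (self.beta * self.long_term
                              + (1.0 - self.beta) * self.short_term)
        return frame_power > self.c * self.long_term
```

Feeding a steady stream of equal-power frames never triggers detection, while a sudden frame 100 times stronger is flagged and leaves the long-term average untouched.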

The IIR-averaged signal-to-noise ratio between the frame power of the noise sections estimated using Equations 1 to 5 and the frame power of the input signal is calculated as in Equation 6.

[Equation 6]

The quantities in Equation 6 are the power of the previous frame, an IIR smoothing coefficient between 0 and 1, and the absolute value operator.

Applying Equation 6 to the Wiener filter, one of the most widely used noise removal algorithms, gives Equation 7.

[Equation 7]

In FIG. 1, the gain multiplied by the frequency components of the input signal is expressed by Equation 8 below.

[Equation 8]

Here, the noise attenuation coefficient (Attenuation Level) controls the suppression: the larger it is, the more the noise component is attenuated.
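
For orientation, the textbook Wiener gain for a signal-to-noise ratio SNR is SNR / (1 + SNR). The sketch below combines it with a floor derived from an attenuation level in decibels. How the patent's Equations 7 and 8 actually combine these terms is not shown in this text, so the floor rule and the names `wiener_gain`, `floored_gain`, and `attenuation_db` are assumptions.

```python
def wiener_gain(snr):
    """Textbook Wiener gain for a (linear, non-negative) signal-to-noise ratio."""
    snr = max(snr, 0.0)
    return snr / (1.0 + snr)


def floored_gain(snr, attenuation_db):
    """Wiener gain limited from below by an attenuation level.

    A larger attenuation_db gives a lower floor, so noise-dominated bins are
    suppressed more strongly (assumed interpretation of the attenuation level).
    """
    floor = 10.0 ** (-attenuation_db / 20.0)
    return max(wiener_gain(snr), floor)
```

At an SNR of 1 the Wiener gain is 0.5; with a 20 dB attenuation level, a bin with no signal at all is scaled by 0.1 rather than removed completely, which limits the artifacts that a gain of exactly zero would introduce.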

Multiplying the input signal by the gain of Equation 8 reduces the noise components in each frequency band. To remove more of the noise remaining in the target signal, the target signal obtained by this multiplication is subtracted from the input signal, and the resulting estimated noise signal is calculated as in Equation 9.

[Equation 9]

The coefficients of the adaptive filter, which uses the normalized complex LMS (Least Mean Square) algorithm, are updated by a steepest-descent-based algorithm, as expressed in Equation 10.

[Equation 10]

The quantities in Equation 10 are the (n-1)-th input signal, the noise signal estimated up to the (n-1)-th step, the (n-1)-th output signal, a constant that determines the convergence speed, and the norm operator.
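
A single-tap normalized complex LMS update, in its standard textbook form, can be sketched as below. It treats the estimated noise as the reference input and the microphone spectrum as the primary input. The function name, the single-tap simplification, and the `eps` regularizer are assumptions; the exact form of Equation 10 is not reproduced in this text.

```python
def nlms_step(weight, noise_ref, primary, mu=0.5, eps=1e-12):
    """One normalized complex LMS update for a single coefficient.

    weight    : current complex filter coefficient
    noise_ref : reference input (here, the estimated noise component)
    primary   : primary input (here, the noisy input component)
    mu        : step size that sets the convergence speed
    eps       : small constant guarding against division by zero (assumed)
    Returns (new_weight, output), where output is the primary input with the
    filtered reference removed.
    """
    output = primary - weight * noise_ref
    norm = abs(noise_ref) ** 2 + eps
    new_weight = weight + mu * noise_ref.conjugate() * output / norm
    return new_weight, output
```

If the primary input is the reference multiplied by a fixed complex factor, repeated calls drive `weight` toward that factor and the output toward zero, with the remaining error shrinking by roughly `1 - mu` per step.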

In addition, in sections estimated to be noise, the gain controller is activated and lowers the level of the input signal before it is output. This keeps the speech recognizer from being triggered and prevents songs from being searched for erroneously because of noise rather than the user's voice.

Exchange of data between the karaoke accompaniment machine, the electronic songbook, and the remote control

As shown in FIG. 2, the microphone input signal (200) passes through a preamplifier (210) that includes a low-pass filter, so that only voice-band signals pass. It is then delivered to the codec, where the analog signal is converted to a digital signal (220). The noise removal algorithm and the speech recognition program run on the microprocessor (250) and main memory (270) and search for a keyword by comparison with the parameters of the pre-stored song title database (280). Supplementary information is entered as needed using a keyboard or keypad (240), and the desired information is displayed on the screen (230). The search information or keyword from the remote control or electronic songbook is delivered to the accompaniment machine through the remote transceiver (260), and the song accompaniment is played (290). The software implementing the noise removal algorithm and the speech recognition program may be installed independently in the electronic songbook or in the accompaniment machine, which then plays the song for the retrieved keyword. Alternatively, the voice signal input to the microphone of the remote control may be transmitted remotely to the accompaniment machine, which runs its own noise removal and speech recognition software to find the song title and then plays the song.

As shown in FIG. 3, the voice input to the microphone (300) is converted to a digital signal by the codec, and the microprocessor and main memory then determine whether noise is present (310). If noise is present, it is removed by the noise removal algorithm while the gain controller operates (320), and the features and parameters of the target signal are extracted (330). The parameters extracted from the input signal are compared (340) with a database that stores pre-calculated parameters of the song titles (350), and candidate songs whose parameters are closest are displayed on the screen (360). If the song the user wants is among the candidates, the user selects it by voice or with an auxiliary button (370), and the accompaniment machine starts playing (380).
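
The comparison step of FIG. 3 (340 to 360), in which the candidates closest to the spoken query are shown to the user, can be illustrated with a deliberately simplified sketch. It assumes each title is represented by one fixed-length feature vector and uses Euclidean distance; a real recognizer would compare feature sequences, and the function name and data layout here are hypothetical.

```python
import math


def nearest_titles(query, database, limit=5):
    """Return up to `limit` titles ordered by distance to the query vector.

    query    : feature vector extracted from the spoken input
    database : dict mapping song title -> pre-calculated feature vector
               (all vectors must have the same length as the query)
    """
    ranked = sorted(database.items(), key=lambda item: math.dist(query, item[1]))
    return [title for title, _ in ranked[:limit]]
```

A call such as `nearest_titles((0.9, 1.2), {"A": (0, 0), "B": (1, 1), "C": (5, 5)}, limit=2)` returns `["B", "A"]`, which corresponds to the short candidate list from which the user picks a song by voice or button.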

Hardware development for real-time processing

FIG. 4 is a block diagram of a stand-alone hardware system for running, in real time, a program that implements the flowchart shown in FIG. 2.

A real-time processing board built around an Arm 920T processor (or a general-purpose processor) and a Philips UDA1341TS codec (or an analog-to-digital converter) was developed according to the block diagram of FIG. 4.

The input signals from two microphones are amplified by a preamplifier (400) that includes a first-order low-pass circuit with a cutoff frequency of 17 kHz, and are converted to digital values by the UDA1341TS stereo codec (410) at a sampling frequency of 44.1 kHz. The algorithm shown in the flowchart of FIG. 2 was implemented in C and Arm 920T assembly language and optimized. Running on the Arm 920T (430), the program removes the noise contained in the input signal in real time, extracts feature parameters, and searches for the desired song through the speech recognition program. Frequently searched songs are automatically stored in an EEPROM (480) to shorten the search time for popular songs, so the hardware is designed with user convenience in mind. External buttons (420) and a connection device (440) for exchanging data with a PC or other equipment are provided so that functions can be fine-tuned to the user's environment, and flash memory (450) is used so that the program can be easily loaded and changed. A copy-protection security chip (460) is fitted to protect the software copyright, and the front panel (470) carries LEDs that indicate the operating state of the program and whether the buttons are in use, so that the user can easily check and control the equipment.
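
The idea of keeping frequently searched songs in a small store that is consulted first can be sketched in software as follows. This models only the lookup order, not the EEPROM itself; the class name, the `capacity` parameter, and the `matches` callback are hypothetical.

```python
from collections import Counter


class FrequentSongCache:
    """Remember how often each title was chosen and search popular ones first."""

    def __init__(self, capacity=100):
        self.capacity = capacity      # number of titles kept in the shortlist
        self.counts = Counter()

    def record(self, title):
        """Note that a title was selected by a user."""
        self.counts[title] += 1

    def shortlist(self):
        """Most frequently selected titles, most popular first."""
        return [title for title, _ in self.counts.most_common(self.capacity)]

    def search(self, matches, full_catalog):
        """Return the first matching title, trying the shortlist before the catalog."""
        for title in self.shortlist():
            if matches(title):
                return title
        for title in full_catalog:
            if matches(title):
                return title
        return None
```

Because the shortlist is much smaller than the full catalog, a popular song is found after a handful of comparisons, and only unusual requests fall through to the full search.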

The above description of the algorithm, which removes the noise contained in the microphone input signal in a loud environment such as a karaoke room, searches for a song title using speech recognition, and then plays the song on the accompaniment machine, and of the hardware for real-time processing, has been presented for purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise form of the equations or drawings. Many modifications and variations are possible in light of the above, and some of the equations or embodiments may be used in any combination. The scope of the invention should be limited not by this detailed description, the drawings, or the equations, but by the claims appended hereto.

200: microphone signal input module
210: module for checking whether noise is present
220: noise removal and gain controller module
230: target-signal feature and parameter extraction module
240: module for comparison with previously stored parameters
250: database module storing pre-calculated parameters for the songs
260: module for displaying the retrieved candidate songs on the screen
270: module for selecting the desired song by voice or button
280: module for playing the song using a keyword from the accompaniment machine itself or received from the remote control device

Claims

A karaoke speech recognition accompaniment machine, electronic songbook, and remote control device, being equipment that searches for a song title using speech recognition in a noisy environment such as a karaoke room, characterized by hardware comprising: a data storage unit for storing data; a preamplifier unit including a microphone and a low-pass filter; a microprocessor and a main memory unit for running programs; a flash memory unit holding a firmware program that drives the electronic components and equipment when the power is turned on; a display unit for displaying the operating state of the program and equipment; and a remote data transmission/reception unit composed of frequency modulation, Bluetooth, and infrared circuits; and by a method of running a speech recognition program on that hardware after noise removal and presenting the user with a plurality of candidate songs as the recognition result.

A noise estimation method, and a speech recognition preprocessing method characterized by it, for removing the accompaniment sound and the singer's voice from the speech recognition microphone input signal in order to improve the speech recognition rate in a noisy karaoke environment, wherein, after the microphone input signal is converted to the frequency domain, the variation of the frequency components by Equation 1, the variation in the time domain, the IIR average power of the current frame, and the long-term IIR average power of the current frame by Equation 4 are calculated, and whether the input signal of the current frame is noise or the target signal is estimated by comparing the power of the current frame with an experimentally obtained threshold (the condition "> c").

A noise removal method, and a speech recognition preprocessing method characterized by it, which, in order to subtract the noise component estimated in claim 2 from the target signal, calculates the ratio of the input signal to the noise signal by Equation 6, multiplies the input signal by the gain of Equation 8 based on the Wiener filter, estimates only the noise component, and then passes the input signal through an adaptive filter using the coefficients of Equation 10 to minimize the noise components in target-signal sections.

A speech recognition preprocessing method according to claim 2, in which, to obtain the valid data required for speech recognition by designating the start point and end point of a target signal mixed with noise, a buffering technique that includes past values is added to the method of measuring the variation of the input signal by Equations 1 and 2, so that valid data that is easily lost, such as consonants, is included in what is passed to the speech recognition engine.

A method of improving the speech recognition rate, and a speech recognition song accompaniment device characterized by it, in which the average power of the frequency components and the formant frequencies are measured for the valid data obtained by claims 2, 3, and 4 to distinguish male from female voices, which are then compared with male and female phoneme data in the database, and in which, even when speaking speeds differ markedly or the song title is long (for example, "One hundred meters before she meets her"), the input is recognized as valid data by calculating the IIR (Infinite Impulse Response) average of the time-axis and frequency-axis variations of the input signal.

A method in which the user's voice is transmitted to the accompaniment machine either from a microphone mounted on the remote control, through a remote transmission unit such as FM modulation, Bluetooth, or IR, or from a microphone connected to the accompaniment machine; about 20 candidate song names are displayed on the screen; the user selects one of the songs; the keyword of the corresponding song name is transmitted to the accompaniment machine; and the accompaniment machine plays the accompaniment of the song.

A method in which a microphone mounted on the electronic songbook passes the user's voice to the preprocessing and speech recognition programs installed in the electronic songbook; 5 to 20 song names similar to the recognized words are displayed on the screen; the user selects one of the songs; the keyword of the corresponding song name is transmitted to the accompaniment machine through a remote transmission unit such as frequency modulation, Bluetooth, or infrared; and the accompaniment machine plays the accompaniment of the song.