KR101301148B1

KR101301148B1 - Song selection method using voice recognition

Info

Publication number: KR101301148B1
Application number: KR1020130025649A
Authority: KR
Inventors: 서창민
Original assignee: 주식회사 금영
Priority date: 2013-03-11
Filing date: 2013-03-11
Publication date: 2013-09-03

Abstract

PURPOSE: A song selection method using voice recognition is provided to improve the voice recognition rate of a karaoke machine by using control commands structured according to the state of the karaoke machine. CONSTITUTION: The song selection method using voice recognition comprises a song item display step (S115), a voice data and control command comparing step (S117, S119), and a song item selecting step (S121). The song item display step displays multiple song items corresponding to multiple searched songs. The comparing step receives a voice signal for selecting one of the song items and compares the voice data of the received voice signal with multiple designated control commands. The song item selecting step selects the song item that corresponds to the control command determined by the comparison. [Reference numerals] (S100) Start; (S101) Receive a voice signal for selecting a song by voice recognition; (S103) Does the voice data of the voice signal correspond to the designated control command?; (S105) Receive a voice signal for searching for songs; (S107) Search the song search DB for corresponding songs by using the voice data of the voice signal; (S109) Did the song search or the recognition of the voice data fail?; (S111) Order the searched songs; (S113) Generate song items for the ordered songs; (S115) Display the generated song items; (S117) Receive a voice signal for controlling the displayed song items; (S119) Is the voice data of the voice signal a control command for selecting a song item?; (S121) Select the song item corresponding to the determined control command; (S123) Perform another relevant control command; (S200) End

Description

Song selection method using speech recognition {SONG SELECTION METHOD USING VOICE RECOGNITION}

The present invention relates to a song selection method using voice recognition, and more specifically to a song selection method using voice recognition that lets a user search for songs and conveniently select a song in response to a voice signal received through a microphone.

As processor performance has improved, voice recognition technology, which converts analog speech into digital voice data and recognizes the corresponding words or sentences from it, has become practical in many technical fields. For example, such technology is built into smartphones and the like to extract words or sentences from a user's speech.

However, voice recognition technology has a low word recognition rate or is limited in finding the exactly corresponding word, so it is difficult to apply to commercial products that require reliability or accuracy.

For example, a song accompaniment apparatus operates in an environment where not only the user's voice but also a music signal is output through a speaker. In such an environment the music signal must be taken into account in addition to the user's voice signal, so the recognition rate of voice recognition technology inevitably drops further.

Moreover, when a song accompaniment apparatus searches for songs using a specific word, many songs may correspond to or resemble that word, so the results of the search need to be navigated. Because of the reduced recognition rate in the karaoke room environment, however, the voice search results are navigated again with a remote control, and voice recognition technology is therefore applied only within an extremely limited scope.

Therefore, there is a need for a song selection method using voice recognition that provides reliability and accuracy on a commercial product such as a song accompaniment apparatus and, further, allows songs to be searched for and selected conveniently.

The present invention has been devised to solve the problems described above, and an object thereof is to provide a song selection method using voice recognition that raises the voice recognition rate in a song accompaniment apparatus by using control commands structured according to the state of the song accompaniment apparatus.

Another object of the present invention is to provide a song selection method using voice recognition that allows a desired song to be selected directly, through voice recognition, from the songs retrieved through voice recognition in a song accompaniment apparatus.

Another object of the present invention is to provide a song selection method using voice recognition that allows a user to search for and select songs by voice recognition through an intuitive and convenient user interface.

Another object of the present invention is to provide a song selection method using voice recognition that provides a still higher voice recognition rate by analyzing the form of the words to be used as control commands in voice recognition and applying the result.

The technical problems to be solved by the present invention are not limited to those mentioned above, and other technical problems not mentioned will be clearly understood by those of ordinary skill in the art to which the present invention pertains from the description below.

To achieve the above objects, a song selection method using voice recognition includes: (a) displaying a plurality of song items respectively corresponding to a plurality of retrieved songs; (b) receiving a voice signal for selecting one of the plurality of song items and comparing the voice data of the received voice signal with a plurality of designated control commands; and (c) selecting a song item that is one of the plurality of song items and corresponds to the control command determined according to the result of the comparison, wherein each of the plurality of control commands corresponds to a respective one of the plurality of songs.

In addition, to achieve the above objects, in the song selection method using voice recognition, step (a) includes highlighting a designated song item among the plurality of song items, and step (c) includes highlighting the selected song item and removing the highlight from the designated song item. One or more of the plurality of control commands are included in the song item of the corresponding song and displayed, so that another song item among the plurality of song items can be selected directly, through voice recognition, from the currently selected song item.
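
Steps (a) to (c) and the highlight handling can be pictured with a minimal Python sketch. It assumes that recognition has already produced text, and the names `SongItem`, `display_items`, and `select_by_voice`, as well as the spoken labels "one", "two", and "three", are hypothetical and chosen only for illustration; the summary above does not prescribe them.

```python
from dataclasses import dataclass


@dataclass
class SongItem:
    command: str               # spoken label shown inside the item (hypothetical)
    title: str
    highlighted: bool = False


def display_items(titles, commands):
    """Step (a): build one item per retrieved song and highlight a designated item (here, the first)."""
    items = [SongItem(command, title) for command, title in zip(commands, titles)]
    if items:
        items[0].highlighted = True
    return items


def select_by_voice(items, recognized_text):
    """Steps (b) and (c): compare the recognized text with each command and move the highlight on a match."""
    for target in items:
        if target.command == recognized_text:
            for item in items:
                item.highlighted = False   # remove the previous highlight
            target.highlighted = True      # highlight the newly selected item
            return target
    return None  # no command matched, so the highlight is left unchanged


items = display_items(["Song A", "Song B", "Song C"], ["one", "two", "three"])
chosen = select_by_voice(items, "three")
```

Because every displayed item carries its own command, the user can jump from the highlighted item to any other item in one utterance instead of stepping through the list.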

The song selection method using voice recognition according to the present invention as described above has the effect of raising the voice recognition rate in a song accompaniment apparatus by using control commands structured according to the state of the song accompaniment apparatus.

The song selection method using voice recognition according to the present invention also has the effect of allowing a desired song to be selected directly, through voice recognition, from the songs retrieved through voice recognition in a song accompaniment apparatus.

The song selection method using voice recognition according to the present invention also has the effect of allowing a user to search for and select songs by voice recognition through an intuitive and convenient user interface.

The song selection method using voice recognition according to the present invention also has the effect of providing a still higher voice recognition rate by analyzing the form of the words to be used as control commands in voice recognition and applying the result.

The effects obtainable from the present invention are not limited to those mentioned above, and other effects not mentioned will be clearly understood by those of ordinary skill in the art to which the present invention pertains from the description below.

FIG. 1 is a system block diagram for performing the song selection method according to the present invention.
FIG. 2 is an exemplary hardware block diagram of the song accompaniment apparatus.
FIG. 3 shows the high-level control flow of the song selection method using voice recognition performed in the song accompaniment apparatus.
FIG. 4 shows a detailed control flow that elaborates the high-level control flow of FIG. 3.
FIG. 5 shows the result displayed on the display device in step S115 or step S121 of FIG. 4.

The above objects, features, and advantages will become clearer through the detailed description given below with reference to the accompanying drawings, so that those of ordinary skill in the art to which the present invention pertains can readily practice the technical idea of the present invention. In describing the present invention, a detailed description of known technology related to the present invention is omitted where it is judged that it could unnecessarily obscure the gist of the present invention. Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings.

FIG. 1 is a system block diagram for performing the song selection method according to the present invention. According to FIG. 1, the system includes a song accompaniment apparatus 100, a display device 200, and one or more microphones 300.

Looking briefly at each device of the system, the song accompaniment apparatus 100 plays back the audio data of a song selected by a user of the song accompaniment apparatus 100, outputs it as an audio signal, and outputs the lyrics, video, or the like corresponding to the selected song to the display device 200 or the like.

The song accompaniment apparatus 100 also receives a voice signal from the microphone 300 and, according to the control command requested by the voice data of the received voice signal, either runs a voice search engine, mounted on the song accompaniment apparatus 100 in software and/or hardware form, to search a large number of songs for corresponding songs, or runs a voice command recognition engine to perform the control command designated for the state of the song accompaniment apparatus 100 or the state of the voice search.

The voice search engine and the voice command recognition engine are preferably mounted on the song accompaniment apparatus 100 and may further be implemented in software form.

The song accompaniment apparatus 100 is described more specifically with reference to FIGS. 2 to 5.

The display device 200 is connected to the song accompaniment apparatus 100, for example by wire or wirelessly, receives the video signal output from the song accompaniment apparatus 100, and outputs the received video signal through a display panel.

The display device 200 has an LCD panel, a PDP panel, an LED panel, or the like, and lets the user of the song accompaniment apparatus 100 view lyrics, provides the user with a guide for voice recognition, or displays the background video of the song being played.

The microphone 300 is connected to the song accompaniment apparatus 100 by wire or wirelessly and outputs the voice signal received from the user to the song accompaniment apparatus 100 either as an analog voice signal or as a digital voice signal converted from the analog voice signal.

The voice signal from the microphone 300 allows the song accompaniment apparatus 100 to select a song through voice recognition and/or to output the user's voice to a speaker (not shown).

FIG. 2 is an exemplary hardware block diagram of the song accompaniment apparatus 100.

According to FIG. 2, the song accompaniment apparatus 100 includes an input interface 110, an output interface 120, a memory 130, a hard disk 140, a microphone receiving terminal 150, an analog-to-digital converter 160, a processor 170, and a system bus/control bus 180.

This exemplary song accompaniment apparatus 100 may be configured to omit some of the illustrated blocks as needed, or to further include other blocks that are not illustrated.

Looking briefly at each block of FIG. 2, the input interface 110 is an interface for receiving input from a user of the song accompaniment apparatus 100. The input interface 110 includes, for example, one or more remote control receiving modules and one or more buttons, and allows the user, for example, to select a song, play a song, or cancel playback of a song.

The output interface 120 includes a wired or wireless audio output terminal and a video output terminal. It outputs the audio signal converted from the MIDI data or compressed audio data of the song being played on the song accompaniment apparatus 100 to a speaker through the audio output terminal, or outputs the video signal converted from a lyrics image and/or a background video to the display device 200 through the video output terminal.

The memory 130 includes volatile memory and/or nonvolatile memory. Under the control of the processor 170, the memory 130 temporarily stores MIDI data, compressed audio data, or the voice data of a voice signal used for voice recognition, and may further store the initial program needed when the processor 170 boots.

The hard disk 140 provides large-capacity storage for various data and programs. These data and programs may be used by the processor 170.

For example, the hard disk 140 stores a plurality of video files and a plurality of audio files. The video files may be used as background video during playback of a song, and the audio files, such as compressed audio files and/or MIDI files, may be used during playback of a song and output as an audio signal.

The hard disk 140, which contains a plurality of programs, may also contain, in the form of software programs, a control engine for controlling the song accompaniment apparatus 100 according to input received through the input interface 110; a voice engine that can run under the control of the control engine or independently; a playback engine that decodes audio files or video files according to a designated format under the control of the control engine; an interface engine for controlling the hardware blocks included in the song accompaniment apparatus 100; and the like.

The hard disk 140 may also include a song search DB for searching for songs. The song search DB outputs a result in response to the input of a search parameter (for example, song number, song title, singer, composer, or lyrics), supplied for example by the control engine or by the voice engine, and the search result may include one or more parameters.

The hard disk 140 may also include a popularity list for the songs used in the song accompaniment apparatus 100. The popularity list may include, for example, a list of popularity rankings of all songs at a designated reference time, or a list of popularity rankings for each genre (pop, dance, ballad, etc.), and contains, for example, song titles or song numbers. The popularity list may be used to order the search results.
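
As a rough illustration of how such a popularity list could order search results, the sketch below sorts result entries by their position in a ranked list of song numbers. The function name `order_by_popularity`, the dictionary layout, and the rule that unranked songs go last are assumptions made for this example and are not taken from the patent.

```python
def order_by_popularity(results, popularity):
    """Sort search results by rank in `popularity`, a list of song numbers, most popular first.

    Songs missing from the list keep their relative order and are placed last (an assumption).
    """
    rank = {number: position for position, number in enumerate(popularity)}
    unranked = len(popularity)
    # sorted() is stable, so ties (all unranked songs) keep their original order.
    return sorted(results, key=lambda result: rank.get(result["number"], unranked))


results = [{"number": 3}, {"number": 1}, {"number": 9}, {"number": 2}]
ordered = order_by_popularity(results, [1, 2, 3])
```

A per-genre list could be applied the same way by passing the ranking for the relevant genre as `popularity`.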

The one or more parameters included in the search result include, for example, the song number, and may further include the search parameter and other parameters that are associated with the search parameter or that the user needs in order to identify the song.
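
The lookup described above might look like the following sketch. The in-memory list `SONG_DB`, its sample entries, the substring matching rule, and the function name `search_songs` are all hypothetical; the patent does not say how the song search DB is stored or matched.

```python
# Hypothetical sample data; a real song search DB would reside on the hard disk 140.
SONG_DB = [
    {"number": 1001, "title": "Spring Rain", "singer": "Singer A",
     "composer": "Composer X", "lyrics": "rain falls on the window"},
    {"number": 1002, "title": "Rain Road", "singer": "Singer B",
     "composer": "Composer Y", "lyrics": "walking down the road"},
    {"number": 1003, "title": "Summer Night", "singer": "Singer A",
     "composer": "Composer X", "lyrics": "stars above the sea"},
]


def search_songs(db, parameter, value):
    """Return one result per song whose `parameter` field contains `value` (case-insensitive)."""
    needle = value.lower()
    results = []
    for song in db:
        if needle in str(song[parameter]).lower():
            # Each result carries the song number, the title (to help the user
            # identify the song), and the parameter that was searched.
            result = {"number": song["number"], "title": song["title"]}
            result[parameter] = song[parameter]
            results.append(result)
    return results


by_title = search_songs(SONG_DB, "title", "rain")
by_singer = search_songs(SONG_DB, "singer", "singer a")
```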

The voice engine may also include a voice search engine and a voice command recognition engine. The voice search engine converts the voice data of the voice signal received from the microphone 300 into input data for a parameter to be used in the song search DB and uses the converted input data to search the song search DB for one or more corresponding songs. The voice command recognition engine recognizes, from the received voice signal, the words used as control commands in voice recognition.

Although the voice search engine and the voice command recognition engine are described here as separate engines, the configuration is not limited to this, and they may be implemented as a single engine or as two or more separate engines.

The voice command recognition engine may be configured to restrict the control commands to be recognized in each state or under each control, depending on the state of the song accompaniment apparatus 100, on control by the control engine, or on a state such as the current stage of voice recognition. Restricting the control commands that have to be recognized in each state or under each control raises the voice recognition rate.
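
One way to picture this restriction is a per-state vocabulary that filters recognizer hypotheses, as in the sketch below. The state names, the command words, the `(text, score)` hypothesis format, and the function name `recognize_command` are assumptions made for illustration and do not come from the patent.

```python
# Hypothetical per-state vocabularies: only these words are accepted in each state.
ACTIVE_COMMANDS = {
    "idle": {"voice search", "speech recognition"},
    "results_shown": {"one", "two", "three", "next", "previous", "cancel"},
}


def recognize_command(state, hypotheses):
    """Pick the best-scoring hypothesis that is an allowed command in `state`.

    `hypotheses` is a list of (text, score) pairs from some recognizer (assumed format).
    Returns the command text, or None if no hypothesis is allowed in this state.
    """
    allowed = ACTIVE_COMMANDS.get(state, set())
    valid = [(text, score) for text, score in hypotheses if text in allowed]
    if not valid:
        return None
    return max(valid, key=lambda pair: pair[1])[0]


hypotheses = [("won", 0.60), ("one", 0.55), ("voice search", 0.20)]
in_results = recognize_command("results_shown", hypotheses)  # "won" is filtered out
in_idle = recognize_command("idle", hypotheses)
```

Here the top-scoring hypothesis "won" is not a command in any state, so the restricted vocabulary lets the lower-scoring but valid "one" win while the result list is shown.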

Each of these engines may be loaded by the processor 170 into the memory 130 or the like and run.

The song selection method using voice recognition that is performed by the voice engine included in the song accompaniment apparatus 100 (further, in cooperation with the control engine) is examined in more detail with reference to FIGS. 3 to 5.

The microphone receiving terminal 150 receives a voice signal from the microphone 300 connected by wire or wirelessly. The microphone receiving terminal 150 includes a connection terminal for the wired cable of the microphone 300 or an antenna for receiving wireless voice data.

The analog-to-digital converter 160 converts the analog voice signal from the microphone 300 connected by wire into a continuous series of digital voice data and outputs it to the processor 170. The analog-to-digital converter 160 converts the signal, at a designated sampling period, into digital data of a designated size (for example, 8 bits or 16 bits) and outputs this digital data to the processor 170 in an agreed internal data communication format (for example, I2S or SPDIF).
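
The sampling and quantization performed by such a converter can be imitated in software, as in the sketch below. It models the analog input as a Python function of time with amplitude in [-1.0, 1.0] and produces signed integer codes; the function name `sample_and_quantize`, the 16 kHz sampling rate, and the signed encoding are assumptions for this example, and the transfer format (I2S or SPDIF) is not modeled.

```python
import math


def sample_and_quantize(signal, duration_s, sample_rate_hz, bits):
    """Sample `signal` (a function t -> amplitude in [-1.0, 1.0]) and quantize to signed integers."""
    max_code = 2 ** (bits - 1) - 1                 # 32767 for 16 bits, 127 for 8 bits
    count = round(duration_s * sample_rate_hz)     # number of samples to take
    samples = []
    for n in range(count):
        amplitude = signal(n / sample_rate_hz)     # one sample per sampling period
        amplitude = max(-1.0, min(1.0, amplitude)) # clip out-of-range input
        samples.append(round(amplitude * max_code))
    return samples


def tone(t):
    """A 440 Hz test tone standing in for the microphone signal."""
    return math.sin(2 * math.pi * 440 * t)


pcm16 = sample_and_quantize(tone, 0.01, 16000, 16)
pcm8 = sample_and_quantize(tone, 0.01, 16000, 8)
```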

If the microphone receiving terminal 150 receives the microphone's voice data wirelessly, the analog-to-digital converter 160 may be omitted. In that case the processor 170 receives wireless data packets and extracts the digital voice data from them.

The processor 170 has one or more execution units that can load and execute the instructions of a program, and controls each hardware block according to the execution of those instructions.

The processor 170 runs programs such as the control engine and the voice engine by time-sharing or by assigning them to one or more execution units, so that the control engine and the voice engine run simultaneously, or appear to the programmer or user to run simultaneously, and so that the engines can exchange data with one another according to a protocol agreed between the programs.

The song selection method using voice recognition that is loaded onto and performed by the processor 170 is described in detail with reference to FIGS. 3 to 5.

The system bus/control bus 180 carries the data or control signals generated in each block to the other blocks. The system bus/control bus 180 may be a parallel bus or a serial bus.

FIG. 3 shows the high-level control flow of the song selection method using voice recognition performed in the song accompaniment apparatus 100.

This song selection method is preferably performed by a program running on the processor 170, for example by the voice engine and the control engine working together.

Looking at the high-level control flow of FIG. 3, first, in step S1, the voice engine, specifically the voice command recognition engine, determines whether the converted voice data of the received voice signal corresponds to a designated control command for performing the song search by voice. If the voice data matches the designated control command or corresponds to it with at least a certain degree of similarity, the flow transitions to step S2.

The designated control command here may be a word that requests the start of a song search by voice, for example "음성검색" ("voice search"), "음성인식" ("voice recognition"), or "금영노래방" ("Kumyoung karaoke").

In the entry step (step S1), only the designated control command, that is, the designated word, needs to be recognized. The recognition rate can therefore be raised by comparing only against the designated word, and unnecessary voice recognition can be skipped by allowing the voice search only in specific states.
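
The entry check of step S1 can be sketched as a similarity comparison against a short list of designated words. The sketch below compares recognized text rather than raw voice data and uses the standard library's `difflib.SequenceMatcher` as a stand-in similarity measure; the 0.8 threshold, the English command words, and the function name `match_entry_command` are assumptions, since the patent does not specify how similarity is computed.

```python
from difflib import SequenceMatcher

# English stand-ins for the designated entry words (assumed for this example).
ENTRY_COMMANDS = ["voice search", "speech recognition"]


def match_entry_command(recognized_text, threshold=0.8):
    """Return the entry command that `recognized_text` matches, or None.

    A command matches if it is identical or its similarity ratio reaches `threshold`.
    """
    best_command = None
    best_score = 0.0
    for command in ENTRY_COMMANDS:
        score = SequenceMatcher(None, recognized_text, command).ratio()
        if score > best_score:
            best_command, best_score = command, score
    return best_command if best_score >= threshold else None


exact = match_entry_command("voice search")
near = match_entry_command("voice serch")   # one letter missing, still similar enough
miss = match_entry_command("hello")
```

Because only a handful of words are compared at this stage, a slightly misrecognized utterance can still be mapped to the intended command while unrelated speech is ignored.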

Here, the control engine that controls the song accompaniment apparatus 100 determines the states in which voice search can be allowed and, when voice search is allowed, either starts the voice engine, specifically the voice command recognition engine, or requests the voice engine to begin voice recognition.

For example, the song accompaniment apparatus 100 may be in a state in which a song is being played, a standby state in which playback has stopped and the apparatus is waiting for user input, or a state in which no song is being played but a specific function is being performed in response to user input.

The control engine determines these states and may control the voice engine so that voice search is allowed in one or more designated states.

In step S2 (input step), the voice engine receives the voice data of the voice signal for the parameter the user wants to search for (for example, a song title). Step S2 is configured, for example, to find the end of the utterance in the continuously received voice signal (for example, by checking whether voice data is present over a certain period) and to isolate the voice data of the word or sentence the user wants to search for.
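
The end-of-utterance detection in step S2 can be sketched as counting consecutive quiet frames. The sketch below assumes the signal has already been reduced to one energy value per frame; the energy threshold, the number of silent frames that ends an utterance, and the function name `split_utterance` are hypothetical choices for illustration.

```python
def split_utterance(frame_energies, energy_threshold, max_silent_frames):
    """Find the first utterance in a stream of per-frame energy values.

    Returns (start, end) frame indices with `end` exclusive, or None if no
    frame reaches `energy_threshold`. The utterance ends once
    `max_silent_frames` consecutive frames fall below the threshold.
    """
    start = None
    silent = 0
    for index, energy in enumerate(frame_energies):
        if energy >= energy_threshold:
            if start is None:
                start = index          # first voiced frame
            silent = 0                 # a voiced frame resets the silence counter
        elif start is not None:
            silent += 1
            if silent >= max_silent_frames:
                return (start, index - silent + 1)
    if start is not None:
        # The stream ended before enough silence was seen; trim trailing silence.
        return (start, len(frame_energies) - silent)
    return None


energies = [0, 0, 5, 6, 7, 0, 6, 0, 0, 0, 4]
span = split_utterance(energies, energy_threshold=3, max_silent_frames=3)
```

The single quiet frame at index 5 does not end the utterance because it is shorter than the required silence; the three quiet frames starting at index 7 do.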

이후 단계 S3(인식 단계)에서 분리된 음성 데이터의 분석(음성 엔진, 구체적으로는 음성 검색 엔진에 의해서)을 통해서 대응하는 단어나 문장을 인식하고 인식된 단어나 문장을 곡 검색 DB의 대응하는 파라미터와 비교를 통해서 유사하거나 일치하는 곡 들을 검색한다. Subsequently, in step S3 (recognition step), the corresponding words or sentences are recognized through analysis of the speech data separated by the speech data (by the voice engine, specifically, the voice search engine), and the corresponding parameters of the song search DB are recognized. Search for similar or matching songs by comparing with.

만일 이 단계 S3에서 음성 데이터의 분석을 통해 대응하는 단어나 문장을 인식하지 못하는 경우 및/또는 하나 이상의 대응하는 곡을 검색하지 못한 경우 단계 S2로 전이할 수 있다. If in step S3 the corresponding words or sentences are not recognized through analysis of the voice data and / or one or more corresponding songs are not found, the process may transition to step S2.

이러한 검색을 통해 하나 이상의 곡들이 검색되어 이후 단계 S4(결과 표출 단계)에서 검색된 곡들이 결과로서 생성되어 표시 장치(200)에 표시된다. 이러한 결과의 생성은 음성 엔진에 의해서 이루어지거나 혹은 음성 엔진으로부터 결과를 수신한 제어 엔진에 의해서 생성될 수 있다. One or more songs are searched through this search, and the songs searched in step S4 (result display step) are generated as a result and displayed on the display device 200. The generation of these results may be by a speech engine or by a control engine that receives the results from the speech engine.

여기서 결과 표출 단계(S4)에서는 또 다른 음성 인식에 따라 제어 명령을 수신하고 수신된 제어 명령에 따라 결과의 디스플레이를 변경할 수 있다. 그리고 이 결과 표출 단계(S4)에서 이용되는 제어 명령의 단어들은 진입 단계(S1)에서 이용되는 제어 명령의 단어와는 상이하도록 구성될 수 있다. Here, in the result display step S4, a control command may be received according to another voice recognition, and the display of the result may be changed according to the received control command. The words of the control command used in the result display step S4 may be configured to be different from the words of the control command used in the entry step S1.

Accordingly, in each step (or state) the voice recognition only has to recognize the control command words that belong to that step, which improves the voice recognition rate.
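The state-dependent vocabulary described above can be sketched as follows. This is only an illustration: the state names and the `recognize` helper are hypothetical, the words are examples taken from this description, and an exact string match stands in for the acoustic matching that a real engine would perform.

```python
# Hypothetical per-state vocabularies; the words are examples from this description.
VOCABULARY = {
    # Entry step (S1): only the words that start voice search are accepted.
    "entry": {"음성검색", "음성인식", "금영노래방"},
    # Result display step (S4): only item-selection words and other commands.
    "result": {"연두", "빨강", "분홍", "자주", "보라", "파랑", "검정", "시작"},
}


def recognize(state, heard):
    """Return the heard word if it is a command in this state, else None (rejected)."""
    return heard if heard in VOCABULARY[state] else None
```

Because each state consults only its own small word set, a word that is valid in one state is simply rejected in the other.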

The high-level control flow of FIG. 3 is described in more detail below with reference to FIG. 4.

FIG. 4 shows a detailed control flow that elaborates the high-level control flow of FIG. 3. This control flow may be carried out by programs running on the processor 170 of the song accompaniment apparatus 100, with the voice engine or the control engine running on the processor 170 controlling the hardware blocks.

The description below uses an exemplary voice search engine and voice command recognition engine, but various modified configurations are possible.

First, in step S100, the control flow starts either automatically once the song accompaniment apparatus 100 has booted after power-on, or under the control of the control engine. When it is started by the control engine, the control engine may, depending on the state of the song accompaniment apparatus 100 that it recognizes, either launch the voice engine or send a signal to a voice engine that is already running.

Then, in step S101, a voice signal for selecting a song by voice recognition is received through the microphone receiving terminal 150. If the voice signal was received over a wire, it is converted into digital voice data by the analog-to-digital converter 160, and the resulting series of digital voice data is passed to the processor 170, specifically to the voice command recognition engine.

The end of the series of voice data is then identified so that the data can be compared with the designated words that serve as control commands. For example, the voice command recognition engine of the voice engine may identify the end using the digital magnitude values of the voice data: the end is identified when magnitude values at or below a given threshold persist for at least a given period (for example, 1 second). This end identification may be performed in the frequency domain or in the time domain.
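A minimal time-domain sketch of this end identification is given below. The function name, its parameters, and the choice to ignore silence before the utterance begins are assumptions made for illustration; the description only states that low magnitude values persisting for a given period mark the end.

```python
def find_utterance_end(samples, sample_rate, threshold, min_silence_s=1.0):
    """Return the index where the closing silence starts, or None if no end is found.

    The end is declared once |sample| <= threshold has persisted for
    min_silence_s seconds after some speech has been observed.
    """
    needed = int(min_silence_s * sample_rate)  # quiet samples required
    started = False   # becomes True at the first loud sample
    quiet_run = 0     # length of the current run of quiet samples
    for i, s in enumerate(samples):
        if abs(s) > threshold:
            started = True
            quiet_run = 0
        elif started:
            quiet_run += 1
            if quiet_run >= needed:
                return i - quiet_run + 1
    return None
```

The samples before the returned index form the utterance that would be passed on for comparison with the control command words.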

Then, in step S103, the voice data received through the analog-to-digital converter 160 and identified in this way is compared with the designated control command to determine whether they correspond.

For example, the designated control command may be a designated word for launching or starting the voice search engine, such as "음성검색" (voice search), "음성인식" (voice recognition), or "금영노래방" ("금영 karaoke", after the assignee's name).

Because the comparison is made only against these designated words and not against any other words, the voice recognition rate for entering voice search can be increased.

Step S103 may use known voice recognition techniques. For example, it may extract feature parameters by linear prediction, or convert the identified voice data into the frequency domain and extract feature parameters (or feature vectors) from the resulting frequency values. The word may then be decided by directly comparing the linear-prediction or frequency-domain feature parameters, or a word may first be determined from those feature parameters and then compared with the word of the designated control command.

In step S103, if the voice data and the designated control command are the same or a similar word (for example, via word determination) or have a similarity within a given threshold range (for example, via frequency comparison or linear prediction), the flow moves to step S105; otherwise it returns to step S101.
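The threshold decision of step S103 can be illustrated as below. This is a sketch under stated assumptions: fixed-length feature vectors, cosine similarity as the similarity measure, and the 0.9 threshold are all choices made here for illustration and are not specified in the description.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def next_step_after_s103(features, command_templates, min_similarity=0.9):
    """Return "S105" if the features match any designated command closely enough, else "S101".

    command_templates maps each designated command word to a stored feature vector
    (hypothetical data; a real engine would hold acoustic models instead).
    """
    best = max(cosine_similarity(features, t) for t in command_templates.values())
    return "S105" if best >= min_similarity else "S101"
```

Only the few entry words are compared here, which is what keeps the decision simple at this stage.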

Then, in step S105, the voice search engine is started; a voice signal for song search is received through the microphone receiving terminal 150, converted into digital voice data, and passed to the voice search engine. The voice search engine then identifies the end of the received series of voice data in order to delimit the word or sentence.

Then, in step S107, the voice search engine uses the identified voice data to search the song search DB stored on the hard disk 140 for corresponding songs.

The search result contains one or more songs, and typically a plurality of songs. For each retrieved song, the voice search engine can obtain a number of parameters (song title, song number, singer, composer, lyrics, release date, and so on) from the song search DB.

As in step S103, the search in step S107 may be performed by comparing feature parameters of the identified voice data with feature parameters of the search parameters in the song search DB (for example, LPCC (Linear Prediction Cepstral Coefficient) or MFCC (Mel Frequency Cepstral Coefficient)).

The songs retrieved through this comparison have a similarity to the voice data for song search that falls within a given range.

Then, in step S109, the voice search engine determines whether the song search failed or whether recognition (for example, word recognition from the voice data) failed. On failure the flow returns to step S105; on success it moves to step S111.

Then, in step S111, the voice search engine orders the retrieved songs.

For example, the voice search engine may compare the retrieved songs with a stored popularity list and, when a song ranked high in the popularity list is among the retrieved songs, place that song correspondingly high in the ordering of the retrieved songs. The popularity list indicates the popularity ranking over all songs and/or the popularity ranking by genre. The ranking over all songs may also be given priority over the ranking by genre.

Alternatively, the voice search engine may order the songs by similarity to the identified voice data, so that the most similar song is listed first on the display device 200, the next most similar song is listed second, and so on.

To judge similarity here, the word of the identified voice data is compared with the word of the designated parameter of each retrieved song. The highest weight is given to the first character of the voice data and the next highest weight to the next character, and the similarities between corresponding characters are summed according to these weights.
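The position-weighted similarity can be sketched as follows. The description does not give the weights or the per-character similarity measure, so this example assumes geometric weights (1, 0.5, 0.25, ...) and an exact character match scored as 1 or 0; both are illustrative assumptions.

```python
def weighted_similarity(spoken, candidate, decay=0.5):
    """Sum per-position character matches, weighting earlier positions more heavily."""
    score = 0.0
    weight = 1.0  # weight of the first character
    for a, b in zip(spoken, candidate):
        if a == b:
            score += weight
        weight *= decay  # each later character counts less
    return score
```

With these assumed weights, a candidate that differs from the spoken word only in a later character scores higher than one that differs in the first character.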

Alternatively, the voice search engine may order the retrieved songs by combining similarity to the identified voice data with the ranking from the popularity list. For example, songs that appear in the popularity list are ranked by popularity, and songs that do not appear in it are ordered by similarity to the identified voice data and placed below the songs that are in the popularity list.
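A minimal sketch of this combined ordering is shown below. The data shapes (song identifiers, a rank dictionary where 1 means most popular, and a similarity dictionary) are assumptions for illustration only.

```python
def order_songs(found, popularity_rank, similarity):
    """Order retrieved songs: popularity-listed songs first, then the rest by similarity.

    found:            list of song ids returned by the search
    popularity_rank:  dict of song id -> rank in the popularity list (1 = most popular)
    similarity:       dict of song id -> similarity to the spoken query
    """
    # Songs on the popularity list, best rank first.
    listed = sorted((s for s in found if s in popularity_rank),
                    key=lambda s: popularity_rank[s])
    # Remaining songs, most similar first, placed below the listed ones.
    unlisted = sorted((s for s in found if s not in popularity_rank),
                      key=lambda s: -similarity[s])
    return listed + unlisted
```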

Then, in step S113, the voice search engine, or the control engine after receiving the parameters of the ordered songs from the voice search engine, generates the song items to be displayed on the display device 200 for the ordered songs.

Each song item contains one or more parameter values from the song search DB, for example the song title, song number, and singer, to be shown on the display device 200.

Step S113 also generates the song items so that each one includes the control command for selecting it, which is subsequently shown on the display device 200. The control command consists of, for example, a word that lets the corresponding song item be selected directly by voice recognition.

Each control command (for example, each word) can be used to select any one song item in the image shown on the display device 200 directly and in any order (random access), and the control commands are made distinguishable from one another when they are shown together in a single image on the display device 200.

Each control command is thus matched to one song, and in particular to its song title or song number, and the commands are mapped one-to-one onto the ordered song items in song order for display. The user of the song accompaniment apparatus 100 can then select the corresponding song item directly, in any order, by speaking the matched control command into the microphone 300.
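The one-to-one mapping between command words and ordered songs can be sketched as below. The dictionary layout and function names are hypothetical; the command words are the color words given as examples in this description.

```python
# Example command words from this description, in display order.
COMMAND_WORDS = ["연두", "빨강", "분홍", "자주", "보라", "파랑", "검정"]


def build_song_items(ordered_songs, command_words=COMMAND_WORDS):
    """Pair each ordered (number, title) song with one command word, in order."""
    return [
        {"command": word, "number": number, "title": title}
        for word, (number, title) in zip(command_words, ordered_songs)
    ]


def select_by_command(items, spoken_word):
    """Return the item whose command word was spoken, or None if no item uses it."""
    for item in items:
        if item["command"] == spoken_word:
            return item
    return None
```

Speaking a command word jumps straight to its item, so the third item can be chosen without stepping through the first two.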

Then, in step S115, the voice search engine or the control engine displays the generated song items on the display device 200. The song items let the user identify the songs found through voice recognition and see at a glance which control command corresponds to each one.

The displayed song items may be only some of the song items generated in step S113. For example, if one screen of the display device 200 can show 8 song items and 16 song items were generated, the 8 items with the highest similarity among the 16 may be displayed.

FIG. 5 shows the result displayed on the display device 200 in step S115 (or step S121) of FIG. 4.

As FIG. 5 shows, the screen presents the word determined through voice recognition (step S107), which is "개나리" (forsythia) in FIG. 5, together with song items for the songs retrieved by similarity to that word.

By way of example, eight song items are displayed on one screen, and each song item includes its control command (a word such as "연두" (yellow-green), "빨강" (red), "분홍" (pink), "자주" (reddish purple), "보라" (purple), "파랑" (blue), or "검정" (black)), so the user can see the corresponding control command visually and intuitively.

In addition to the control commands for selecting song items, other control commands may also be displayed. The control commands for the song items and these other control commands are the commands available on the displayed search result screen.

The control command for a song item may also be a number indicating order, such as "1", "2", "3", "4" or "하나" (one), "둘" (two). Preferably, however, the control commands for selecting the individual song items should be configured so that voice recognition can tell them apart easily.

To that end, the control command words are preferably composed of phoneme combinations that are highly distinguishable from one another. In Korean, vowels are sounded longer than consonants, so making the vowels differ between control command words raises the voice recognition rate and raises the rejection rate for words that need not be recognized.

For example, as shown in FIG. 5, the words have the same length and their vowel combinations ("ㅕㅜ", "ㅜㅘ", "ㅏㅏ", "ㅜㅗ", "ㅏㅜ", "ㅗㅏ", and so on) differ from one another. Where two words share a vowel combination, they should be made favorable to voice recognition by differing in the initial consonant of the first character, in the final consonant (받침) of the first character, or in the combination of the two.

The control commands are built from two or more characters with differentiated vowels, and from consonant combinations with different frequency characteristics where the vowel combination is the same, because in Korean voice recognition the vowels generally carry the main part of recognizing the sound. Using two or more vowel parts makes them easy to identify, and providing different consonant combinations for identical vowel combinations further raises the voice recognition rate.
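The vowel and consonant criteria above can be checked mechanically, as in the following sketch. It relies on the standard arithmetic layout of precomposed Hangul syllables in Unicode (19 initials, 21 medial vowels, 28 finals starting at U+AC00); the function names and the clash rule as coded here are an illustrative reading of the description, not the patented method itself.

```python
INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"      # 19 initial consonants
MEDIALS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"  # 21 medial vowels


def split_syllable(ch):
    """Return (initial consonant, medial vowel, final index) of one Hangul syllable."""
    offset = ord(ch) - 0xAC00
    if not 0 <= offset < 11172:
        raise ValueError(f"not a precomposed Hangul syllable: {ch!r}")
    # Each initial spans 21 * 28 = 588 code points; each medial spans 28.
    return INITIALS[offset // 588], MEDIALS[(offset % 588) // 28], offset % 28


def vowel_pattern(word):
    """Return the sequence of medial vowels of a word, e.g. "연두" -> "ㅕㅜ"."""
    return "".join(split_syllable(ch)[1] for ch in word)


def clashes(words):
    """Return pairs that share a vowel pattern AND the first character's initial and final."""
    seen = {}
    found = []
    for word in words:
        initial, _, final = split_syllable(word[0])
        key = (vowel_pattern(word), initial, final)
        if key in seen:
            found.append((seen[key], word))
        else:
            seen[key] = word
    return found
```

For the example color words, "빨강" and "파랑" share the vowel pattern "ㅏㅏ" but differ in the first character's consonants, so `clashes` reports no conflict for that set.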

Composing the words from highly distinguishable phoneme combinations raises their recognition rate. The voice recognition rate can be raised further by limiting the control commands (or words) that can be recognized in a particular display state.

The voice command recognition engine (or voice search engine) running on the song accompaniment apparatus 100 may be, for example, a phoneme-combination isolated-word recognition engine based on an HMM (Hidden Markov Model), and such an engine can be configured not to recognize anything other than the designated target words.

Such an HMM-based phoneme-combination isolated-word recognition engine extracts feature parameters or feature vectors (for example, LPCC, MFCC, or variants) for the phonemes of the identified voice data and compares them with those of the phonemes of the designated control command words. It can thereby determine probabilistically the similarity between the word spoken by the user and the words of the one or more control commands available in the current state, and accordingly decide on the corresponding control command word or reject the input.

In the search result displayed in step S115, one designated song item (the first song item on the initial display) may be highlighted (in pink in FIG. 5), and playback of the highlighted song can be started by speaking another control command such as "시작" (start).

Then, in step S117, a voice signal is received through the microphone receiving terminal 150 and converted into digital voice data by the analog-to-digital converter 160. The received voice data may be a control command for selecting one of the song items displayed in step S115. The end of the voice data may be identified so that the data can be separated for word recognition.

Then, in step S119, the voice command recognition engine compares the voice data of the received voice signal with the plurality of control commands corresponding to the song items and with the other control commands. If the voice data is determined to be a control command corresponding to a song item, the flow moves to step S121; otherwise it moves to step S123.

The voice recognition in step S119 may be performed in a manner similar to step S103 or step S107.

Then, in step S121, the voice command recognition engine selects the song item corresponding to the determined control command. Specifically, it removes the highlight from the previously selected song item ("연두" in FIG. 5) and highlights the newly selected song item in a designated color.

Because the control command for each displayed song item is shown on the same line as its song number or song title, the user can pick a particular song item intuitively and, with the improved voice recognition, can select another song item directly by voice, without sequential (up/down) selection or selection by remote control.

In step S123, it is determined whether the control command recognized from the voice data is one of the other control commands; if so, the corresponding function is performed by the voice command recognition engine or by request to the control engine. After the other control command has been executed, the flow may end (S200) or move to another step (for example, S105 or S117), depending on the type of command.
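The branching of steps S119 through S123 can be summarized in a short sketch. The `dispatch` function, the mapping of each other command to a follow-up step, and the choice to keep listening (S117) on unrecognized input are assumptions for illustration; the description only says that the follow-up depends on the type of command.

```python
def dispatch(recognized, item_commands, other_commands):
    """Return the label of the step to run next for a recognized word.

    item_commands:   set of command words attached to the displayed song items
    other_commands:  dict of other command word -> follow-up step (hypothetical mapping)
    """
    if recognized in item_commands:
        return "S121"  # select and highlight the matching song item
    # S123: run the other command's follow-up; anything else goes back to listening.
    return other_commands.get(recognized, "S117")
```

For instance, with `other_commands = {"시작": "S200"}`, the word "시작" leads to the end of the flow, while a color word leads to item selection.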

The control flow of FIG. 4 thus raises the voice recognition rate and lets any song be selected directly in any order, providing a song accompaniment apparatus 100 that is functionally improved and more reliable.

The present invention described above is not limited by the foregoing embodiments and the accompanying drawings, since those of ordinary skill in the art to which the invention belongs can make various substitutions, modifications, and changes without departing from its technical spirit.

100: song accompaniment apparatus
110: input interface 120: output interface
130: memory 140: hard disk
150: microphone receiving terminal 160: analog-to-digital converter
170: processor 180: system bus / control bus
200: display device
300: microphone

Claims

1. A song selection method using voice recognition, comprising:
(a) displaying a plurality of song items, each corresponding to one of a plurality of retrieved songs;
(b) receiving a voice signal for selecting one of the plurality of song items, and comparing voice data of the received voice signal with a plurality of designated control commands; and
(c) selecting the song item, among the plurality of song items, that corresponds to the control command determined by the comparison,
wherein each of the plurality of control commands is a Hangul word of two or more characters, the Hangul words having the same number of characters, and is displayed in step (a) in the song item of the corresponding song, and
wherein the first character and the second character of the Hangul word of each control command differ from the corresponding first character and second character of the Hangul word of every other control command, and the combination of initial consonant and medial vowel of the first character of the Hangul word is configured to differ from the combination of initial consonant and medial vowel of the second character.

2. The method of claim 1, wherein
step (a) includes highlighting a designated song item among the plurality of song items,
step (c) includes highlighting the selected song item and removing the highlight from the designated song item, and
one or more of the plurality of control commands allows a song item different from the currently selected song item to be selected directly through voice recognition.

3. The method of claim 2, further comprising, prior to step (a): receiving a voice signal and comparing voice data of the received voice signal with a control command different from the plurality of control commands; receiving a voice signal for song search when, according to the comparison, the voice data corresponds to the different control command; and searching a song search DB for a plurality of songs using voice data of the voice signal received for song search.

4. The method of claim 3, further comprising, after the plurality of songs are retrieved: ordering the retrieved plurality of songs according to a stored popularity list; and generating the plurality of song items, each including the control command corresponding to the song title of one of the ordered plurality of songs.

5. The method of claim 4, wherein, in the generating of the plurality of song items, each of the plurality of control commands is matched to a song title according to the order of the plurality of songs.

6. The method of claim 1, wherein the song selection method is performed in a song accompaniment apparatus, the Hangul words of the plurality of control commands are Hangul words denoting mutually different colors, and the Hangul word displayed in the song item in step (a), or the background image of that Hangul word, is rendered in the color that the Hangul word denotes.