JP2023524374A

JP2023524374A - Method for providing voice recognition music selection service and voice recognition music selection device

Info

Publication number: JP2023524374A
Application number: JP2022558058A
Authority: JP
Inventors: ヤン，テシク
Original assignee: ニナノ
Priority date: 2020-05-18
Filing date: 2021-04-16
Publication date: 2023-06-12
Also published as: WO2021235708A1; KR102362815B1; KR20220020878A; KR20210142446A

Abstract

A method for providing a voice recognition music selection service is disclosed. According to one aspect of the present invention, the method includes: a step of being provided with a plurality of reference tokens generated by splitting a reference utterance sentence containing a song title and a singer name on the basis of word spacing, parts of speech, and stems and endings; a step of receiving, in text form, a search utterance sentence uttered by a user to search for a song; a step of generating a plurality of search tokens by splitting the search utterance sentence on the basis of word spacing, parts of speech, and stems and endings; a step of comparing the plurality of search tokens with the plurality of reference tokens; a step of calculating a similarity score of the search utterance sentence with respect to the reference utterance sentence according to the comparison result; and a step of providing song information to the user on the basis of the similarity score.
[Selection drawing] Fig. 2

Description

The present invention relates to a method for providing a voice recognition music selection service and to a voice recognition music selection device.

In general, to search for a song to sing on a song accompaniment machine, a user has had to use a search remote controller or the like and press its number or letter keys to enter the title, singer name, or song number of the desired song.

However, this approach has the problem that searching for a song takes considerable effort as well as considerable time.

Accordingly, there is a growing need for a method and device that allow a desired song to be found easily, without pressing keys to enter the song title and the like one item at a time as described above.

Korean Utility Model Registration No. 20-0202916

The present invention provides a method for providing a voice recognition music selection service, and a voice recognition music selection device, that make song search easier and more accurate.

According to one aspect of the present invention, there is provided a method for providing a voice recognition music selection service, the method including: a step of being provided with a plurality of reference tokens generated by splitting a reference utterance sentence containing a song title and a singer name on the basis of word spacing, parts of speech, and stems and endings; a step of receiving, in text form, a search utterance sentence uttered by a user to search for a song; a step of generating a plurality of search tokens by splitting the search utterance sentence on the basis of word spacing, parts of speech, and stems and endings; a step of comparing the plurality of search tokens with the plurality of reference tokens; a step of calculating a similarity score of the search utterance sentence with respect to the reference utterance sentence according to the comparison result; and a step of providing song information to the user on the basis of the similarity score.

The plurality of reference tokens may be generated by a primary split of the reference utterance sentence on the basis of word spacing, a secondary split of the primarily split reference utterance sentence on the basis of parts of speech, and a tertiary split, on the basis of stems and endings, of the verbs and adjectives in the secondarily split reference utterance sentence.

In the step of being provided with the plurality of reference tokens, the first character of each reference token, and extension tokens formed by appending one character at a time, in order, to that first character, may additionally be provided.

The step of receiving the search utterance sentence in text form may include a step of receiving, as speech, the search utterance sentence uttered by the user to search for a song, and a step of converting the search utterance sentence into text form.

The step of receiving the search utterance sentence in text form may further include, after the step of converting the search utterance sentence into text form, a step of correcting errors in the search utterance sentence by comparing it with song titles and singer names.

In the step of generating the plurality of search tokens, the search tokens may be generated by a primary split of the search utterance sentence on the basis of word spacing, a secondary split of the primarily split search utterance sentence on the basis of parts of speech, and a tertiary split, on the basis of stems and endings, of the verbs and adjectives in the secondarily split search utterance sentence.

In the step of calculating the similarity score, the similarity score may be calculated when all of the plurality of search tokens are included in the plurality of reference tokens.

In the step of calculating the similarity score, a unit score may be calculated for each search token that matches a reference token, and the unit scores may be summed to obtain the similarity score.

The unit score may be calculated to be higher the more characters there are in the search token that matches a reference token.
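The summed unit score described above can be sketched as follows. This is only an illustrative sketch: the text states that longer matched tokens score higher but gives no formula, so using the character count as the unit score is an assumption, as are the names `similarity_score` and `reference_field`.

```python
def similarity_score(search_tokens, reference_field):
    """Sum a unit score over every search token found among the reference tokens.

    Assumption: the unit score is simply the token's character count, which
    satisfies "more characters, higher score" but is not taken from the text.
    """
    return sum(len(token) for token in search_tokens if token in reference_field)


# Hypothetical reference data for one song (reference tokens plus prefixes).
reference_field = {"イ", "イウン", "イウンミ", "の", "恋人"}

full_name_score = similarity_score(["イウンミ", "恋人"], reference_field)
first_char_score = similarity_score(["イ"], reference_field)
```

With this choice, a query that matches the full token "イウンミ" outranks one that only matches its first character.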

According to another aspect of the present invention, there is provided a voice recognition music selection device including: a reference token providing unit that is provided with a plurality of reference tokens generated by splitting a reference utterance sentence containing a song title and a singer name on the basis of word spacing, parts of speech, and stems and endings; an input unit that receives, in text form, a search utterance sentence uttered by a user to search for a song; a search token generation unit that generates a plurality of search tokens by splitting the search utterance sentence on the basis of word spacing, parts of speech, and stems and endings; a token comparison unit that compares the plurality of search tokens with the plurality of reference tokens; a score calculation unit that calculates a similarity score of the search utterance sentence with respect to the reference utterance sentence according to the comparison result; and a result providing unit that provides song information to the user on the basis of the similarity score.

According to the present invention, song search can be performed more easily and accurately.

FIG. 1 is a diagram showing a voice recognition music selection service providing system according to an embodiment of the present invention.
FIG. 2 is a flowchart showing a method for providing a voice recognition music selection service according to an embodiment of the present invention.
FIG. 3 is a flowchart showing the step of receiving a search utterance sentence in text form according to an embodiment of the present invention.
FIG. 4 is a diagram showing the process of deciding whether to calculate a similarity score in the step of calculating the similarity score according to an embodiment of the present invention.
FIG. 5 is a flowchart showing the similarity score calculation process in the step of calculating the similarity score according to an embodiment of the present invention.
FIG. 6 is a diagram showing the configuration of a voice recognition music selection device according to an embodiment of the present invention.

The present invention can be modified in various ways and can have various embodiments; specific embodiments are illustrated in the drawings and described in detail in the detailed description. This is not intended to limit the present invention to particular embodiments, and it should be understood to cover all modifications, equivalents, and alternatives falling within the spirit and technical scope of the invention. In describing the present invention, detailed descriptions of related known technology are omitted where they are judged likely to obscure the gist of the invention.

Terms such as "first" and "second" may be used to describe various components, but the components should not be limited by these terms. The terms are used only to distinguish one component from another.

The terms used in this description are used only to describe particular embodiments and are not intended to limit the present invention. Singular expressions include plural expressions unless the context clearly indicates otherwise. In this application, terms such as "include" or "have" specify the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and should be understood not to exclude in advance the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Hereinafter, embodiments of the method for providing a voice recognition music selection service and of the voice recognition music selection device (100) according to the present invention are described in detail with reference to the accompanying drawings. In the description, identical or corresponding components are given the same reference numerals, and redundant description of them is omitted.

First, a voice recognition music selection service providing system according to an embodiment of the present invention is described.

According to this embodiment, as shown in FIG. 1, there is provided a voice recognition music selection service providing system including: an input device that receives a search utterance sentence uttered by a user (10) to search for a song; a conversion server that converts the search utterance sentence into text form; a generation server that generates a plurality of search tokens by splitting the text-form search utterance sentence on the basis of word spacing, parts of speech, and stems and endings; an analysis server that stores a plurality of reference tokens generated by splitting a reference utterance sentence containing a song title and a singer name on the same basis, compares the plurality of reference tokens with the plurality of search tokens, calculates a similarity score of the search utterance sentence with respect to the reference utterance sentence according to the comparison result, and generates, on the basis of the similarity score, the song information to be provided to the user (10); an accompaniment device that performs at least one of reserving and playing the song that the user (10) selects from the provided song information; and a transmission/reception device that sends and receives data among the input device, the generation server, the analysis server, and the accompaniment device.

The voice data, that is, the search utterance sentence uttered by the user (10) to search for a song, is passed by the input device to the transmission/reception device, which sends it to the conversion server, where it is converted into text form to become text data.

The text data is then received again by the transmission/reception device and sent to the generation server, where it is split on the basis of word spacing, parts of speech, and stems and endings into a plurality of search tokens. The transmission/reception device forwards these search tokens to the analysis server.

The plurality of search tokens are then compared with the plurality of reference tokens stored in the analysis server, which were generated by splitting a reference utterance sentence containing a song title and a singer name on the basis of word spacing, parts of speech, and stems and endings. The analysis server calculates a similarity score according to the comparison result, generates the song information to be provided to the user (10) on the basis of the similarity score, and sends it to the transmission/reception device.

After that, when the user (10) selects the desired song from the song information provided via the transmission/reception device, that selection is sent to the accompaniment device, and the accompaniment for the selected song finally begins.

According to this embodiment, easier and more accurate song search is possible. Further configurations and functions of the devices and servers described above follow the description of the method for providing a voice recognition music selection service and of the voice recognition music selection device (100) given below.

Next, a method for providing a voice recognition music selection service according to an embodiment of the present invention is described.

According to this embodiment, as shown in FIG. 2, there is provided a method for providing a voice recognition music selection service, the method including: a step (S110) of being provided with a plurality of reference tokens generated by splitting a reference utterance sentence containing a song title and a singer name on the basis of word spacing, parts of speech, and stems and endings; a step (S120) of receiving, in text form, a search utterance sentence uttered by a user (10) to search for a song; a step (S130) of generating a plurality of search tokens by splitting the search utterance sentence on the basis of word spacing, parts of speech, and stems and endings; a step (S140) of comparing the plurality of search tokens with the plurality of reference tokens; a step (S150) of calculating a similarity score of the search utterance sentence with respect to the reference utterance sentence according to the comparison result; and a step (S160) of providing song information to the user (10) on the basis of the similarity score.

According to this embodiment, a search is carried out merely by uttering, that is, speaking, the details of the song to be found, so the user (10) can search for songs more easily. In addition, because the search result is derived by comparing the plurality of search tokens generated by splitting the search utterance sentence uttered by the user (10) with the plurality of reference tokens generated by splitting the reference utterance sentence containing the song title and singer name, a more accurate song search becomes possible.

Hereinafter, each step of the method for providing a voice recognition music selection service according to this embodiment is described with reference to FIGS. 2 to 6.

In the step (S110) of being provided with a plurality of reference tokens, a plurality of reference tokens generated by splitting a reference utterance sentence containing a song title and a singer name on the basis of word spacing, parts of speech, and stems and endings may be provided.

As a result, data in the form of a plurality of reference tokens covering all of the song information, such as song title and singer name, is formed, and this data can be accumulated into a single data field against which the content of the user's search request is compared.

Compare this with an approach in which a separate data field is provided for each item of song information, such as song title and singer name, the search request is compared individually against each data field, and the individual results are then combined. In the present approach the comparison with the search request is made in one pass within a single data field, which reduces the errors that the individual comparisons can introduce. With fewer such errors, more accurate comparison and derivation of results become possible.

Here, a "token" is a single character or a group of characters generated by splitting an utterance sentence on the basis of word spacing, parts of speech, and stems and endings, and can be understood as one unit of the data that makes up the data field used for comparison in the present invention.

An "utterance sentence" can be understood literally as, for example, a sentence in which words spoken aloud are written out in characters.

For example, when the user (10) wants to search for the song "恋人います" ("I Have a Lover") by the singer "イウンミ" (Lee Eun-mi), the sentence "イウンミの恋人います探してくれ" ("Find Lee Eun-mi's I Have a Lover"), which writes out in characters what the user (10) says, is an utterance sentence. This sentence is used below as the running example for each step of the present invention.

The plurality of reference tokens may be generated by a primary split of the reference utterance sentence on the basis of word spacing, a secondary split of the primarily split reference utterance sentence on the basis of parts of speech, and a tertiary split, on the basis of stems and endings, of the verbs and adjectives in the secondarily split reference utterance sentence. More specifically, the tertiary split into stems and endings may be performed at the character level.

In this way, the plurality of reference tokens may be obtained by splitting the words that make up the reference utterance sentence according to criteria such as meaning. Because the split follows word meaning and similar criteria, comparison with the plurality of search tokens described below can yield more effective results in song search.

For example, "イウンミの恋人います探してくれ" is primarily split on the basis of word spacing into "イウンミの" ("Lee Eun-mi's"), "恋人" ("lover"), "います" ("have"), and "探してくれ" ("find for me"). This is secondarily split on the basis of parts of speech into "イウンミ" (noun), "の" (particle), "恋人" (noun), "います" (verb), and "探してくれ" (verb). Of these, the verbs "います" and "探してくれ" are tertiarily split on the basis of stems and endings into "い" (stem) and "ます" (ending), and into "探" (stem), "して" (ending), and "くれ" (stem plus ending), respectively. The resulting reference tokens are "イウンミ", "の", "恋人", "い", "ます", "探", "して", and "くれ".
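The three-stage split in this example can be sketched as below. This is a toy illustration, not the patent's implementation: the two hand-written lexicons stand in for a real morphological analyzer, the input is assumed to arrive already word-spaced, and the names `POS_LEXICON`, `STEM_ENDING_LEXICON`, and `tokenize` are hypothetical.

```python
# Toy lexicons standing in for a real morphological analyzer (assumption).
POS_LEXICON = {
    "イウンミの": [("イウンミ", "noun"), ("の", "particle")],
    "恋人": [("恋人", "noun")],
    "います": [("います", "verb")],
    "探してくれ": [("探してくれ", "verb")],
}
STEM_ENDING_LEXICON = {
    "います": ["い", "ます"],
    "探してくれ": ["探", "して", "くれ"],
}


def tokenize(utterance):
    """Split an utterance by word spacing, then part of speech, then stem/ending."""
    tokens = []
    # Primary split: word spacing (the input is assumed to be pre-spaced).
    for word in utterance.split():
        # Secondary split: part of speech; unknown words pass through unsplit.
        for morpheme, pos in POS_LEXICON.get(word, [(word, "unknown")]):
            # Tertiary split: only verbs and adjectives are divided further.
            if pos in ("verb", "adjective"):
                tokens.extend(STEM_ENDING_LEXICON.get(morpheme, [morpheme]))
            else:
                tokens.append(morpheme)
    return tokens


reference_tokens = tokenize("イウンミの 恋人 います 探してくれ")
```

Under these toy lexicons, `reference_tokens` contains the same eight tokens listed in the example above. The same function would serve for search utterances, since both sides are split on the same basis.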

In the step (S110) of being provided with a plurality of reference tokens, the first character of each reference token, and extension tokens formed by appending one character at a time, in order, to that first character, may additionally be provided.

As a result, the first character of each reference token and the extension tokens are added to the data field alongside the reference tokens themselves, so the comparison with the search tokens described below, and the derivation of song search results, can be more detailed and accurate.

For example, in addition to the reference token "イウンミ", its first character "イ" and the extension tokens "イウン" and "イウンミ" may also be provided. In this process, the duplicate token "イウンミ" may optionally be removed.
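A minimal sketch of this prefix expansion follows; the function names are hypothetical. Note one assumption: expanding the four-character katakana token strictly one character at a time also yields "イウ", which the example in the text does not list, so the exact granularity used here should be treated as illustrative.

```python
def expand_token(token):
    """Return the first character of a token followed by each longer prefix."""
    return [token[:end] for end in range(1, len(token) + 1)]


def build_data_field(reference_tokens):
    """Collect reference tokens and their prefixes into one de-duplicated set.

    Using a set drops the duplicate produced when the longest prefix equals
    the reference token itself.
    """
    field = set()
    for token in reference_tokens:
        field.update(expand_token(token))
    return field


prefixes = expand_token("イウンミ")
data_field = build_data_field(["イウンミ", "の"])
```

Storing the prefixes in the same set as the full tokens keeps the later comparison a single membership test per search token.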

In the step (S120) of receiving the search utterance sentence in text form, as shown in FIGS. 2 and 3, the search utterance sentence uttered by the user (10) to search for a song may be received in text form.

First of all, this lets the user (10) search for a song simply by speaking, which makes song search easier.

More specifically, as shown in FIG. 3, the step (S120) of receiving the search utterance sentence in text form may include a step (S122) of receiving, as speech, the search utterance sentence uttered by the user (10) to search for a song, and a step (S124) of converting the search utterance sentence into text form.

In this case, the user (10) may utter a search utterance sentence containing at least one of a song title and a singer name. Preferably, for a more accurate song search, the user makes the search request by uttering a sentence containing both the song title and the singer name (for example, "イウンミの恋人います探してくれ").

The step (S124) of converting the search utterance sentence into text form also prepares the source data from which the plurality of search tokens described below are generated.

More specifically, in the step (S124) of converting the search utterance sentence into text form, song information data including song titles and singer names may be used to raise the accuracy of the conversion. In particular, the conversion accuracy can be improved by setting this song information data as the reference for the conversion.

That is, when an utterance is converted into text with commonly used words as the reference, it may be converted into words other than those intended, lowering the conversion accuracy. When song information data is used as the reference instead, the reference matches the intent of the user (10), who is speaking in order to find a song, so the conversion accuracy can be improved further.

Artificial intelligence (AI) technology may also be used here. For example, deep learning may be used to raise sensitivity to frequently encountered words, further improving the conversion accuracy.

As shown in FIG. 3, the step (S120) of receiving the search utterance sentence in text form may further include, after the step (S124) of converting the search utterance sentence into text form, a step (S126) of correcting errors in the search utterance sentence by comparing it with song titles and singer names.

That is, even if the search utterance sentence contains some errors, a more accurate song search is possible because the errors are corrected by comparison with song titles and singer names before the text used to generate the search tokens is prepared.

For example, if the search utterance sentence contains the error "です", as in "イウンミの恋人です探してくれ", comparing it with the singer name "イウンミ" and the song title "恋人います" allows the error to be corrected and the text "イウンミの恋人います探してくれ" to be prepared.

More specifically, in the step (S126) of correcting errors in the search utterance sentence, a comparison score may be calculated by comparing the search utterance sentence with song titles and singer names, and the search utterance sentence may be corrected on the basis of the highest-scoring song information among the song information whose comparison score exceeds a preset reference value.
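The threshold-then-best-match selection described above can be sketched as follows. The text does not say how the comparison score is computed, so the use of `difflib.SequenceMatcher`, the 0.5 reference value, the catalog entries, and the name `best_matching_song` are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def best_matching_song(utterance, catalog, threshold=0.5):
    """Return the (singer, title) pair whose score is highest above the threshold.

    Assumption: the comparison score is the SequenceMatcher ratio between the
    utterance and the string "<singer>の<title>". Returns None when no entry
    exceeds the threshold.
    """
    best, best_score = None, threshold
    for singer, title in catalog:
        score = SequenceMatcher(None, utterance, f"{singer}の{title}").ratio()
        if score > best_score:
            best, best_score = (singer, title), score
    return best


# Hypothetical catalog; the second entry is a placeholder.
catalog = [("イウンミ", "恋人います"), ("歌手B", "曲B")]

corrected_basis = best_matching_song("イウンミの恋人です探してくれ", catalog)
no_basis = best_matching_song("まったく別の発話", catalog)
```

Here the misrecognized utterance still selects the "イウンミ" entry, which could then serve as the basis for rewriting "恋人です" to "恋人います", while an unrelated utterance selects nothing and would be left uncorrected.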

This raises the likelihood that, when the search utterance sentence is corrected, it is corrected into text that matches the intent of the user (10).

Here too, as described above, artificial intelligence technology such as deep learning may be applied. By learning frequently searched words and the like, sentences containing those words can be corrected more accurately.

In the step (S122) of receiving the search utterance sentence as speech, as shown in FIG. 6, the search utterance sentence may be received through a receiver (122) capable of speech recognition.

More specifically, the receiver (122) may be capable of at least one of noise filtering and voice amplification in order to raise the speech recognition rate in noisy conditions.

Places where the present invention is mainly expected to be used, such as karaoke rooms, are spaces full of ambient noise such as the sound of singing, so recognizing the voice of the user (10) accurately is important.

Accordingly, by performing at least one of filtering ambient noise and amplifying the voice of the user (10), the receiver (122) raises the recognition rate for the user's voice and thereby further improves the accuracy of song search.

In the step (S130) of generating a plurality of search tokens, the search tokens may be generated by splitting the search utterance sentence on the basis of word spacing, parts of speech, and stems and endings. Preparing search tokens that correspond in form to the reference tokens makes the subsequent comparison of the two sets of tokens more effective.

In the step (S130) of generating a plurality of search tokens, the search tokens may be generated by a primary split of the search utterance sentence on the basis of word spacing, a secondary split of the primarily split search utterance sentence on the basis of parts of speech, and a tertiary split, on the basis of stems and endings, of the verbs and adjectives in the secondarily split search utterance sentence.

In this way, the plurality of search tokens are prepared by splitting the words that make up the search utterance sentence according to criteria such as meaning. Because they are compared with the reference tokens described above on the same basis, song search can yield more effective results.

For a concrete example, see the example given above for generating the plurality of reference tokens.

複数の基準トークンと比較するステップ（Ｓ１４０）は、複数の検索トークンを複数の基準トークンと比較してもよい。これによって、後述する類似度点数の算出及びそれによる結果の提供のための基準データ値が用意される。 The step of comparing (S140) with a plurality of reference tokens may compare a plurality of search tokens with a plurality of reference tokens. Accordingly, a reference data value is prepared for calculating a similarity score and providing a result thereof, which will be described later.

また、複数の検索トークンを複数の基準トークンと比較するステップ（Ｓ１４０）は、別途設けられた比較サーバを介して行われてもよい。 Also, the step of comparing multiple search tokens with multiple reference tokens (S140) may be performed via a separately provided comparison server.

類似度点数を算出するステップ（Ｓ１５０）は、上述した比較結果によって基準発話文に対する検索発話文の類似度点数を算出してもよい。 The step of calculating the similarity score (S150) may calculate the similarity score of the search utterance sentence with respect to the reference utterance sentence based on the comparison results described above.

このような類似度点数を通じて、後述するように、検索結果のうちで曲情報を選別してユーザ（１０）に提供することができ、これによって、ユーザ（１０）はさらに実効的な検索結果を提供されることができる。 Through this similarity score, as described later, song information can be selected from among the search results and provided to the user (10), so that the user (10) is provided with more effective search results.

より具体的に、類似度点数を算出するステップ（Ｓ１５０）では、図４に示すように、複数の検索トークンが複数の基準トークンにすべて含まれる場合に類似度点数を算出してもよい。 More specifically, in the step of calculating the similarity score (S150), as shown in FIG. 4, the similarity score may be calculated when a plurality of search tokens are all included in a plurality of reference tokens.

これによって、複数の検索トークンが複数の基準トークンに一部のみ含まれる場合でも、類似度点数を算出することによって検索結果に不要な曲情報が含まれることを防止することで、より正確な曲検索結果をユーザ（１０）に提供することができる。 This prevents unnecessary song information from being included in the search results, as would occur if the similarity score were also calculated when only some of the plurality of search tokens are included in the plurality of reference tokens, so that more accurate song search results can be provided to the user (10).

より具体的に類似度点数を算出するステップ（Ｓ１５０）では、図４に示すように、複数の検索トークンの一部が複数の基準トークンに含まれる場合、ユーザ（１０）に再度検索発話文を発話することを要請してもよい。 More specifically, in the step of calculating the similarity score (S150), as shown in FIG. 4, when only some of the plurality of search tokens are included in the plurality of reference tokens, the user (10) may be requested to utter the search utterance sentence again.

この場合、ユーザ（１０）は、複数の基準トークン内に含まれていないと予想される字又は単語を除いて再度発話してもよく、これによって、複数の検索トークンが複数の基準トークンにすべて含まれるようになると、類似度点数を算出できるようになる。 In this case, the user (10) may utter the sentence again while leaving out the characters or words presumed not to be included in the plurality of reference tokens; once all of the plurality of search tokens are thereby included in the plurality of reference tokens, the similarity score can be calculated.

例えば、ユーザ（１０）が「イウンミの恋人です探してくれ」と発話する場合、「です」（語尾）という検索トークンは、「イウンミの恋人います探してくれ」に対する複数の基準トークンに含まれていないため、ユーザ（１０）に検索発話文を再発化するように要請することができる。 For example, when the user (10) utters 「イウンミの恋人です探してくれ」 (roughly, "find Lee Eun-mi's 'is a lover'"), the search token 「です」 (an ending) is not included in the plurality of reference tokens for 「イウンミの恋人います探してくれ」 (roughly, "find Lee Eun-mi's 'have a lover'"), so the user (10) can be requested to utter the search utterance sentence again.

また類似度点数を算出するステップ（Ｓ１５０）では、図４に示すように、ユーザ（１０）に再度検索発話文を発話することを要請すると共に、複数の基準トークンに含まれていない検索トークンに対する情報を提供してもよい。 Further, in the step of calculating the similarity score (S150), as shown in FIG. 4, the user (10) may be requested to utter the search utterance sentence again and, at the same time, be provided with information on the search tokens that are not included in the plurality of reference tokens.

これによって、検索発話文再発化要請及びユーザ（１０）の再発化のステップが意味なく繰り返されることを防止でき、曲検索の容易性及び正確性が共に向上することができる。 This prevents the request to re-utter the search utterance sentence and the re-utterance by the user (10) from being repeated pointlessly, improving both the ease and the accuracy of the song search.

例えば、ユーザ（１０）が「イウンミの恋人です探してくれ」と発話する場合、ユーザ（１０）に検索発話文を再発化するように要請すると共に、複数の基準トークンに含まれていない「です」（語尾）という検索トークンの情報を提示することができる。 For example, when the user (10) utters 「イウンミの恋人です探してくれ」, the user (10) can be requested to utter the search utterance sentence again and, at the same time, be shown that the search token 「です」 (an ending) is not included in the plurality of reference tokens.
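
The inclusion check and the re-utterance request of FIG. 4 might be sketched as below. The function name `check_search_tokens`, the returned action labels, and the abridged reference token list are assumptions introduced for illustration; the description does not specify how the check is implemented.

```python
def check_search_tokens(search_tokens, reference_tokens):
    """Decide whether scoring can proceed.

    Returns ("score", []) when every search token is among the reference
    tokens, otherwise ("reutter", missing), where `missing` lists the search
    tokens to report to the user together with the re-utterance request.
    """
    reference = set(reference_tokens)
    missing = [token for token in search_tokens if token not in reference]
    return ("reutter", missing) if missing else ("score", [])


# Abridged, hypothetical reference tokens for one reference utterance sentence.
reference_tokens = ["イウンミ", "の", "恋人", "い", "ます"]

print(check_search_tokens(["イウンミ", "の", "恋人", "です"], reference_tokens))
print(check_search_tokens(["イウンミ", "の", "恋人"], reference_tokens))
```

In the first call the token です is reported as missing, which corresponds to the example above; in the second call, where the user has dropped the unmatched ending, all tokens are included and scoring can proceed.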

類似度点数を算出するステップ（Ｓ１５０）では、図５に示すように、基準トークンにマッチングされる検索トークンのそれぞれに対して単位点数を算出し、各単位点数を合算して類似度点数を算出してもよい。 In the step of calculating the similarity score (S150), as shown in FIG. 5, a unit score may be calculated for each search token that matches a reference token, and the similarity score may be calculated by summing the unit scores.

すなわち、基準トークンにマッチングされる検索トークンごとに単位点数が付けられ、最終点数としての類似度点数を、上記のそれぞれの単位点数をすべて合算して算出してもよい。 That is, a unit score may be assigned to each search token that matches the reference token, and the similarity score as the final score may be calculated by adding up all of the above unit scores.

より具体的に、単位点数は、基準トークンにマッチングされる検索トークンの文字数が多いほど高く算出されてもよい。 More specifically, a higher unit score may be calculated as the number of characters of the search token that matches the reference token increases.

すなわち、多くの文字数の検索トークンが基準トークンとマッチングされるということは、ユーザ（１０）が希望する曲情報を構成するキーワード（ｋｅｙｗｏｒｄ）である可能性が高いことを意味するため、該当検索トークンに高い単位点数を付与することで、最終的に算出される類似度点数に不均性を付与することができる。 That is, when a search token with many characters matches a reference token, it is likely to be a keyword of the song information that the user (10) wants; assigning a high unit score to such a search token therefore weights the finally calculated similarity score non-uniformly in favor of such keywords.

例えば「い」という検索トークンよりも文字数の多い「恋人」という検索トークンは、曲題目全体でみたとき、曲題目を構成するキーワードと見られるため、「恋人」に、「い」に比べて高い単位点数を付与することで、「恋人」が類似度点数の算出にさらに多く寄与するように設定することができる。 For example, the search token 「恋人」 ("lover") has more characters than the search token 「い」 and, viewed against the song title as a whole, can be regarded as a keyword of the title; giving 「恋人」 a higher unit score than 「い」 therefore lets 「恋人」 contribute more to the calculation of the similarity score.
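
The unit-score summation can be illustrated with a short sketch. The description only requires that a matched token with more characters receive a higher unit score; using the character count itself as the unit score, as `unit_score` does here, is an assumption chosen for simplicity, and the function names are hypothetical.

```python
def unit_score(token):
    """Assumed scoring rule: the unit score equals the token's character count."""
    return len(token)


def similarity_score(search_tokens, reference_tokens):
    """Sum the unit scores of the search tokens that match a reference token."""
    reference = set(reference_tokens)
    return sum(unit_score(token) for token in search_tokens if token in reference)


reference_tokens = ["イウンミ", "の", "恋人", "い", "ます"]
print(unit_score("恋人"), unit_score("い"))
print(similarity_score(["恋人", "い"], reference_tokens))
```

With this rule 恋人 contributes 2 and い contributes 1, so the two matched tokens give a similarity score of 3. Any other monotonically increasing function of the character count would satisfy the description equally well.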

曲情報を提供するステップ（Ｓ１６０）は、類似度点数を基準にユーザ（１０）に曲情報を提供してもよい。 The step of providing song information (S160) may provide song information to the user (10) based on the similarity score.

これによって、無分別にすべての検索結果をユーザ（１０）に提供することに比べ、曲検索結果提供の正確性及び適正性がより向上することができる。 As a result, the accuracy and appropriateness of providing the song search results can be improved, compared to indiscriminately providing all search results to the user (10).

より具体的に、曲情報を提供するステップ（Ｓ１６０）では、類似度点数が既設定値以上の曲名及び歌手名を含む曲情報をユーザ（１０）に提供してもよい。 More specifically, in the step of providing song information (S160), song information including song titles and singer names whose similarity score is equal to or greater than a preset value may be provided to the user (10).

すなわち、既設定の類似度点数の数値を基準に該当数値以上の点数の曲名及び歌手名を含む曲情報をユーザ（１０）に提供することで、曲検索結果の正確度及び適正度をより高めることができる。 That is, by providing the user (10) with song information including song titles and singer names with scores equal to or higher than the preset similarity scores, the accuracy and appropriateness of song search results are further enhanced. be able to.

また曲情報を提供するステップ（Ｓ１６０）では、類似度点数が高い順に曲情報をユーザ（１０）に提供してもよい。 In the step of providing song information (S160), song information may be provided to the user (10) in descending order of similarity scores.

これによって、ユーザ（１０）は、提供された点数の曲名及び歌手名を含む曲情報のうちで高い類似度点数の曲情報から探すことで、希望する曲をより速かに探すことができる。 Accordingly, the user (10) can find the desired song more quickly by looking through the provided song information, which includes song titles and singer names, starting from the entries with the highest similarity scores.
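
Filtering by a preset value and ordering by similarity score could look like the following sketch. The function name `select_song_info`, the candidate entries, and the preset value of 4 are placeholders invented for the example.

```python
def select_song_info(scored_songs, preset_value):
    """Keep entries scoring at least `preset_value`, highest score first."""
    hits = [(info, score) for info, score in scored_songs if score >= preset_value]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical (song information, similarity score) pairs.
candidates = [
    ("Song A / Singer A", 3),
    ("Song B / Singer B", 7),
    ("Song C / Singer C", 5),
]
print(select_song_info(candidates, preset_value=4))
```

Here "Song A" falls below the preset value and is dropped, while the remaining entries are returned with "Song B" ahead of "Song C".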

本発明の一実施例による音声認識選曲装置（１００）について説明する。 A voice recognition music selection device (100) according to an embodiment of the present invention will be described.

本実施例によると、図６に示すように、曲名と歌手名を含む基準発話文を分かち書き、品詞、及び語幹と語尾を基準に分割することによって生成された複数の基準トークンを提供される基準トークン提供部（１１０）、ユーザ（１０）が曲検索のために発話した検索発話文をテキストの形態で入力される入力部（１２０）、検索発話文を分かち書き、品詞、及び語幹と語尾を基準に分割することによって複数の検索トークンを生成する検索トークン生成部（１３０）、複数の検索トークンを複数の基準トークンと比較するトークン比較部（１４０）、比較結果によって基準発話文に対する検索発話文の類似度点数を算出する点数算出部（１５０）、及び類似度点数を基準にユーザ（１０）に曲情報を提供する結果提供部（１６０）を含む、音声認識選曲装置（１００）が提供される。 According to the present embodiment, as shown in FIG. 6, there is provided a voice recognition music selection device (100) including: a reference token providing unit (110) that is provided with a plurality of reference tokens generated by dividing a reference utterance sentence containing a song title and a singer name on the basis of word spacing, parts of speech, and stems and endings; an input unit (120) that receives, in the form of text, a search utterance sentence uttered by the user (10) for a song search; a search token generation unit (130) that generates a plurality of search tokens by dividing the search utterance sentence on the basis of word spacing, parts of speech, and stems and endings; a token comparison unit (140) that compares the plurality of search tokens with the plurality of reference tokens; a score calculation unit (150) that calculates a similarity score of the search utterance sentence with respect to the reference utterance sentence according to the comparison result; and a result providing unit (160) that provides song information to the user (10) on the basis of the similarity score.

このような本実施例によると、ユーザ（１０）は、より容易に曲検索が可能であり、より正確な曲検索結果を得ることができる。 According to this embodiment, the user (10) can more easily search for songs and obtain more accurate song search results.

以下、図２ないし図６を参照して本実施例による音声認識選曲装置（１００）の各構成について説明する。 Hereinafter, each component of the voice recognition music selection device (100) according to the present embodiment will be described with reference to FIGS. 2 to 6.

基準トークン提供部（１１０）は、曲名と歌手名を含む基準発話文を分かち書き、品詞、及び語幹と語尾を基準に分割することによって生成された複数の基準トークンを提供されてもよい。 The reference token providing unit (110) may be provided with a plurality of reference tokens generated by dividing a reference utterance sentence containing a song title and a singer name on the basis of word spacing, parts of speech, and stems and endings.

この場合、複数の基準トークンをサーバなどからダウンロード（ｄｏｗｎｌｏａｄ）などの形態で提供されてもよく、前記複数の基準トークンが前記サーバなどに格納されている状態で、前記サーバに後述する検索トークンを送信して前記複数の基準トークンを用いてもよい。 In this case, the plurality of reference tokens may be provided by downloading them from a server or the like; alternatively, with the plurality of reference tokens remaining stored on the server or the like, the search tokens described later may be transmitted to the server so that the plurality of reference tokens are used there.

基準トークン提供部（１１０）は、複数の基準トークンのそれぞれの第１字目、及び第１字目に一字ずつ順に追加して構成される拡張トークンをさらに提供されてもよい。 The reference token providing unit (110) may further be provided with first characters of each of the plurality of reference tokens and extended tokens formed by sequentially adding one character to the first character.
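
One possible reading of the extended tokens is the set of prefixes of each reference token, starting from its first character and growing by one character at a time. The sketch below follows that reading; the function name `extended_tokens` and the decision to include the complete token as the last prefix are assumptions.

```python
def extended_tokens(reference_token):
    """Return the first character of the token, then each longer prefix in order."""
    return [reference_token[:n] for n in range(1, len(reference_token) + 1)]


print(extended_tokens("イウンミ"))
```

For the reference token イウンミ this yields イ, イウ, イウン, and イウンミ, which would allow a search token that covers only the beginning of a reference token to be matched.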

入力部（１２０）は、ユーザ（１０）が曲検索のために発話した検索発話文をテキストの形態で入力されてもよい。 The input unit (120) may receive, in the form of text, a search utterance sentence uttered by the user (10) for a song search.

より具体的に、入力部（１２０）は、ユーザ（１０）が曲検索のために発話した検索発話文を音声で入力される音声入力部（１２０）、及び検索発話文をテキストの形態に変換するテキスト変換部を含んでもよい。 More specifically, the input unit (120) may include a voice input unit (120) that receives, by voice, the search utterance sentence uttered by the user (10) for a song search, and a text conversion unit that converts the search utterance sentence into the form of text.

検索発話文のテキスト形態への変換は別途のサーバを介して行われてもよく、このために音声で入力された検索発話文を前記サーバに送信してもよい。 The conversion of the retrieval utterance into text form may be performed via a separate server, and for this purpose, the retrieval utterance, which is input by voice, may be transmitted to the server.

より具体的に、テキスト変換部は、検索発話文のテキスト形態への変換正確度を高めるために、曲名及び歌手名を含む曲情報データが活用されてもよく、より具体的に、上述した曲情報データを変換の基準値と設定して変換を行うことで、前記変換正確度を向上させることができる。 More specifically, the text conversion unit may utilize song information data including song titles and singer names in order to increase accuracy in converting the retrieved utterances into text form. By setting the information data as a reference value for conversion and performing conversion, the conversion accuracy can be improved.

またこの場合、人工知能技術を活用してもよく、例えば、ディープラーニングによって、頻繁に露出される単語などに対する敏感度を向上させることで、変換正確度をより向上させることができる。 In this case, artificial intelligence technology may also be used; for example, deep learning can increase sensitivity to frequently encountered words and thereby further improve the conversion accuracy.

入力部（１２０）は、検索発話文を曲名及び歌手名と比較して検索発話文の誤りを修正する誤り修正部をさらに含んでもよい。 The input unit (120) may further include an error correction unit that compares the retrieved utterance with the song title and singer name and corrects errors in the retrieved utterance.

より具体的に、誤り修正部は、検索発話文を曲名及び歌手名と比較して点数を算出でき、前記比較点数が既設定の基準値を上回る曲情報のうちの最高点の曲情報を基準に検索発話文を修正してもよい。 More specifically, the error correction unit can compare the retrieved utterance text with the song title and the singer name to calculate the score, and the song information with the highest score among the song information whose comparison score exceeds a preset reference value is used as the reference. You may modify the search utterance sentence to
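
The correction rule described above (score each catalog entry against the search utterance sentence, then adopt the highest-scoring entry whose score exceeds a preset reference value) can be sketched with Python's standard `difflib` module. The description does not say how the comparison score is computed; the use of `SequenceMatcher.ratio()`, the reference value of 0.6, the function name, and the catalog contents are all assumptions.

```python
from difflib import SequenceMatcher


def correct_utterance(text, catalog, reference_value=0.6):
    """Return the best-matching catalog entry, or `text` unchanged when no
    entry scores above `reference_value`."""
    best_entry, best_score = None, reference_value
    for entry in catalog:
        # Assumed comparison score: character-level similarity in [0, 1].
        score = SequenceMatcher(None, text, entry).ratio()
        if score > best_score:
            best_entry, best_score = entry, score
    return best_entry if best_entry is not None else text


# Hypothetical "singer name + song title" strings built from song information.
catalog = ["イウンミ 恋人います", "別の歌手 別の曲"]
print(correct_utterance("イウンミ 恋人いまず", catalog))
print(correct_utterance("zzz", catalog))
```

The misrecognized final character in the first input is corrected because that catalog entry scores well above the reference value, whereas the second input matches nothing and is returned unchanged.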

またこの場合にも、上述したように、人工知能技術、例えばディープラーニングを適用してもよく、これによって、頻繁に検索される単語などに対する学習を通じて該当単語が含まれた文章に対するより正確な修正が行われることができる。 Also in this case, as described above, artificial intelligence technology, such as deep learning, may be applied, so that through learning of frequently searched words, sentences containing the relevant words may be corrected more accurately. can be done.

音声入力部（１２０）は、図６に示すように、検索発話文を音声認識可能なレシーバ（１２２）によって入力されてもよい。 As shown in FIG. 6, the voice input unit (120) may receive the search utterance sentence through a receiver (122) capable of voice recognition.

この場合、レシーバ（１２２）は、音声を入力されて認識できるすべての入力装置を含んでもよく、例えば、伴奏器やリモコンに備えられたマイク、本装置に備えられたマイク、別途設けられた専用レシーバ又は本装置と連動したユーザ（１０）の端末に備えられたマイクなどが前記レシーバ（１２２）に含まれてもよい。 In this case, the receiver (122) may include any input device capable of receiving and recognizing voice; for example, a microphone provided on an accompaniment machine or a remote controller, a microphone provided on the present device, a separately provided dedicated receiver, or a microphone provided on a terminal of the user (10) linked with the present device may be included in the receiver (122).

より具体的に、レシーバ（１２２）は、騒音状況における音声の認識率を増加させるために、騒音のフィルタリング及び音声の増幅の少なくともいずれか一方が可能であってもよい。 More specifically, the receiver (122) may be capable of filtering noise and/or amplifying speech to increase the recognition rate of speech in noisy situations.

検索トークン生成部（１３０）は、検索発話文を分かち書き、品詞、及び語幹と語尾を基準に分割することによって複数の検索トークンを生成してもよい。 The search token generation unit (130) may generate a plurality of search tokens by dividing the search utterance sentence on the basis of word spacing, parts of speech, and stems and endings.

また別途設けられたサーバにテキストの形態に変換された検索発話文を送信して該当サーバで前記複数の検索トークンが生成されてもよい。 Alternatively, the search utterance text converted into text may be transmitted to a server provided separately, and the plurality of search tokens may be generated by the corresponding server.

検索トークン生成部（１３０）は、検索発話文を分かち書きを基準に一次分割し、一次分割された検索発話文を品詞を基準に二次分割し、二次分割された検索発話文のうちの動詞と形容詞を語幹と語尾を基準に三次分割することによって複数の検索トークンを生成してもよい。 The search token generation unit (130) may generate the plurality of search tokens by primarily dividing the search utterance sentence on the basis of word spacing, secondarily dividing the primarily divided search utterance sentence on the basis of parts of speech, and tertiarily dividing the verbs and adjectives in the secondarily divided search utterance sentence on the basis of stems and endings.

トークン比較部（１４０）は、複数の検索トークンを複数の基準トークンと比較してもよい。 The token comparator (140) may compare multiple search tokens with multiple reference tokens.

この場合、別途設けられたサーバ内で複数の検索トークンと複数の基準トークンとの相互比較が行われてもよく、このために複数の検索トークンが前記サーバに送信されてもよい。 In this case, multiple search tokens may be cross-compared with multiple reference tokens in a separate server, and multiple search tokens may be sent to said server for this purpose.

点数算出部（１５０）は、上述した比較結果によって基準発話文に対する検索発話文の類似度点数を算出してもよい。 The score calculation unit (150) may calculate a similarity score of the search utterance sentence to the reference utterance sentence based on the comparison result described above.

より具体的に点数算出部（１５０）は、図４に示すように、複数の検索トークンが複数の基準トークンにすべて含まれる場合に類似度点数を算出してもよい。 More specifically, the score calculation unit (150) may calculate the similarity score when all of the plurality of search tokens are included in the plurality of reference tokens, as shown in FIG.

点数算出部（１５０）は、図４に示すように、複数の検索トークンの一部が複数の基準トークンに含まれる場合、ユーザ（１０）に再度検索発話文を発話することを要請してもよい。 As shown in FIG. 4, when only some of the plurality of search tokens are included in the plurality of reference tokens, the score calculation unit (150) may request the user (10) to utter the search utterance sentence again.

また点数算出部（１５０）は、図４に示すように、ユーザ（１０）に再度検索発話文を発話することを要請すると共に、複数の基準トークンに含まれていない検索トークンに対する情報を提供してもよい。 Also, as shown in FIG. 4, the score calculation unit (150) may request the user (10) to utter the search utterance sentence again and, at the same time, provide information on the search tokens that are not included in the plurality of reference tokens.

点数算出部（１５０）は、図５に示すように、基準トークンにマッチングされる検索トークンのそれぞれに対して単位点数を算出し、各単位点数を合算して類似度点数を算出してもよい。 As shown in FIG. 5, the score calculation unit (150) may calculate a unit score for each search token that matches the reference token, and add up each unit score to calculate a similarity score. .

より具体的に単位点数は、基準トークンにマッチングされる検索トークンの文字数が多いほど高く算出されてもよい。 More specifically, the unit score may be calculated to be higher as the number of characters of the search token that matches the reference token is larger.

結果提供部（１６０）は、類似度点数を基準にユーザ（１０）に曲情報を提供してもよい。 The result providing unit (160) may provide song information to the user (10) based on the similarity score.

より具体的に、結果提供部（１６０）は、類似度点数が既設定値以上の曲名及び歌手名を含む曲情報をユーザ（１０）に提供してもよい。 More specifically, the result providing unit (160) may provide the user (10) with song information including song titles and singer names whose similarity score is greater than or equal to a preset value.

また結果提供部（１６０）は、類似度点数が高い順に曲名及び歌手名を含む曲情報をユーザ（１０）に提供してもよい。 Also, the result providing unit (160) may provide the user (10) with song information including song titles and singer names in descending order of similarity scores.

上述した該当類似度点数の算出は別途のサーバで行われてもよく、また検索発話文の再発化要請、複数の基準トークンに含まれていない検索トークンに対する情報の提供、検索結果としての曲情報の提供のためのディスプレイ（ｄｉｓｐｌａｙ）部が備えられてもよい。 The calculation of the similarity score described above may be performed on a separate server, and a display unit may be provided for requesting re-utterance of the search utterance sentence, for providing information on search tokens that are not included in the plurality of reference tokens, and for providing song information as search results.

一方、上述した実施例の構成要素はプロセス的な観点で容易に把握することができる。すなわち、それぞれの構成要素はそれぞれのプロセスで把握することができる。また上述した実施例のプロセスは装置の構成要素の観点で容易に把握することができる。 On the other hand, the constituent elements of the above-described embodiment can be easily grasped from a process point of view. That is, each component can be grasped by each process. Also, the processes of the above-described embodiments can be easily understood in terms of the components of the apparatus.

また上述した技術的内容は、多様なコンピュータ手段によって行われ得るプログラム命令の形態で具現され、コンピュータ判読可能媒体に記録されてもよい。前記コンピュータ判読可能媒体は、プログラム命令、データファイル、データ構造などを単独で又は組み合わせて含んでもよい。前記媒体に記録されるプログラム命令は、実施例のために特に設計及び構成されたものであるか、又はコンピュータソフトウェア当業者に公知となって使用可能なものであってもよい。コンピュータ判読可能記録媒体の例としては、ハードディスク、プロッピィーディスク及び磁気テープのような磁気媒体（ｍａｇｎｅｔｉｃｍｅｄｉａ）、ＣＤ－ＲＯＭ、ＤＶＤのような光記録媒体（ｏｐｔｉｃａｌｍｅｄｉａ）、プロプティカルディスク（ｆｌｏｐｔｉｃａｌｄｉｓｋ）のような磁気－光媒体（ｍａｇｎｅｔｏ－ｏｐｔｉｃａｌｍｅｄｉａ）、及びＲＯＭ、ＲＡＭ、フラッシュメモリーなどのようなプログラム命令を格納して行うように特に構成されたハードウェア装置が含まれる。プログラム命令の例としては、コンパイラーによって作われるような機械語コードだけでなく、インタプリターなどを用いてコンピュータによって実行できる高級言語コードを含む。ハードウェア装置は、実施例の動作を行うために１つ以上のソフトウェアモジュールとして作動するように構成されてもよく、その逆も同様である。 In addition, the technical contents described above may be embodied in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and constructed for the embodiments, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine language code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. A hardware device may be configured to operate as one or more software modules in order to perform the operations of the embodiments, and vice versa.

以上、本発明の一実施例について説明したが、該当技術分野における通常の知識を有する者であれば特許請求範囲に記載した本発明の思想から逸脱しない範囲内で、構成要素の付加、変更、削除又は追加などによって本発明を多様に修正及び変更させることができ、これも本発明の権利範囲内に含まれると言えるであろう。 An embodiment of the present invention has been described above. A person of ordinary skill in the relevant technical field can modify and change the present invention in various ways by adding, changing, or deleting components without departing from the spirit of the present invention set forth in the claims, and such modifications are also to be regarded as falling within the scope of the present invention.

１０：ユーザ 10: User
１００：音声認識選曲装置 100: Voice recognition music selection device
１１０：基準トークン提供部 110: Reference token providing unit
１２０：入力部 120: Input unit
１２２：レシーバ 122: Receiver
１３０：検索トークン生成部 130: Search token generation unit
１４０：トークン比較部 140: Token comparison unit
１５０：点数算出部 150: Score calculation unit
１６０：結果提供部 160: Result providing unit

Claims

1. A method for providing a voice recognition music selection service, comprising:
providing a plurality of reference tokens generated by dividing a reference utterance sentence containing a song title and a singer name on the basis of word spacing, parts of speech, and stems and endings;
inputting, in the form of text, a search utterance sentence uttered by a user for a song search;
generating a plurality of search tokens by dividing the search utterance sentence on the basis of word spacing, parts of speech, and stems and endings;
comparing the plurality of search tokens with the plurality of reference tokens;
calculating a similarity score of the search utterance sentence with respect to the reference utterance sentence according to a comparison result; and
providing song information to the user on the basis of the similarity score.

2. The method of claim 1, wherein the plurality of reference tokens are generated by:
primarily dividing the reference utterance sentence on the basis of word spacing;
secondarily dividing the primarily divided reference utterance sentence on the basis of parts of speech; and
tertiarily dividing the verbs and adjectives of the secondarily divided reference utterance sentence on the basis of stems and endings.

3. The method of claim 1, wherein the step of providing the plurality of reference tokens further provides a first character of each of the plurality of reference tokens and extended tokens formed by sequentially adding one character at a time to the first character.

4. The method of claim 1, wherein the step of inputting the search utterance sentence in the form of text comprises:
receiving, by voice, the search utterance sentence uttered by the user for the song search; and
converting the search utterance sentence into the text form.

5. The method of claim 4, wherein the step of inputting the search utterance sentence in the form of text further comprises, after the step of converting the search utterance sentence into the text form, comparing the search utterance sentence with the song title and the singer name to correct errors in the search utterance sentence.

6. The method of claim 1, wherein, in the step of generating the plurality of search tokens, the plurality of search tokens are generated by:
primarily dividing the search utterance sentence on the basis of word spacing;
secondarily dividing the primarily divided search utterance sentence on the basis of parts of speech; and
tertiarily dividing the verbs and adjectives of the secondarily divided search utterance sentence on the basis of stems and endings.

7. The method of claim 1, wherein, in the step of calculating the similarity score, the similarity score is calculated when all of the plurality of search tokens are included in the plurality of reference tokens.

8. The method of claim 1, wherein, in the step of calculating the similarity score, a unit score is calculated for each of the search tokens that matches a reference token, and the similarity score is calculated by summing the unit scores.

9. The method of claim 8, wherein the unit score is calculated to be higher as the number of characters of the search token that matches the reference token increases.

10. A voice recognition music selection device, comprising:
a reference token providing unit that is provided with a plurality of reference tokens generated by dividing a reference utterance sentence containing a song title and a singer name on the basis of word spacing, parts of speech, and stems and endings;
an input unit that receives, in the form of text, a search utterance sentence uttered by a user for a song search;
a search token generation unit that generates a plurality of search tokens by dividing the search utterance sentence on the basis of word spacing, parts of speech, and stems and endings;
a token comparison unit that compares the plurality of search tokens with the plurality of reference tokens;
a score calculation unit that calculates a similarity score of the search utterance sentence with respect to the reference utterance sentence according to a comparison result; and
a result providing unit that provides song information to the user on the basis of the similarity score.