JP2007225916A - Authoring apparatus, authoring method and program

Info

Publication number: JP2007225916A
Application number: JP2006047186A
Authority: JP
Inventors: Naohiro Emoto; 直博江本; Juichi Sato; 寿一佐藤
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2006-02-23
Filing date: 2006-02-23
Publication date: 2007-09-06

Abstract

PROBLEM TO BE SOLVED: To create content that is used in a karaoke system or a musical instrument practice system and that can indicate singing techniques and performance techniques.
SOLUTION: A CPU 11 of an authoring apparatus 1 applies voice analysis processing to voice data and calculates pitch data, power data, and spectrum data from the voice data. The CPU 11 detects the correspondence between accompaniment data and the voice data and, based on that correspondence, the pitch data, the power data, and the spectrum data, detects each section of the voice data in which a singing technique is used. It then creates singing technique data in which time information indicating the detected section is associated with identification information indicating the singing technique, and stores the created singing technique data in a storage unit 14. When creation of the singing technique data is finished, the CPU 11 outputs the created singing technique data, the voice data, and the accompaniment data as content in a predetermined format.
COPYRIGHT: (C)2007,JPO&INPIT

Description

The present invention relates to a technique for creating content used in a karaoke system or a musical instrument practice system.

For karaoke apparatuses that display lyrics in time with music, various methods have been proposed for synchronizing the sounding timing of the music with the display timing of the lyrics. For example, the lyrics display system described in Patent Document 1 divides lyrics data into a plurality of blocks, attaches to each block timing information indicating the start of singing and timing information indicating the end of singing, and displays the lyrics according to this timing information. Patent Document 2 proposes a method that extracts singing voice data from the audio data of a piece of music, converts it into character data, collates the singing voice data with the character data, and assigns display time information to the character data. With the methods described in Patent Documents 1 and 2, the lyrics are displayed in synchronization with the sounding timing of the music, so the user can grasp when to sing.
Patent Document 1: JP 11-38981 A
Patent Document 2: JP 2001-175267 A

Singers, however, rarely sing mechanically as written in the score. Many singers deliberately shift the start or end of a phrase, or use singing techniques such as vibrato and kobushi, to express rising emotion in the song. Some karaoke users want to sing by imitating these deliberate timing shifts and singing techniques. The same applies to musical instrument performance.
The present invention has been made against this background, and its object is to provide a technique for creating content that is used in a karaoke system or a musical instrument practice system and that can indicate singing techniques and performance techniques.

To solve the above problem, the present invention provides an authoring apparatus comprising: pitch calculation means for calculating the pitch of a voice from voice data; specifying means for analyzing the pattern of temporal change of the pitch calculated by the pitch calculation means, determining whether the analysis result corresponds to a predetermined pattern, and, if it does, specifying the section corresponding to that pattern as a section in which a specific technique is used; and output means for outputting technique data indicating the section specified by the specifying means.
The present invention also provides an authoring apparatus comprising: spectrum calculation means for calculating the spectrum of voice data from the voice data; specifying means for analyzing the pattern of temporal change of the spectrum calculated by the spectrum calculation means, determining whether the analysis result corresponds to a predetermined pattern, and, if it does, specifying the section corresponding to that pattern as a section in which a specific technique is used; and output means for outputting technique data indicating the section specified by the specifying means.
The present invention further provides an authoring apparatus comprising: power calculation means for calculating the power of voice data from the voice data; specifying means for analyzing the pattern of temporal change of the power calculated by the power calculation means, determining whether the analysis result corresponds to a predetermined pattern, and, if it does, specifying the section corresponding to that pattern as a section in which a specific technique is used; and output means for outputting technique data indicating the section specified by the specifying means.

In a preferred aspect of the present invention, the specifying means analyzes the pattern of temporal change of the pitch calculated by the pitch calculation means and specifies a section in which the pitch fluctuates continuously within a predetermined range above and below a center frequency.
In another preferred aspect of the present invention, the specifying means analyzes the pattern of temporal change of the pitch calculated by the pitch calculation means and specifies a section in which the pitch changes continuously from a low pitch to a high pitch.
In another preferred aspect of the present invention, the specifying means analyzes the pattern of temporal change of the spectrum calculated by the spectrum calculation means and specifies a section in which the manner of change of the spectral characteristics switches in varied ways within a short time.
In another preferred aspect of the present invention, the specifying means analyzes the pattern of temporal change of the spectrum calculated by the spectrum calculation means and specifies a section in which the spectral characteristics transition abruptly to a predetermined change state.
In another preferred aspect of the present invention, pitch calculation means for calculating the pitch of a voice from the voice data is provided, and, when the pitch calculated by the pitch calculation means lies in a predetermined range, the specifying means analyzes the pattern of temporal change of the spectrum calculated by the spectrum calculation means and specifies a section in which the spectral characteristics transition abruptly to a predetermined change state.
In another preferred aspect of the present invention, the apparatus comprises correspondence detection means for analyzing accompaniment data and the voice data in units of predetermined frames and detecting the temporal correspondence between them, and the specifying means specifies, based on the correspondence detected by the correspondence detection means and the pitch calculated by the pitch calculation means, a section in which the start time of a sound contained in the voice data differs from the start time of the corresponding sound in the accompaniment data.
In another preferred aspect of the present invention, the apparatus comprises correspondence detection means for analyzing accompaniment data and the voice data in units of predetermined frames and detecting the temporal correspondence between them, and the specifying means specifies, based on the power calculated by the power calculation means and the correspondence detected by the correspondence detection means, a section in which the accompaniment data is sounding and the power of the voice data is below a threshold.

According to the present invention, it is possible to create content that is used in a karaoke system or a musical instrument practice system and that can indicate the singing techniques and performance techniques being used.

<A: Configuration>
FIG. 1 is a block diagram illustrating the hardware configuration of an authoring apparatus 1 according to an embodiment of the present invention. The authoring apparatus 1 is, for example, a personal computer. In the figure, reference numeral 11 denotes a CPU (Central Processing Unit), 12 a ROM (Read Only Memory), 13 a RAM (Random Access Memory), and 14 a storage unit composed of a mass storage device such as a hard disk. The CPU 11 reads out and executes a computer program stored in the ROM 12 or the storage unit 14, and thereby controls each unit of the authoring apparatus 1 via a bus 15.
Reference numeral 16 denotes a display unit composed of, for example, a liquid crystal display. Under the control of the CPU 11, it displays character strings, various messages, menu screens for operating the authoring apparatus 1, and the like. Reference numeral 17 denotes an operation unit including input devices such as a keyboard and a mouse. It outputs to the CPU 11 a signal corresponding to the operation performed, such as a key press or a mouse operation.
Reference numeral 18 denotes a D/A converter that converts digital data supplied from the CPU 11 into analog data. The data converted by the D/A converter 18 is supplied to a speaker 19, which emits sound corresponding to the supplied data.

As shown in FIG. 1, the storage unit 14 of the authoring apparatus 1 has an accompaniment data storage area 14a, a voice data storage area 14b, a singing technique data storage area 14c, and a lyrics data storage area 14d. The accompaniment data storage area 14a stores accompaniment data, for example in MIDI (Musical Instrument Digital Interface) format, in which information indicating the pitches of the various instruments that accompany the piece is recorded along the progression of the piece. The voice data storage area 14b stores voice data, for example in WAVE or MP3 (MPEG Audio Layer-3) format, representing a song sung by a singer along with the accompaniment represented by the accompaniment data. The lyrics data storage area 14d stores lyrics data indicating the lyrics corresponding to the voice data. The lyrics data may be entered by an operator using the operation unit 17 of the authoring apparatus 1.

The singing technique data storage area 14c of the storage unit 14 stores data indicating the singing techniques used in the voice represented by the voice data stored in the voice data storage area 14b (hereinafter, "singing technique data").
FIG. 2 is a diagram illustrating an example of the contents of the singing technique data. As illustrated, the singing technique data associates the items "time information" and "identification information" with each other. The "time information" item stores time information indicating a section of the voice data in which a singing technique is used. The section indicated by this time information may be a section with a time width expressed by start time information and end time information, or it may indicate a single point in time.
The "identification information" item stores information identifying one of a plurality of preset singing techniques, for example "vibrato", "shakuri", "kobushi", "falsetto", "tsukkomi", "tame", or "breath". "Vibrato" is a technique of continuously raising and lowering the pitch very slightly to produce a trembling tone. "Shakuri" is a technique of starting on a note lower than the target note and smoothly bringing the pitch up to the target note. "Kobushi" is a technique of adding an ornamental, undulating melodic turn. "Falsetto" is a technique of singing in the so-called head voice. "Tsukkomi" is a technique of starting to sing earlier than the nominal timing. "Tame" is a technique of starting to sing later than the nominal timing. "Breath" indicates the timing at which the singer takes a breath.
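As an illustration only, the pairing of time information and identification information described for FIG. 2 could be held in a small record type such as the following. This is a minimal sketch under stated assumptions: the class name `TechniqueRecord`, the field names, the use of seconds as the time unit, and the romanized identifier strings are all hypothetical and are not taken from the disclosure.

```python
from dataclasses import dataclass
from typing import Optional

# Identifier strings for the techniques named in the description.
# The spellings are illustrative assumptions, not a disclosed encoding.
TECHNIQUES = {"vibrato", "shakuri", "kobushi", "falsetto",
              "tsukkomi", "tame", "breath"}


@dataclass(frozen=True)
class TechniqueRecord:
    technique: str                   # "identification information"
    start_sec: float                 # start of the section ("time information")
    end_sec: Optional[float] = None  # None means a single point in time

    def __post_init__(self):
        if self.technique not in TECHNIQUES:
            raise ValueError(f"unknown technique: {self.technique}")
        if self.end_sec is not None and self.end_sec < self.start_sec:
            raise ValueError("end_sec must not precede start_sec")


# Example entries: two sections with a time width and one point-in-time entry.
records = [
    TechniqueRecord("shakuri", 12.40, 12.65),
    TechniqueRecord("vibrato", 14.10, 15.80),
    TechniqueRecord("breath", 16.00),
]
```

Making `end_sec` optional mirrors the statement that the time information may describe either a section with a width or a single point in time.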

Next, the content generation function realized when the CPU 11 of the authoring apparatus 1 executes a computer program stored in the ROM 12 or the storage unit 14 will be described.
FIG. 3 is a diagram illustrating the software configuration related to the content generation function of the authoring apparatus 1. In the figure, a voice data analysis unit 111, a collation detection unit 112, and a singing technique data creation unit 113 are realized by the CPU 11 reading out and executing a computer program stored in the storage unit 14. The arrows in the figure schematically show the flow of data.
The voice data analysis unit 111 reads out voice data stored in the ROM 12 or the storage unit 14, analyzes it, calculates the pitch, power (volume), and spectrum at each time from the voice data, and generates pitch data, power data, and spectrum data representing the calculated pitch, power, and spectrum, respectively.
The collation detection unit 112 analyzes the voice data and the accompaniment data stored in the accompaniment data storage area 14a of the storage unit 14 in units of predetermined frames and detects the temporal correspondence between them.
The singing technique data creation unit 113 analyzes the patterns of temporal change of the pitch, power, and spectrum calculated by the voice data analysis unit 111 and determines whether the analysis result corresponds to a predetermined pattern. If it does, the unit specifies the section corresponding to that pattern as a section in which a specific singing technique is used, and stores singing technique data indicating the specified section in the singing technique data storage area 14c of the storage unit 14.

<B: Operation>
Next, the operation of this embodiment will be described. First, the CPU 11 of the authoring apparatus 1 reads out voice data from the storage unit 14, performs voice analysis processing on it, calculates the pitch, power (volume), and spectrum at each time from the voice data, and generates pitch data, power data, and spectrum data representing the calculated pitch, power, and spectrum, respectively. The CPU 11 then analyzes the voice data and the accompaniment data in units of predetermined frames and detects the temporal correspondence between them. Next, the CPU 11 analyzes the patterns of temporal change of the pitch, power, and spectrum calculated from the voice data, determines whether the analysis result corresponds to a predetermined pattern, and, if it does, specifies the section corresponding to that pattern as a section in which a specific singing technique is used. Finally, the CPU 11 associates the time information of the specified section with identification information indicating the singing technique and outputs the result by storing it in the singing technique data storage area 14c of the storage unit 14.
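The description does not say how the per-time power is computed. As a minimal sketch, assuming the voice data has already been decoded into a list of floating-point samples, a frame-wise RMS value is one common way to obtain such a power series. The function name `frame_power` and the parameters `frame_len` and `hop` are hypothetical, and pitch and spectrum extraction are deliberately left out.

```python
import math


def frame_power(samples, frame_len, hop):
    """Return the RMS power of each frame of `samples`.

    Frames are `frame_len` samples long and start every `hop` samples.
    A trailing partial frame is ignored.
    """
    powers = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        mean_square = sum(x * x for x in frame) / frame_len
        powers.append(math.sqrt(mean_square))
    return powers


# Example: eight loud samples followed by eight silent ones.
example_samples = [1.0] * 8 + [0.0] * 8
example_powers = frame_power(example_samples, frame_len=4, hop=4)
```

Each element of the returned list corresponds to one frame, so a frame index can be converted to a time by multiplying it by `hop` and dividing by the sampling rate.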

The processing for specifying the sections in which each singing technique is used is described below. In this embodiment, the CPU 11 specifies (detects) the sections in which the singing techniques "vibrato", "shakuri", "kobushi", "falsetto", "tsukkomi", "tame", and "breath" are used. Of these, "vibrato" and "shakuri" are detected based on the pitch calculated from the voice data. "Kobushi" and "falsetto" are detected based on the spectrum calculated from the voice data. "Tame" and "tsukkomi" are detected based on the pitch calculated from the voice data and on the accompaniment data. "Breath" is detected based on the power calculated from the voice data and on the accompaniment data.

Based on the correspondence between the voice data and the accompaniment data and on the pitch calculated from the voice data, the CPU 11 specifies a section in which the start time of a sound contained in the voice data differs from the start time of the corresponding sound in the accompaniment data. Where the pitch change in the voice data appears earlier than the pitch change in the accompaniment data, that is, where the start time of a sound in the voice data is earlier than the start time of the corresponding sound in the accompaniment data, the CPU 11 specifies the section as one in which the "tsukkomi" singing technique is used. The CPU 11 associates the time information of the specified section with identification information indicating "tsukkomi" and stores them in the singing technique data storage area 14c of the storage unit 14.
Conversely, based on the correspondence between the voice data and the accompaniment data and on the pitch calculated from the voice data, the CPU 11 detects a section in which the pitch change in the voice data appears later than the pitch change in the accompaniment data, that is, a section in which the start time of a sound in the voice data is later than the start time of the corresponding sound in the accompaniment data, and specifies the detected section as one in which the "tame" singing technique is used.
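The early/late comparison described above can be sketched as follows. This is an illustrative sketch only: it assumes that the correspondence detection has already paired each sung note onset with its accompaniment onset (both in seconds), and it introduces a hypothetical `tolerance_sec` dead band so that negligible differences are not reported. The description itself states only that the start times differ and does not mention such a tolerance.

```python
def classify_onset(vocal_onset_sec, accomp_onset_sec, tolerance_sec=0.05):
    """Label one paired onset as 'tsukkomi' (early), 'tame' (late), or None."""
    diff = vocal_onset_sec - accomp_onset_sec
    if diff < -tolerance_sec:
        return "tsukkomi"   # the voice starts before the accompaniment note
    if diff > tolerance_sec:
        return "tame"       # the voice starts after the accompaniment note
    return None


def detect_timing_techniques(note_pairs, tolerance_sec=0.05):
    """Return (label, start_sec, end_sec) for each shifted onset.

    `note_pairs` is a list of (vocal_onset_sec, accomp_onset_sec) tuples.
    The reported section spans the two onsets.
    """
    results = []
    for vocal_onset, accomp_onset in note_pairs:
        label = classify_onset(vocal_onset, accomp_onset, tolerance_sec)
        if label is not None:
            start, end = sorted((vocal_onset, accomp_onset))
            results.append((label, start, end))
    return results


# Example: on time, early by 0.15 s, late by 0.20 s.
pairs = [(1.00, 1.00), (2.35, 2.50), (4.20, 4.00)]
timing_sections = detect_timing_techniques(pairs)
```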

The CPU 11 also analyzes the pattern of temporal change of the pitch calculated from the voice data, detects a section in which the pitch fluctuates continuously within a predetermined range above and below a center frequency, and specifies the detected section as one in which the "vibrato" singing technique is used.
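One possible reading of the vibrato criterion, offered only as a sketch, is to test whether a window of the pitch contour oscillates around its mean with a bounded depth and a plausible rate. The function name `is_vibrato`, the depth bounds, and the 4 to 8 Hz rate range are assumptions chosen for illustration. The description specifies only a continuous fluctuation within a predetermined range above and below a center frequency.

```python
import math


def is_vibrato(pitch_hz, frame_rate, min_rate_hz=4.0, max_rate_hz=8.0,
               min_depth_ratio=0.005, max_depth_ratio=0.06):
    """Return True if the pitch window oscillates around its mean.

    `pitch_hz` holds one pitch value per frame and `frame_rate` is the
    number of frames per second. All thresholds are illustrative.
    """
    if len(pitch_hz) < 3:
        return False
    center = sum(pitch_hz) / len(pitch_hz)
    deviations = [p - center for p in pitch_hz]
    peak = max(abs(d) for d in deviations)
    # The fluctuation must stay within a bounded range around the center.
    if not (min_depth_ratio * center <= peak <= max_depth_ratio * center):
        return False
    # Count sign changes of the deviation, skipping exact zeros.
    crossings = 0
    previous_sign = 0
    for d in deviations:
        sign = (d > 0) - (d < 0)
        if sign != 0:
            if previous_sign != 0 and sign != previous_sign:
                crossings += 1
            previous_sign = sign
    duration_sec = (len(pitch_hz) - 1) / frame_rate
    rate_hz = crossings / (2.0 * duration_sec)  # two crossings per cycle
    return min_rate_hz <= rate_hz <= max_rate_hz


# Example: a 440 Hz tone with a 6 Hz, +/-5 Hz wobble, sampled at 100 frames/s.
wobble = [440.0 + 5.0 * math.sin(2.0 * math.pi * 6.0 * n / 100.0)
          for n in range(100)]
steady = [440.0] * 100
```

Sliding such a window along the whole contour and merging adjacent positive windows would yield sections rather than a single yes/no answer.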

The CPU 11 also analyzes the pattern of temporal change of the pitch calculated from the voice data, detects a section in which the pitch changes continuously from a low pitch to a high pitch, and specifies the detected section as one in which the "shakuri" singing technique is used. This processing may also be performed based on the correspondence with the accompaniment data. That is, based on the correspondence between the voice data and the accompaniment data, the CPU 11 may detect a section in which the pitch of the voice data continuously approaches the pitch of the accompaniment data from a lower pitch.
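A continuous low-to-high pitch change can be located, for illustration, by searching the pitch contour for runs of strictly rising frames. The sketch below assumes one positive pitch value per voiced frame. The function name `find_rising_runs` and the thresholds `min_frames` and `min_rise_semitones` are hypothetical and are not given in the description.

```python
import math


def find_rising_runs(pitch_hz, min_frames=3, min_rise_semitones=1.0):
    """Return (start_index, end_index) of each strictly rising pitch run.

    A run is reported only if it spans at least `min_frames` frames and
    rises by at least `min_rise_semitones` in total.
    """
    runs = []
    start = 0
    for i in range(1, len(pitch_hz) + 1):
        rising = i < len(pitch_hz) and pitch_hz[i] > pitch_hz[i - 1]
        if not rising:
            length = i - start
            if length >= min_frames:
                rise = 12.0 * math.log2(pitch_hz[i - 1] / pitch_hz[start])
                if rise >= min_rise_semitones:
                    runs.append((start, i - 1))
            start = i
    return runs


# Example: the voice dips to 200 Hz and glides up to the 220 Hz target.
glide = [220.0, 220.0, 200.0, 205.0, 212.0, 218.0, 220.0, 220.0, 220.0]
shakuri_runs = find_rising_runs(glide)
```

The variant mentioned in the text, which checks that the sung pitch approaches the accompaniment pitch from below, would additionally compare the end of each run with the corresponding accompaniment note.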

Based on the correspondence between the voice data and the accompaniment data and on the power calculated from the voice data, the CPU 11 also detects a section in which the accompaniment data is sounding and the power of the voice data is below a predetermined threshold, and specifies the detected section as a "breath" section.
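The breath criterion maps naturally onto a per-frame test. The following sketch assumes two frame-aligned sequences: the vocal power per frame and a boolean flag telling whether the accompaniment is sounding in that frame. The function name `find_breath_sections` and the `min_frames` parameter are hypothetical, and the threshold value is left to the caller because the description does not give one.

```python
def find_breath_sections(vocal_power, accomp_sounding, threshold, min_frames=1):
    """Return (start_index, end_index) of frames that look like breaths.

    A frame qualifies when the accompaniment is sounding and the vocal
    power is below `threshold`. Consecutive qualifying frames are merged.
    """
    sections = []
    start = None
    frame_count = min(len(vocal_power), len(accomp_sounding))
    for i in range(frame_count):
        quiet = accomp_sounding[i] and vocal_power[i] < threshold
        if quiet and start is None:
            start = i
        elif not quiet and start is not None:
            if i - start >= min_frames:
                sections.append((start, i - 1))
            start = None
    # Close a section that runs to the end of the data.
    if start is not None and frame_count - start >= min_frames:
        sections.append((start, frame_count - 1))
    return sections


# Example: the voice drops out in frames 2-3 and again in frame 6.
power = [0.5, 0.4, 0.01, 0.02, 0.6, 0.01, 0.01]
sounding = [True, True, True, True, True, False, True]
breath_sections = find_breath_sections(power, sounding, threshold=0.05)
```

Frame 5 is excluded in the example because the accompaniment is silent there, which matches the condition that the accompaniment data must be sounding.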

The CPU 11 also analyzes the pattern of temporal change of the spectrum calculated from the voice data, detects a section in which the spectral characteristics transition abruptly to a predetermined change state, and specifies the detected section as one in which the "falsetto" singing technique is used. Here, the predetermined change state is a state in which the harmonic components of the spectrum become extremely small. For example, as shown in FIG. 5, a natural (chest) voice contains many harmonic components (see FIG. 5(a)), whereas in falsetto the magnitude of the harmonic components becomes extremely small (see FIG. 5(b)). In this case, the CPU 11 may also check whether the pitch has shifted substantially upward. Falsetto is sometimes used at a pitch that the natural voice could also produce, but it is generally a technique for producing high notes that cannot be sung in the natural voice. The apparatus may therefore be configured to detect "falsetto" only when the pitch of the voice data is at or above a predetermined pitch. In addition, since the pitch range in which falsetto is used generally differs between male and female voices, the sex of the singer may be detected from the range of the voice data or from formants detected in the voice data, and the pitch range for falsetto detection may be set according to the result.
The CPU 11 also detects a section in which the manner of change of the spectral characteristics switches in varied ways within a short time, and specifies the detected section as one in which the "kobushi" singing technique is used. Kobushi is a technique that adds a growl-like flavor by changing the tone color and manner of vocalization within a short section, so the spectral characteristics change in varied ways in a section where this technique is used.
In this way, the CPU 11 detects from the voice data the sections in which each singing technique is used, associates time information indicating each detected section with identification information indicating the singing technique, and stores them in the singing technique data storage area 14c of the storage unit 14. Through this processing, the generated singing technique data is output.
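The falsetto criterion, an abrupt transition to a state with very weak harmonic components, could be approximated as follows. This is a sketch under stated assumptions: each frame is represented by a list of harmonic magnitudes with the fundamental first, the share of energy above the fundamental is used as the measure of harmonic richness, and the thresholds `low_ratio` and `drop` are hypothetical. The description does not specify how the harmonic components are measured.

```python
def harmonic_ratio(harmonic_mags):
    """Return the share of energy carried by harmonics above the fundamental."""
    total = sum(m * m for m in harmonic_mags)
    if total == 0.0:
        return 0.0
    upper = sum(m * m for m in harmonic_mags[1:])
    return upper / total


def find_falsetto_onsets(frames, low_ratio=0.1, drop=0.3):
    """Return indices of frames where the harmonic share collapses abruptly.

    A frame is reported when its ratio is below `low_ratio` and has fallen
    by at least `drop` relative to the previous frame.
    """
    ratios = [harmonic_ratio(f) for f in frames]
    onsets = []
    for i in range(1, len(ratios)):
        if ratios[i] < low_ratio and ratios[i - 1] - ratios[i] >= drop:
            onsets.append(i)
    return onsets


# Example: two harmonically rich frames followed by two thin ones.
rich = [1.0, 0.8, 0.6, 0.4]
thin = [1.0, 0.1, 0.05, 0.0]
falsetto_onsets = find_falsetto_onsets([rich, rich, thin, thin])
```

The optional pitch condition mentioned in the text could be added by also requiring the frame pitch to lie at or above a chosen threshold before a frame is reported.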

When the generation of the singing technique data is finished, the CPU 11 of the authoring apparatus 1 combines the accompaniment data stored in the accompaniment data storage area 14a, the voice data stored in the voice data storage area 14b, the singing technique data stored in the singing technique data storage area 14c, and the lyrics data stored in the lyrics data storage area 14d to generate content in a predetermined format as shown in FIG. 4, and stores the content in the storage unit 14. Instead of storing the content in the storage unit 14, the CPU 11 may transmit the generated content to another apparatus via a communication network or the like. In short, it is sufficient for the CPU 11 to attach the singing technique data to at least one of the voice data, the accompaniment data, and the lyrics data and output the result as content.
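The layout of the format shown in FIG. 4 is not reproduced in the text, so the following is only a hypothetical illustration of bundling the four kinds of data into one content object. The JSON container, the key names, and the file names are assumptions made for the sketch and should not be read as the disclosed format.

```python
import json


def build_content(accompaniment_ref, voice_ref, lyrics, technique_records):
    """Bundle references to the source data with the technique data.

    `technique_records` is a list of (technique, start_sec, end_sec) tuples,
    where end_sec may be None for a point-in-time entry.
    """
    return {
        "accompaniment": accompaniment_ref,
        "voice": voice_ref,
        "lyrics": lyrics,
        "techniques": [
            {"technique": name, "start_sec": start, "end_sec": end}
            for name, start, end in technique_records
        ],
    }


# Example with placeholder file names and lyrics.
content = build_content(
    "song.mid",
    "vocal.wav",
    "la la la",
    [("vibrato", 14.1, 15.8), ("breath", 16.0, None)],
)
serialized = json.dumps(content, ensure_ascii=False, indent=2)
```

The serialized string could then be written to storage or sent to another apparatus, corresponding to the two output options mentioned above.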

Thus, according to this embodiment, the singing techniques used by a singer can be extracted automatically from voice data representing the singer's voice, and content containing singing technique data indicating the extracted techniques can be created. With this content, a karaoke machine, for example, can display the key points of the singing techniques together with the lyrics, and can thereby inform its user of the singing techniques that the singer uses. The singing ability of a singer may also be evaluated based on the singing technique data.

A specific example of a karaoke authoring tool that uses the content generated in this embodiment is described below.
FIG. 6 is a diagram illustrating an example of a screen displayed by a karaoke authoring tool (karaoke tool) that uses the content generated in this embodiment. The screen shown in FIG. 6 uses the content illustrated as a specific example in FIG. 4. The control unit of the karaoke authoring tool displays the screen shown in FIG. 6 based on the supplied content. Specifically, the control unit displays lyrics A1 based on the lyrics data contained in the content, and displays a graph A2 showing the temporal change in pitch of the voice data corresponding to the lyrics A1, based on the voice data contained in the content. The control unit also displays a graph A3 showing the temporal change in pitch of the accompaniment data, based on the accompaniment data contained in the content. In addition, the control unit displays a time axis A4 corresponding to the progression of the piece and identifies tracks A51 to A55, one for each singing technique. In each track, based on the specified singing technique sections, it displays a double-headed arrow over each section in which the technique corresponding to that track is used. For example, the double-headed arrow A541 displayed in the "vibrato" track A54 appears over the section corresponding to the lyric syllable "chi", indicating that the "vibrato" technique is used in that section.

Because the singing techniques used by the singer are displayed on the screen in this way, the user of the system, that is, the karaoke content creator, can visually grasp the singing techniques (vibrato, "tame" (ため, a held-back note onset), and so on) used in the song. With a karaoke system that uses such content, a user can visually grasp the singing techniques used by the model singer, which makes it easier to imitate those techniques when singing.
The system that uses the content according to this embodiment is not limited to the karaoke authoring tool described above; the content can also be used suitably in karaoke systems (for example, instrumental karaoke systems), singing practice systems, and the like.

<C: Modifications>
An embodiment of the present invention has been described above, but the present invention is not limited to that embodiment and can be implemented in various other forms. Examples are given below.
(1) In the embodiment described above, singing technique data is generated by detecting, from audio data representing a singer's voice, the sections in which singing techniques are used. However, the audio data in the present invention is not limited to audio data representing a singer's voice; the invention can also be applied to audio data representing a musical instrument performance. In this case as well, the CPU of the authoring apparatus detects, from the performance data of the instrument, the sections in which performance techniques (for example, vibrato, breaths, and tame) are used, by the same method as in the embodiment described above, and generates performance technique data indicating the detected sections. That is, the audio data may represent either a human singing voice or the sound of an instrument being played. Likewise, the technique data may represent either singing techniques or instrument performance techniques.

(2) In the embodiment described above, an operator enters the lyrics data corresponding to the audio data using the operation unit. Instead, the CPU of the authoring apparatus may apply speech recognition to the audio data and generate the lyrics data from it automatically. In this case, the CPU may collate the accompaniment data with the pitch data extracted from the audio data and automatically assign each lyric to the timing at which it is sung.

(3) In the embodiment described above, the authoring apparatus 1 generates the content entirely automatically. Instead, an operator may operate the operation unit 17 to enter or change data. Specifically, for example, in the singing technique data generation process, when the CPU 11 detects a section in which a singing technique is used, it may display a graph or the like indicating that section on the display unit. The operator can then check the displayed content to confirm the sections in which singing techniques are used. If the operator finds a false detection or the like, the operator can operate the operation unit 17 to delete or correct that section. The operator may also operate the operation unit 17 to enter manually any section that could not be detected automatically.
Allowing content data to be entered and changed in this way improves the accuracy of the content data.
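
The three operator actions in modification (3), deleting a falsely detected section, correcting a section, and entering a missed section by hand, can be sketched as simple list operations. This is a minimal sketch under assumptions: the `Section` record and the functions `delete_section`, `correct_section`, and `add_section` are hypothetical names introduced here, and times are assumed to be in seconds.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Section:
    """A detected technique section (hypothetical record layout)."""
    technique: str
    start: float  # seconds (assumed unit)
    end: float


def delete_section(sections, index):
    """Remove a falsely detected section; the input list is left unchanged."""
    return sections[:index] + sections[index + 1:]


def correct_section(sections, index, start=None, end=None):
    """Adjust the boundaries of one section, keeping any value not given."""
    old = sections[index]
    new = replace(
        old,
        start=old.start if start is None else start,
        end=old.end if end is None else end,
    )
    if new.start >= new.end:
        raise ValueError("section must start before it ends")
    return sections[:index] + [new] + sections[index + 1:]


def add_section(sections, technique, start, end):
    """Insert a manually entered section, keeping the list ordered by start."""
    if start >= end:
        raise ValueError("section must start before it ends")
    return sorted(sections + [Section(technique, start, end)],
                  key=lambda s: s.start)


if __name__ == "__main__":
    detected = [Section("vibrato", 1.0, 2.0), Section("falsetto", 3.0, 3.5)]
    edited = delete_section(detected, 1)            # drop a false detection
    edited = correct_section(edited, 0, end=2.4)    # extend the vibrato
    edited = add_section(edited, "tame", 0.2, 0.6)  # add a missed section
    print(edited)
```

Each function returns a new list rather than modifying its input, so the automatically detected result stays available for comparison with the operator's edits.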

(4) In the embodiment described above, the audio data stored in the audio data storage area 14b of the storage unit 14 is in WAVE or MP3 format. However, the data format is not limited to these; any format may be used as long as the data represents audio.
Also, in the embodiment described above, the audio data is stored in the storage unit 14 and the CPU 11 of the authoring apparatus 1 reads it from the storage unit 14. Instead, the audio data may be received via a communication network.

(5) In the embodiment described above, techniques such as "vibrato" and "tame" are detected as singing techniques. However, the singing techniques (or performance techniques) to be detected are not limited to those shown in the embodiment and may be, for example, staccato or crescendo (decrescendo). Specifically, a section in which the power detected from the audio data is strong only for a short, fixed period may be detected as staccato. A section in which the power data value continuously and gradually increases (decreases) may be detected as crescendo (decrescendo). In short, it suffices to analyze the patterns of temporal change in the pitch, spectrum, and power calculated from the audio data, determine whether the analysis result corresponds to a predetermined pattern, and, if it does, identify the section corresponding to that pattern as a section in which a specific technique is used.
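
The staccato and crescendo (decrescendo) rules in modification (5) can be illustrated with a small rule-based sketch over per-frame power values. The patent does not give thresholds or frame sizes, so `threshold`, `max_frames`, and `min_steps` below are assumed parameters, and the function names are hypothetical; sections are returned as frame-index pairs with an exclusive end.

```python
def find_runs(flags):
    """Return (start, end) pairs, end exclusive, for each run of True values."""
    runs, start = [], None
    for i, flag in enumerate(flags):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(flags)))
    return runs


def detect_gradual_change(power, min_steps=4, rising=True):
    """Find sections where power keeps rising (crescendo) or falling
    (decrescendo) for at least `min_steps` consecutive frame-to-frame steps."""
    if rising:
        steps = [b > a for a, b in zip(power, power[1:])]
    else:
        steps = [b < a for a, b in zip(power, power[1:])]
    # A run (s, e) of steps covers frames s..e inclusive, so the end is e + 1.
    return [(s, e + 1) for s, e in find_runs(steps) if e - s >= min_steps]


def detect_staccato(power, threshold, max_frames=3):
    """Find sections where power is at or above `threshold` for only a
    short time (at most `max_frames` consecutive frames)."""
    loud = [p >= threshold for p in power]
    return [(s, e) for s, e in find_runs(loud) if e - s <= max_frames]


if __name__ == "__main__":
    power = [1, 1, 2, 3, 4, 5, 5, 1, 9, 1, 1]  # made-up per-frame power values
    print("crescendo:", detect_gradual_change(power, min_steps=4))
    print("staccato: ", detect_staccato(power, threshold=8, max_frames=2))
```

Real power data would typically be smoothed before applying a strict frame-to-frame comparison like this one, since small fluctuations would otherwise break up a gradual rise; that preprocessing is left out here to keep the pattern test itself visible.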

(6) The program executed by the CPU 11 of the authoring apparatus 1 in the embodiment described above can be provided stored on a recording medium such as a magnetic tape, a magnetic disk, a floppy (registered trademark) disk, an optical recording medium, a magneto-optical recording medium, a CD (Compact Disk)-ROM, a DVD (Digital Versatile Disk), or RAM. The program can also be downloaded to the authoring apparatus 1 via a network such as the Internet.

FIG. 1 is a block diagram showing an example of the hardware configuration of the authoring apparatus.
FIG. 2 is a diagram showing an example of the contents of the singing technique data.
FIG. 3 is a block diagram showing an example of the software structure of the authoring apparatus.
FIG. 4 is a diagram showing an example of the contents of the content data.
FIG. 5 is a diagram for explaining the falsetto detection process.
FIG. 6 is a diagram showing an example of a screen displayed in a karaoke system.

Explanation of symbols

1: authoring apparatus, 11: CPU, 12: ROM, 13: RAM, 14: storage unit, 15: bus, 16: display unit, 17: operation unit, 18: D/A conversion unit, 19: speaker, 111: audio data analysis unit, 112: collation detection unit, 113: singing technique data creation unit.

Claims

An authoring apparatus comprising:
pitch calculating means for calculating the pitch of a voice from audio data;
specifying means for analyzing a pattern of temporal change in the pitch calculated by the pitch calculating means, determining whether the analysis result corresponds to a predetermined pattern, and, if it does, specifying the section corresponding to that pattern as a section in which a specific technique is used; and
output means for outputting technique data indicating the section specified by the specifying means.

An authoring apparatus comprising:
spectrum calculating means for calculating the spectrum of audio data from the audio data;
specifying means for analyzing a pattern of temporal change in the spectrum calculated by the spectrum calculating means, determining whether the analysis result corresponds to a predetermined pattern, and, if it does, specifying the section corresponding to that pattern as a section in which a specific technique is used; and
output means for outputting technique data indicating the section specified by the specifying means.

An authoring apparatus comprising:
power calculating means for calculating the power of audio data from the audio data;
specifying means for analyzing a pattern of temporal change in the power calculated by the power calculating means, determining whether the analysis result corresponds to a predetermined pattern, and, if it does, specifying the section corresponding to that pattern as a section in which a specific technique is used; and
output means for outputting technique data indicating the section specified by the specifying means.

The authoring apparatus according to claim 1, wherein the specifying means analyzes the pattern of temporal change in the pitch calculated by the pitch calculating means and specifies a section in which the pitch fluctuates continuously within a predetermined range above and below a center frequency.

The specifying means analyzes the pattern of temporal change in the pitch calculated by the pitch calculating means and specifies a section in which the pitch changes continuously from a low pitch to a high pitch. The authoring apparatus described.

The authoring apparatus according to claim 2, wherein the specifying means analyzes the pattern of temporal change in the spectrum calculated by the spectrum calculating means and specifies a section in which the manner of change of the spectral characteristics varies in many ways within a short time.

The authoring apparatus according to claim 2, wherein the specifying means analyzes the pattern of temporal change in the spectrum calculated by the spectrum calculating means and specifies a section in which the spectral characteristics transition rapidly to a predetermined change state.

The authoring apparatus according to claim 2, further comprising pitch calculating means for calculating the pitch of a voice from the audio data, wherein, when the pitch calculated by the pitch calculating means is in a predetermined range, the specifying means analyzes the pattern of temporal change in the spectrum calculated by the spectrum calculating means and specifies a section in which the spectral characteristics transition rapidly to a predetermined change state.

The authoring apparatus according to claim 1, further comprising correspondence detection means for analyzing accompaniment data and the audio data in units of a predetermined frame and detecting the temporal correspondence between the two,
wherein the specifying means specifies, based on the correspondence detected by the correspondence detection means and the pitch calculated by the pitch calculating means, a section in which the start time of a sound included in the audio data differs from the start time of the corresponding sound in the accompaniment data.

The authoring apparatus according to claim 3, further comprising correspondence detection means for analyzing accompaniment data and the audio data in units of a predetermined frame and detecting the temporal correspondence between the two,
wherein the specifying means specifies, based on the power calculated by the power calculating means and the correspondence detected by the correspondence detection means, a section in which the accompaniment data is sounding and the power value of the audio data is smaller than a threshold.

An authoring method for an authoring apparatus comprising control means, the method comprising:
a step in which the control means calculates the pitch of a voice from audio data;
a step in which the control means analyzes a pattern of temporal change in the calculated pitch, determines whether the analysis result corresponds to a predetermined pattern, and, if it does, specifies the section corresponding to that pattern as a section in which a specific technique is used; and
a step of outputting technique data indicating the specified section.

An authoring method for an authoring apparatus comprising control means, the method comprising:
a step in which the control means calculates the spectrum of audio data from the audio data;
a step in which the control means analyzes a pattern of temporal change in the calculated spectrum, determines whether the analysis result corresponds to a predetermined pattern, and, if it does, specifies the section corresponding to that pattern as a section in which a specific technique is used; and
a step of outputting technique data indicating the specified section.

An authoring method for an authoring apparatus comprising control means, the method comprising:
a step in which the control means calculates the power of audio data from the audio data;
a step in which the control means analyzes a pattern of temporal change in the calculated power, determines whether the analysis result corresponds to a predetermined pattern, and, if it does, specifies the section corresponding to that pattern as a section in which a specific technique is used; and
a step of outputting technique data indicating the specified section.

A program that causes a computer to implement:
a pitch calculation function for calculating the pitch of a voice from audio data;
a specifying function for analyzing a pattern of temporal change in the pitch calculated by the pitch calculation function, determining whether the analysis result corresponds to a predetermined pattern, and, if it does, specifying the section corresponding to that pattern as a section in which a specific technique is used; and
an output function for outputting technique data indicating the section specified by the specifying function.

A program that causes a computer to implement:
a spectrum calculation function for calculating the spectrum of audio data from the audio data;
a specifying function for analyzing a pattern of temporal change in the spectrum calculated by the spectrum calculation function, determining whether the analysis result corresponds to a predetermined pattern, and, if it does, specifying the section corresponding to that pattern as a section in which a specific technique is used; and
an output function for outputting technique data indicating the section specified by the specifying function.

A program that causes a computer to implement:
a power calculation function for calculating the power of audio data from the audio data;
a specifying function for analyzing a pattern of temporal change in the power calculated by the power calculation function, determining whether the analysis result corresponds to a predetermined pattern, and, if it does, specifying the section corresponding to that pattern as a section in which a specific technique is used; and
an output function for outputting technique data indicating the section specified by the specifying function.