JP5591428B2

JP5591428B2 - Automatic recording device

Info

Publication number: JP5591428B2
Application number: JP2014519697A
Authority: JP
Inventors: 裕生山下; 知弘岩崎
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2012-06-04
Filing date: 2012-06-04
Publication date: 2014-09-17
Anticipated expiration: 2032-06-04
Also published as: CN104350545A; WO2013183078A1; JPWO2013183078A1; CN104350545B

Description

The present invention relates to an automatic recording apparatus that automatically extracts information from a recognition result obtained by speech recognition of broadcast data and records that information.

For example, Patent Document 1 discloses a data processing apparatus that analyzes broadcast data broadcast from a broadcasting station, classifies it into content data such as music and conversation, extracts and digitizes the content data, transmits the digitized content data to an external device for collation, receives identification data such as an artist name corresponding to the content data, and stores the received identification data in association with the extracted content data.

Patent Document 1: JP 2008-27573 A

However, a conventional data processing apparatus such as that of Patent Document 1 must transmit the feature amount of the recorded content data to an external device and receive identification data in order to identify the content data, so data processing cannot be performed when communication with the external device cannot be established. In addition, the database held by the external device must be updated to handle new content such as new songs, and the amount of content data held by the external device must be increased to make a large amount of content identifiable.

The present invention has been made to solve the above problems. Its object is to provide an automatic recording apparatus that can acquire the identification data of content extracted from broadcast data without transmitting information about that content to, or receiving it from, an external device, and can automatically record the identification data in association with the content.

In order to achieve the above object, an automatic recording apparatus according to the present invention includes: a voice acquisition unit that detects and acquires, from broadcast data, audio including content and identification data of the content; a fixed sentence storage unit that stores phrases used when introducing the content; a voice recognition unit that recognizes the audio data acquired by the voice acquisition unit and extracts and outputs the identification data of the content based on the recognition result and the phrases stored in the fixed sentence storage unit; a control unit that, upon receiving the identification data of the content from the voice recognition unit, issues an instruction to detect a start point and an end point of the content; a content section detection unit that, in accordance with the instruction from the control unit, detects the start point and the end point of the content from the audio data acquired by the voice acquisition unit; a video/audio recording unit that records the content in the content section between the start point and the end point detected by the content section detection unit; and an information storage unit that stores at least the content recorded by the video/audio recording unit and the identification data of the content. The control unit stores the identification data of the content in the information storage unit in association with the content recorded by the video/audio recording unit.

According to the automatic recording apparatus of the present invention, identification data such as a song name and an artist name corresponding to content such as a song is extracted from the recognition result obtained by speech recognition of broadcast data. The identification data of the content can therefore be obtained without transmitting information about the content to, or receiving it from, an external device, and can be automatically recorded in association with the content.

FIG. 1 is a block diagram showing an example of an automatic recording apparatus according to Embodiment 1.
FIG. 2 is a diagram showing an example of the song introduction phrases stored in the fixed sentence storage unit.
FIG. 3 is a diagram showing an example of data, stored in the information storage unit, in which song names, artist names, and songs are associated with each other.
FIG. 4 is a flowchart showing the operation of the automatic recording apparatus in Embodiment 1.
FIG. 5 is a block diagram showing an example of an automatic recording apparatus according to Embodiment 2.
FIG. 6 is a diagram showing an example of information, stored in the information storage unit, in which song names, artist names, songs, and acquisition counts are associated with each other.
FIG. 7 is a flowchart showing the operation of the automatic recording apparatus in Embodiment 2.
FIG. 8 is a flowchart showing the operation of the automatic recording apparatus in Embodiment 3.
FIG. 9 is a block diagram showing an example of an automatic recording apparatus according to Embodiment 4.
FIG. 10 is a flowchart showing the operation of the automatic recording apparatus in Embodiment 4.
FIG. 11 is a block diagram showing an example of an automatic recording apparatus according to Embodiment 5.
FIG. 12 is a flowchart showing the operation of the automatic recording apparatus in Embodiment 5.
FIG. 13 is a block diagram showing an example of an automatic recording apparatus according to Embodiment 6.
FIG. 14 is a block diagram showing another example of an automatic recording apparatus according to Embodiment 6.
FIG. 15 is a flowchart showing the operation of the automatic recording apparatus in Embodiment 6.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.
Embodiment 1.
FIG. 1 is a block diagram showing an example of an automatic recording apparatus according to Embodiment 1 of the present invention. This embodiment describes, as an example, an automatic recording apparatus that acquires audio from broadcast data broadcast on radio, television, or the like, recognizes it, and records content together with its identification data. Specifically, music content (a song) is stored in association with the song name and artist name that serve as its identification data. The same applies to the embodiments that follow.

The automatic recording apparatus includes a voice acquisition unit 1, a voice recognition unit 2, a fixed sentence storage unit 3, a control unit 4, an information storage unit 5, a content section detection unit 6, and a video/audio recording unit 7. Although not shown for Embodiment 1, the apparatus also includes an input unit 8 that acquires input signals from keys, a touch panel, or the like, and an output unit 9 that outputs data by display or by sound (see FIG. 9 for Embodiment 4, described later).

The automatic recording apparatus acquires and recognizes audio from broadcast data output by an audio device such as a radio or television, extracts identification data such as the name of the song (content) being broadcast (song name) and the name of the artist (artist name) from the recognition result, and automatically records that identification data in the information storage unit in association with the song (content).

The voice acquisition unit 1 detects and acquires, from broadcast data, audio including content and the identification data of that content. The audio output from the audio device is acquired by line input or the like. If the audio is acquired in analog form, it is A/D converted into a digital format such as PCM (Pulse Code Modulation).

The voice recognition unit 2 has a recognition dictionary (not shown) and recognizes the audio data acquired by the voice acquisition unit 1. Specifically, it detects a speech section corresponding to the content of an utterance (such as a passenger's utterance), extracts the feature amount of the audio data in that speech section, performs recognition processing using the recognition dictionary based on the feature amount, and outputs the speech recognition result as a character string. The recognition processing may use a general method such as the HMM (Hidden Markov Model) method, so its description is omitted here. The voice recognition unit 2 may also reside on a server on a network, as described later.

The speech recognition used here combines two approaches: grammar-based (syntax-type) speech recognition, which recognizes vocabulary registered in the recognition dictionary in advance, and large vocabulary continuous speech recognition, which can recognize arbitrary character strings by continuously recognizing single-character syllables such as "a", "i", "u", "e", and "o". Alternatively, all recognition may be performed by large vocabulary continuous speech recognition, with morphological analysis applied to the recognition result. The morphological analysis may use a general method such as the HMM method, so its description is omitted here.

The fixed sentence storage unit 3 stores phrases used when introducing a song (content), that is, phrases that disc jockeys, presenters, and the like often use when introducing songs. Examples, as shown in FIG. 2, are "The next song is <song name> by <artist name>" and "You just heard <song name> by <artist name>". Hereinafter, these are called song introduction phrases.

The voice recognition unit 2 described above recognizes the audio data acquired by the voice acquisition unit 1 and refers to the fixed sentence storage unit 3. That is, based on the recognition result and the phrases stored in the fixed sentence storage unit 3, it extracts and outputs the song name, artist name, and so on (identification data) of the song (content). As a specific extraction method, for the song introduction phrases stored in the fixed sentence storage unit 3, the <artist name> and <song name> portions are recognized and extracted by large vocabulary continuous speech recognition, and the remaining portions are recognized by grammar-based speech recognition.
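
The extraction idea above can be illustrated with a minimal Python sketch. It is not the recognizer described in this document: it assumes the broadcast speech has already been converted to text and matches that text against song introduction phrases, treating the fixed wording as a pattern and the slots as free text. The English phrases, the slot names `<title>` and `<artist>`, and the function names are hypothetical and introduced only for illustration.

```python
import re

# Hypothetical English renderings of the song introduction phrases of FIG. 2.
TEMPLATES = [
    "the next song is <title> by <artist>",
    "you just heard <title> by <artist>",
]


def compile_template(template):
    """Turn a phrase with <title>/<artist> slots into a regex with named groups."""
    pattern = ""
    for part in re.split(r"(<title>|<artist>)", template):
        if part in ("<title>", "<artist>"):
            # Slot: free text (stands in for large vocabulary recognition).
            pattern += "(?P<%s>.+?)" % part.strip("<>")
        else:
            # Fixed wording (stands in for grammar-based recognition).
            pattern += re.escape(part)
    return re.compile(r"^\s*" + pattern + r"\s*[.!]?\s*$", re.IGNORECASE)


COMPILED = [compile_template(t) for t in TEMPLATES]


def extract_identification(transcript):
    """Return {'title': ..., 'artist': ...} if the transcript is a song introduction."""
    for regex in COMPILED:
        match = regex.match(transcript)
        if match:
            return {
                "title": match.group("title").strip(),
                "artist": match.group("artist").strip(),
            }
    return None  # Not a song introduction: nothing is passed to the control unit.
```

With these assumed phrases, a transcript such as "The next song is Yesterday by The Beatles." yields a title and an artist, while ordinary conversation yields nothing. A title that itself contains the word "by" would be split ambiguously, which is one reason a real system would rely on the recognizer's own slot boundaries rather than on text matching.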

The control unit 4 takes as input the character strings of the song name, artist name, and so on (identification data) output as the recognition result by the voice recognition unit 2. When it receives the song name, artist name, and so on (identification data) of a song (content), it outputs an operation start command to the content section detection unit 6 described later, that is, it instructs that unit to detect the start point and end point of the song (content).

As shown in FIG. 3, for example, the information storage unit 5 stores at least a song (content) and the artist name and song name (identification data) of that song (content). As FIG. 3 shows, the artist name and song name (identification data) are stored in association with the song (content), and the date and time at which the song (content) was acquired (recorded) may also be stored in association with it. The data may be organized per song name as shown in FIG. 3(a), or grouped per artist as shown in FIG. 3(b). The information storage unit 5 may be a hard disk, an SD card, or the like.
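
As a rough illustration of the association data of FIG. 3, the following Python sketch models one stored entry and the per-artist grouping of FIG. 3(b). The field names, the use of a file path to stand for the recorded song, and the function name are assumptions made for this sketch. The document does not specify a storage format.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class SongRecord:
    """One entry of the association table (per-song layout, as in FIG. 3(a))."""
    artist: str            # identification data
    title: str             # identification data
    audio_path: str        # hypothetical location of the recorded song
    acquired_at: datetime  # optional acquisition date and time


def group_by_artist(records):
    """Rearrange per-song records into a per-artist layout, as in FIG. 3(b)."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record.artist].append(record)
    return dict(grouped)
```

Keeping one record per recorded song and deriving the per-artist view on demand avoids storing the same association twice.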

The content section detection unit 6 detects the start point and end point of the song (content) from the audio data acquired by the voice acquisition unit 1, in accordance with the instruction from the control unit 4. Specifically, it takes the digital audio data output from the voice acquisition unit 1 as input and uses features such as the frequency characteristics of that data to detect the boundaries between the song (content) and conversation (the non-content portion) in the audio data. When it detects the start of the song, it sends a recording start command to the video/audio recording unit 7 described later. When it detects the end of the song, it sends a recording end command to the video/audio recording unit 7. The start and end may be detected using a general method such as time-frequency analysis, so the description is omitted here.
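
The boundary logic can be sketched in Python under a strong simplifying assumption: some upstream analysis (for example, the time-frequency analysis mentioned above) has already labeled each short audio frame as music or not. Only the step of turning those labels into a start point and an end point is shown. The function name and the `min_run` parameter, which requires several consecutive frames before a boundary is accepted, are hypothetical.

```python
def find_content_section(is_music, min_run=3):
    """Return (start, end) frame indices of the first stable music section.

    is_music: sequence of truthy/falsy per-frame labels (assumed to be given).
    min_run:  number of consecutive frames needed to accept a boundary.
    The end index is exclusive. Returns None if no section starts.
    """
    start = None
    run = 0
    for i, flag in enumerate(is_music):
        if start is None:
            # Looking for the start: count consecutive music frames.
            run = run + 1 if flag else 0
            if run >= min_run:
                start = i - min_run + 1  # corresponds to "recording start"
                run = 0
        else:
            # Looking for the end: count consecutive non-music frames.
            run = run + 1 if not flag else 0
            if run >= min_run:
                return (start, i - min_run + 1)  # corresponds to "recording end"
    if start is not None:
        return (start, len(is_music))  # input ended while the song was playing
    return None
```

Requiring a run of frames keeps a brief pause inside a song, or a short jingle inside talk, from being treated as a boundary.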

The video/audio recording unit 7 operates on commands from the content section detection unit 6. That is, it records only the song (content) portion in the content section between the start point and end point detected by the content section detection unit 6, and saves it in the information storage unit 5.
The control unit 4 described above stores the song name and artist name (identification data) received from the voice recognition unit 2 in the information storage unit 5 in association with the song (content) recorded by the video/audio recording unit 7.

Next, the operation of the automatic recording apparatus of Embodiment 1 will be described with reference to the flowchart shown in FIG. 4.
First, the voice acquisition unit 1 acquires the audio input from the audio device by line input (step ST11). If the audio input from the audio device is in analog form, it is A/D converted, for example into PCM format, and acquired as digital data.
Next, the voice recognition unit 2 recognizes the audio data acquired by the voice acquisition unit 1 and outputs the recognition result as a character string. In doing so, it compares the result with the contents of the fixed sentence storage unit 3 and performs large vocabulary continuous speech recognition to extract the song name and artist name (step ST12).

Upon receiving the song name and artist name from the voice recognition unit 2, the control unit 4 instructs the content section detection unit 6 to operate. The content section detection unit 6 applies signal processing to the audio acquired by the voice acquisition unit 1 to extract features such as frequency, detects the start of the song portion (step ST13), and sends a recording start command to the video/audio recording unit 7.
On receiving the command from the content section detection unit 6, the video/audio recording unit 7 starts recording the song from the start position detected in step ST13 (step ST14).

The content section detection unit 6 also applies signal processing to the acquired audio to extract features, detects the end of the song portion (step ST15), and sends a recording end command to the video/audio recording unit 7.
On receiving the command from the content section detection unit 6, the video/audio recording unit 7 stops recording the song (step ST16) and saves the recorded song in the information storage unit 5 (step ST17).

Finally, the control unit 4 stores the song name and artist name, which were extracted in step ST12 and received from the voice recognition unit 2, in the information storage unit 5 in association with the song saved in step ST17 (step ST18).
As a result, an association table such as that shown in FIG. 3 is stored.

In this way, speech recognition using large vocabulary continuous speech recognition is performed based solely on broadcast data from radio, television, or the like. An external database for looking up content identification data is therefore unnecessary, the effort of creating and updating such a database is eliminated, and communication with it is no longer required.
In addition, because content is recorded only when both the identification data and the start of the content have been extracted, only the song portions are saved efficiently, without straining the capacity of the storage medium.

As described above, according to Embodiment 1, identification data such as a song name and an artist name corresponding to content such as a song is extracted from the recognition result obtained by speech recognition of broadcast data. The identification data of the content can therefore be obtained without transmitting information about the content to, or receiving it from, an external device, and can be automatically recorded in association with the content.

Embodiment 2.
FIG. 5 is a block diagram showing an example of an automatic recording apparatus according to Embodiment 2 of the present invention. Components that are the same as those described in Embodiment 1 are given the same reference numerals, and duplicate description is omitted. Embodiment 2 differs from Embodiment 1 in that the control unit 4 refers to the information stored in the information storage unit 5 so that only content matching the user's preferences is recorded.

In the information storage unit 5, for example in the format shown in FIG. 6, the artist name and song name (identification data) output from the voice recognition unit 2 are stored in association with the song (content). Data including the number of times each song (content), or songs (content) by that artist, has been acquired is stored as well. The data stored in the information storage unit 5 can be referenced by the control unit 4.

The control unit 4 takes as input the character strings of the song name, artist name, and so on (identification data) output from the voice recognition unit 2 and records the song name and artist name (identification data) in the information storage unit 5. It also refers to the data stored in the information storage unit 5 (information about the content, including the acquisition count) and outputs an operation start command to the content section detection unit 6 only when the number of times the content has been acquired is equal to or greater than a predetermined number.

Next, the operation of the automatic recording apparatus in Embodiment 2 will be described with reference to the flowchart shown in FIG. 7.
First, the voice acquisition unit 1 acquires the audio input from the audio device by line input (step ST21). If the audio input from the audio device is in analog form, it is A/D converted, for example into PCM format, and acquired as digital data.
Next, the voice recognition unit 2 recognizes the audio data acquired by the voice acquisition unit 1 and outputs the recognition result as a character string. In doing so, it compares the result with the contents of the fixed sentence storage unit 3 and performs large vocabulary continuous speech recognition to extract the song name and artist name (step ST22).

When the control unit 4 acquires the song name and artist name from the voice recognition unit 2, it refers to the data stored in the information storage unit 5 for that song name and artist name. If the number of times the content with that song name and artist name has been acquired is equal to or greater than the predetermined number (YES in step ST23), it operates the content section detection unit 6, and the processing of steps ST24 to ST29 is performed.
The processing of steps ST24 to ST29 is the same as that of steps ST13 to ST18 shown in FIG. 4 for Embodiment 1, so its description is omitted.

On the other hand, if in step ST23 the acquisition count for the song with the song name and artist name extracted in step ST22 is less than the predetermined number (NO in step ST23), the control unit 4 increments the acquisition count by one and stores the song name and artist name output from the voice recognition unit 2 in the information storage unit 5 (step ST30).
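
The branch at step ST23 can be summarized by the following Python sketch. It is a simplified illustration, not the apparatus itself: the acquisition counts are held in an in-memory dictionary keyed by (artist, song name), and the threshold of 3 is a hypothetical value, since the document only speaks of a predetermined number.

```python
def should_record(counts, artist, title, threshold=3):
    """Decide whether to start content section detection (step ST23).

    counts:    dict mapping (artist, title) to the acquisition count so far
               (a stand-in for the data of FIG. 6).
    threshold: hypothetical value for the predetermined number of times.
    """
    key = (artist, title)
    if counts.get(key, 0) >= threshold:
        return True  # YES branch: proceed to steps ST24 to ST29
    # NO branch (step ST30): store the identification data with count + 1.
    counts[key] = counts.get(key, 0) + 1
    return False
```

With this threshold, a song is recorded the fourth time its introduction is recognized. FIG. 6 also keeps counts per artist. Keying the dictionary by artist alone would gate recording on how often that artist has been heard.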

In this way, only songs whose song name and artist name have already been acquired the predetermined number of times or more, that is, only content matching the user's preferences, are recorded. Only the song portions are therefore recorded efficiently, without straining the capacity of the storage medium.

As described above, according to Embodiment 2, in addition to the effects of Embodiment 1, only content matching the user's preferences is recorded, so only the song portions are recorded efficiently, without straining the capacity of the storage medium.

Embodiment 3.
The block diagram showing an example of the automatic recording apparatus according to Embodiment 3 of the present invention is the same as the block diagram shown in FIG. 5 for Embodiment 2, so its illustration and description are omitted. Embodiment 3 differs from Embodiment 2 in that whether to issue the command to start section detection for a song (content) is decided not by whether the song (content) matches the user's preferences but by the likelihood of the speech recognition.
In Embodiment 3, when the voice recognition unit 2 outputs a recognition result to the control unit 4, it also outputs the likelihood of that recognition together with the result.

Next, the operation of the automatic recording apparatus in Embodiment 3 will be described with reference to the flowchart shown in FIG. 8.
First, the voice acquisition unit 1 acquires the audio input from the audio device by line input (step ST31). If the audio input from the audio device is in analog form, it is A/D converted, for example into PCM format, and acquired as digital data.
Next, the voice recognition unit 2 recognizes the audio data acquired by the voice acquisition unit 1 and outputs the recognition result as a character string. In doing so, it compares the result with the contents of the fixed sentence storage unit 3 and performs large vocabulary continuous speech recognition to extract the song name and artist name (step ST32).

When the voice recognition unit 2 outputs the recognition result, it also outputs a likelihood indicating how probable (plausible) the recognized speech is. The control unit 4 acquires this likelihood together with the result and, only when the likelihood is equal to or greater than a predetermined value (YES in step ST33), operates the content section detection unit 6, and the processing of steps ST34 to ST39 is performed.
The processing of steps ST34 to ST39 is the same as that of steps ST13 to ST18 shown in FIG. 4 for Embodiment 1, so its description is omitted.

On the other hand, if the likelihood of the speech recognition is less than the predetermined value in step ST33 (NO in step ST33), the processing ends without further action.

A specific example of the likelihood is now described. In large vocabulary continuous speech recognition, for example, the probability (plausibility) of each recognized sound rises with the clarity of articulation of the host or other speaker heard in the broadcast data and with the absence of noise. Normally, a likelihood of about 60 to 70% or higher is sufficient to judge that the sound (character) was uttered. Setting the predetermined value in step ST33 to, for example, 80% ensures that processing proceeds to step ST34 and beyond only when the speech has been recognized correctly.

Alternatively, in grammar-based speech recognition that compares the input against the song introduction phrases (FIG. 2) stored in the fixed sentence storage unit 3, the likelihood that the recognized speech is a song introduction may be calculated from the percentage of wording that matches. In this case as well, setting the predetermined value in step ST33 to, for example, 80% ensures that processing proceeds to step ST34 and beyond only when the syntax of the song introduction has been recognized correctly.
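
As one possible reading of the phrase-matching likelihood above, the following Python sketch computes the fraction of a phrase's fixed words that appear, in order, in the recognized text and compares it with the 80% example threshold. The matching rule, the English phrase, and the function names are assumptions made for illustration. The document does not state how the percentage is computed.

```python
def phrase_match_ratio(recognized, template):
    """Fraction of the template's fixed words found, in order, in the recognized text."""
    # Slots such as <title> are not fixed wording and are skipped.
    fixed = [w for w in template.lower().split() if not w.startswith("<")]
    words = recognized.lower().split()
    position = 0
    hits = 0
    for word in fixed:
        try:
            index = words.index(word, position)
        except ValueError:
            continue  # this fixed word was not recognized
        hits += 1
        position = index + 1
    return hits / len(fixed) if fixed else 0.0


def passes_threshold(likelihood, threshold=0.8):
    """Step ST33: proceed only when the likelihood reaches the predetermined value."""
    return likelihood >= threshold
```

For the assumed phrase "the next song is <title> by <artist>", a full song introduction scores 1.0 and proceeds to step ST34, while unrelated speech that happens to share a word or two scores well below 0.8 and is ignored.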

This prevents the content section detection unit 6 from being operated erroneously on the basis of a low-likelihood speech recognition result, and also prevents a song (content) from being stored in association with an incorrect song title or artist name (identification data).

As described above, according to Embodiment 3, in addition to the effects of Embodiment 1, the content and its identification data are recorded only when the likelihood of speech recognition is equal to or greater than a predetermined value. This prevents content associated with incorrect identification data from being stored and consuming the capacity of the storage medium.

Embodiment 4.
FIG. 9 is a block diagram showing an example of an automatic recording apparatus according to Embodiment 4 of the present invention. Components similar to those described in Embodiments 1 to 3 are given the same reference numerals, and redundant description is omitted. The block diagram of Embodiment 4 also shows two components that were omitted from the drawings of Embodiments 1 to 3: an input unit 8 that accepts operation input from the user by acquiring input signals from keys, a touch panel, or the like, and an output unit 9 that presents data to the user by displaying it or outputting it as audio. In Embodiment 4 described below, the user can choose, via the input unit 8 and the output unit 9, whether or not a song (content) should be stored.

When the control unit 4 acquires the character strings of the song title, artist name, and so on (identification data) output from the speech recognition unit 2, it presents them via the output unit 9 to ask the user whether storage is required, and determines whether the song (content) needs to be stored from the user's input received via the input unit 8. Specifically, when input indicating that storage is required is received via the input unit 8, the song title, artist name, and so on (identification data) are stored in the information storage unit 5 in association with the song (content). When input indicating that storage is not required is received, only the song title, artist name, and so on (identification data) are stored.

The input unit 8 is used to input the user's intention. It may be, for example, a button or a touch display, or it may use voice input based on speech recognition through a microphone or the like, or gesture input. A combination of these may also be used.
The output unit 9 may output the song title and artist name (identification data) supplied by the control unit 4 using, for example, synthesized speech, or may display them as text on a display screen. It may also output them in both ways.

Next, the operation of the automatic recording apparatus in Embodiment 4 is described with reference to the flowchart shown in FIG. 10.
The processes in steps ST41 to ST46 are the same as those in steps ST11 to ST16 shown in FIG. 4 of Embodiment 1, so their description is omitted.

After the video/audio recording unit 7 stops recording the song in step ST46 in response to the command from the content section detection unit 6, the control unit 4 instructs the output unit 9 to output the song title and artist name, and asks the user to confirm whether to store the song (step ST47).

When the user chooses, via the input unit 8, to store the song whose title and artist name have been presented, that is, when the input unit 8 receives user input indicating that the song should be stored (YES in step ST48), the song recorded by the video/audio recording unit 7 is stored in the information storage unit 5 (step ST49), and the song title and artist name are stored in the information storage unit 5 in association with that song (step ST50).

On the other hand, when the user does not choose to store the song in step ST48, that is, when the input unit 8 receives user input indicating that the song should not be stored (NO in step ST48), only the song title and artist name are stored in the information storage unit 5, and the song title and artist name information, such as the number of times that title and artist name have been acquired, is updated (step ST51).
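The confirmation flow of steps ST47 to ST51 can be sketched as follows. This is a minimal illustration, assuming an in-memory dictionary in place of the information storage unit 5 and a callback in place of the input unit 8 and output unit 9. All class and function names are hypothetical. Incrementing the acquisition count on the YES branch as well is an assumption; the text only states that the count is updated on the NO branch.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple

SongKey = Tuple[str, str]  # (song title, artist name): the identification data


@dataclass
class SongRecord:
    acquisition_count: int = 0
    audio: Optional[bytes] = None  # None means only identification data is kept


@dataclass
class InformationStore:
    """Hypothetical stand-in for the information storage unit 5."""
    records: Dict[SongKey, SongRecord] = field(default_factory=dict)


def confirm_and_store(
    store: InformationStore,
    title: str,
    artist: str,
    recorded_audio: bytes,
    ask_user: Callable[[str], bool],
) -> bool:
    """Ask the user whether to keep the recording (ST47, ST48) and store accordingly."""
    record = store.records.setdefault((title, artist), SongRecord())
    # Assumption: the count is updated in both branches, not only in ST51.
    record.acquisition_count += 1
    keep = ask_user(f"Save '{title}' by {artist}?")
    if keep:
        record.audio = recorded_audio  # ST49 and ST50: audio linked to title/artist
    return keep  # on NO (ST51) only the identification data remains
```

Passing the prompt through a callback keeps the sketch independent of whether the confirmation is given by button, touch, voice, or gesture.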

As described above, according to Embodiment 4, in addition to the effects of Embodiment 1, the user is asked after the content has been recorded whether it should be stored, and the content is stored only when required. This prevents content that the user does not want from being stored.

Embodiment 5.
FIG. 11 is a block diagram showing an example of an automatic recording apparatus according to Embodiment 5 of the present invention. Components similar to those described in Embodiments 1 to 4 are given the same reference numerals, and redundant description is omitted. Compared with Embodiment 4, in Embodiment 5 described below the control unit 4 compares the song recorded by the video/audio recording unit 7 when the content section detection unit 6 detects the end of the song with the songs stored in the information storage unit 5, and, if a song with the same song title and artist name has already been stored, keeps whichever has the better sound quality.

The control unit 4 acquires the song recorded by the video/audio recording unit 7 when the content section detection unit 6 detects the end section of the song, and quantifies the sound quality of that song. A general technique such as the S/N ratio may be used to quantify sound quality, so its description is omitted here. The recording time may also be used as the criterion for sound quality, or the S/N ratio and the recording time may be combined.
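As one possible way to quantify sound quality, the sketch below computes an S/N ratio in decibels and optionally adds a recording-time term. It assumes that separate signal and noise sample sequences are already available; how the noise portion is estimated is outside this sketch. The function names and the linear weighting of recording time are hypothetical choices, not taken from this specification.

```python
import math
from typing import Sequence


def mean_power(samples: Sequence[float]) -> float:
    """Average of squared sample values."""
    if not samples:
        raise ValueError("samples must not be empty")
    return sum(s * s for s in samples) / len(samples)


def snr_db(signal: Sequence[float], noise: Sequence[float]) -> float:
    """S/N ratio in dB: 10 * log10(signal power / noise power)."""
    noise_power = mean_power(noise)
    signal_power = mean_power(signal)
    if noise_power == 0.0:
        return math.inf
    if signal_power == 0.0:
        return -math.inf
    return 10.0 * math.log10(signal_power / noise_power)


def quality_score(snr: float, duration_s: float, seconds_weight: float = 0.0) -> float:
    """Combine S/N ratio and recording time; the linear weighting is an assumption."""
    return snr + seconds_weight * duration_s
```

With `seconds_weight` left at 0.0 the score is the S/N ratio alone; a positive weight favors longer (more complete) recordings.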

Furthermore, by referring to the data stored in the information storage unit 5, the control unit 4 determines whether data identical to the content identification data extracted by the speech recognition unit 2 (a song with the same song title and artist name) exists in the information storage unit 5. If it does, the control unit 4 compares the sound quality of the song (content) recorded by the video/audio recording unit 7 with that of the song (content) stored in the information storage unit 5, and automatically overwrites the stored song (content) only when the newly recorded song (content) has higher sound quality than the existing one.

Next, the operation of the automatic recording apparatus in Embodiment 5 is described with reference to the flowchart shown in FIG. 12.
The processes in steps ST61 to ST66 are the same as those in steps ST11 to ST16 shown in FIG. 4 of Embodiment 1, so their description is omitted.

After the video/audio recording unit 7 stops recording the song in step ST66 in response to the command from the content section detection unit 6, the control unit 4 determines whether a song with the same song title and artist name as those detected by the speech recognition unit 2 in step ST62 has already been stored in the information storage unit 5 (step ST67). If the same song has already been stored (YES in step ST67), the control unit 4 acquires the song recorded by the video/audio recording unit 7 in steps ST64 to ST66, and compares sound quality information obtained by quantifying the sound quality of that song with the sound quality of the song stored in the information storage unit 5 (step ST68).

If the sound quality of the song recorded by the video/audio recording unit 7 in steps ST64 to ST66 is higher than that of the existing song (YES in step ST68), the song recorded by the video/audio recording unit 7 is stored in the information storage unit 5 (step ST69), and the song title and artist name are stored in the information storage unit 5 in association with that song (step ST70).
The processes in steps ST69 and ST70 are also performed when it is determined in step ST67 that the same song is not stored in the information storage unit 5 (NO in step ST67).

On the other hand, if the sound quality of the song recorded by the video/audio recording unit 7 is equal to or lower than that of the existing song in step ST68 (NO in step ST68), only the song title and artist name are stored in the information storage unit 5, and the song title and artist name information, such as the number of times that title and artist name have been acquired, is updated (step ST71).
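The branch structure of steps ST67 to ST71 can be sketched as follows. The sketch assumes that each stored song carries a numeric quality score and that a plain dictionary stands in for the information storage unit 5. The names and the returned status strings are hypothetical, and updating the acquisition count on the overwrite branch is an assumption, since the text mentions the update only for step ST71.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

SongKey = Tuple[str, str]  # (song title, artist name)


@dataclass
class StoredSong:
    audio: bytes
    quality: float  # higher means better sound quality
    acquisition_count: int = 1


def store_if_better(
    library: Dict[SongKey, StoredSong],
    key: SongKey,
    new_audio: bytes,
    new_quality: float,
) -> str:
    """Store a new recording, overwrite a worse one, or keep the existing one."""
    existing = library.get(key)
    if existing is None:  # ST67 NO: no song with this title/artist yet
        library[key] = StoredSong(new_audio, new_quality)  # ST69, ST70
        return "saved"
    existing.acquisition_count += 1  # assumption: counted in both ST68 branches
    if new_quality > existing.quality:  # ST68 YES: strictly higher quality
        existing.audio = new_audio  # ST69, ST70: overwrite
        existing.quality = new_quality
        return "overwritten"
    return "kept"  # ST68 NO (equal or lower): ST71, identification data only
```

Note that the comparison is strict: a recording of equal quality does not replace the stored one, which matches the "equal to or lower" condition of the NO branch.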

As described above, according to Embodiment 5, in addition to the effects of Embodiment 1, for a song title and artist name that have already been acquired, the newly acquired song (content) is recorded when its sound quality is higher, and the existing song (content) is not overwritten when the new sound quality is equal to or lower than that of the existing song. The stored content is thus always updated automatically to the version with better sound quality.

In Embodiment 5, overwriting is described as being performed automatically when the sound quality of the newly recorded song is higher than that of the existing song. However, the user may instead be asked to confirm whether to overwrite before the song is stored.
In that case, the song (content) is not overwritten when its sound quality is equal to or lower than that of the existing song, and even when it is higher, it is overwritten only after the user's confirmation. The user can therefore choose, as suits them, to keep the version with better sound quality or to keep a preferred recording even if its sound quality is somewhat worse.

Embodiment 6.
FIG. 13 is a block diagram showing an example of an automatic recording apparatus according to Embodiment 6 of the present invention. Components similar to those described in Embodiments 1 to 5 are given the same reference numerals, and redundant description is omitted. Compared with Embodiment 2, in Embodiment 6 described below the speech recognition unit 2 is composed of a plurality of speech recognizers 21, 22, 23, ..., each having a recognition dictionary (not shown) for one of a plurality of languages, and speech recognition is performed for each language using these per-language speech recognition engines.

In general, a Japanese speech recognition engine, for example, is weak at recognizing foreign-language speech, and when English is spoken, recognition accuracy is higher with an English speech recognition engine. For this reason, the apparatus is provided with per-language speech recognizers 21, 22, 23, ..., each having a recognition dictionary for its own language, such as a speech recognizer 21 for Japanese, a speech recognizer 22 for English, and a speech recognizer 23 for German. Here, the case of using a speech recognition unit 2 in which these speech recognizers 21, 22, 23, ... are connected in parallel is described as an example.

When the speech recognition unit 2 recognizes the audio output from the audio acquisition unit 1, the speech recognizers 21, 22, 23, ... for the respective languages and their recognition dictionaries (not shown) are operated in parallel, each speech recognizer performs speech recognition for its language, and the results are output to the control unit 4. Each speech recognizer 21, 22, 23, ... outputs the recognition likelihood together with its recognition result.

The control unit 4 identifies the language of the recognized speech from the result with the highest likelihood among the results of the speech recognizers 21, 22, 23, ..., and stores in the information storage unit 5 the song title, artist name, and so on (identification data) of the song (content) extracted in that highest-likelihood language.
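Selecting the highest-likelihood result among the per-language recognizers can be sketched as below. The `Recognition` tuple and `select_best` function are hypothetical names, and the sample values in the usage note are invented for illustration.

```python
from typing import NamedTuple, Optional, Sequence


class Recognition(NamedTuple):
    """One recognizer's output; the field layout is an assumption."""
    language: str
    title: str
    artist: str
    likelihood: float  # recognition likelihood, here in the range 0.0 to 1.0


def select_best(results: Sequence[Recognition]) -> Optional[Recognition]:
    """Return the result with the highest likelihood, or None if there are none."""
    if not results:
        return None
    return max(results, key=lambda r: r.likelihood)
```

For example, given results with likelihoods 0.41 (Japanese), 0.92 (English), and 0.55 (German), `select_best` returns the English result, whose title and artist name would then be the identification data to store.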

Instead of the speech recognition unit 2 shown in FIG. 13, a speech recognition unit 2 in which a single speech recognizer 20 performs recognition by switching among a plurality of speech recognition dictionaries 20-1, 20-2, 20-3, ... may be used, as shown in FIG. 14.

Next, the operation of the automatic recording apparatus in Embodiment 6 is described with reference to the flowchart shown in FIG. 15.
First, the audio acquisition unit 1 acquires, by line input, the audio input from an audio device (step ST81). If the audio input from the audio device is in analog form, A/D conversion is performed and the audio is acquired as digital data, for example in PCM format.
Next, the speech recognition unit 2 recognizes the audio data acquired by the audio acquisition unit 1 and outputs the recognition result as a character string. In doing so, it compares the result with the contents of the fixed phrase storage unit 3 and performs large-vocabulary continuous speech recognition to extract the song title and artist name (step ST82).

The control unit 4 also acquires, at the same time, the likelihood indicating the plausibility of the speech recognized in each language by the speech recognition unit 2, and determines the language of the song title and artist name on the basis of that recognition likelihood (step ST83). For example, the language with the highest likelihood is identified as the language of the song title and artist name. This prevents a low-accuracy recognition result from being adopted when speech recognition dictionaries for multiple languages are used, so that even foreign-language song titles and artist names can be recognized correctly.

Furthermore, when the likelihood of speech recognition in the language determined in step ST83 is equal to or greater than a predetermined value (YES in step ST84), the control unit 4 operates the content section detection unit 6 and the processes in steps ST85 to ST90 are performed.
The processes in steps ST85 to ST90 are the same as those in steps ST13 to ST18 shown in FIG. 4 of Embodiment 1, so their description is omitted.

Various methods are conceivable for identifying the language of the song title and artist name from the recognition likelihood in step ST83, and any of them may be used. One method performs speech recognition in every language for which a speech recognition dictionary is provided, compares the recognition likelihoods, and selects the language with the highest likelihood. Another sets a threshold for the recognition likelihood in advance and, as soon as the likelihood for a language is equal to or greater than the threshold, identifies that language without performing speech recognition for the remaining languages.
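The two strategies for step ST83 can be contrasted in a short sketch. It assumes each recognizer is a callable that takes audio bytes and returns a (text, likelihood) pair, and that recognizers are tried in dictionary order; these interfaces and all names are hypothetical.

```python
from typing import Callable, Dict, Optional, Tuple

Result = Tuple[str, float]  # (recognized text, likelihood between 0.0 and 1.0)
Recognizer = Callable[[bytes], Result]


def identify_exhaustive(
    audio: bytes, recognizers: Dict[str, Recognizer]
) -> Optional[Tuple[str, Result]]:
    """Run every recognizer and keep the language with the highest likelihood."""
    best: Optional[Tuple[str, Result]] = None
    for language, recognize in recognizers.items():
        result = recognize(audio)
        if best is None or result[1] > best[1][1]:
            best = (language, result)
    return best


def identify_early_stop(
    audio: bytes, recognizers: Dict[str, Recognizer], threshold: float
) -> Optional[Tuple[str, Result]]:
    """Stop at the first language whose likelihood reaches the threshold."""
    for language, recognize in recognizers.items():
        result = recognize(audio)
        if result[1] >= threshold:
            return language, result  # remaining languages are not recognized
    return None  # no language reached the threshold
```

The exhaustive variant always pays for every recognizer but cannot miss a better match later in the list; the early-stop variant saves work at the cost of depending on the order in which languages are tried.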

As described above, according to Embodiment 6, in addition to the effects of Embodiment 1, speech recognition is performed using speech recognition engines for various languages and the language is determined on the basis of the recognition likelihood, so that even foreign-language song titles and artist names can be recognized correctly and stored.

The above embodiments have been described using the case where the content is a song, that is, music content, as an example. However, the content is not limited to music: sections may equally be extracted and recorded for sports broadcast content, talk program content, or documentary content, for example.

The automatic recording apparatus of the present invention can be applied to any apparatus capable of receiving broadcast data such as radio or television, even if the apparatus has no means of external communication or is in an environment with a poor Internet connection.

Within the scope of the invention, the embodiments may be freely combined, any component of any embodiment may be modified, and any component of any embodiment may be omitted.

DESCRIPTION OF SYMBOLS: 1 audio acquisition unit; 2 speech recognition unit; 3 fixed phrase storage unit; 4 control unit; 5 information storage unit; 6 content section detection unit; 7 video/audio recording unit; 8 input unit; 9 output unit; 20, 21, 22, 23, ... speech recognizers; 20-1, 20-2, 20-3, ... recognition dictionaries.

Claims

An automatic recording apparatus comprising:
an audio acquisition unit that detects and acquires, from broadcast data, audio including content and identification data of the content;
a fixed phrase storage unit that stores wording used when introducing the content;
a speech recognition unit that recognizes the audio data acquired by the audio acquisition unit, and extracts and outputs the identification data of the content based on the recognition result and the wording stored in the fixed phrase storage unit;
a control unit that, on receiving the identification data of the content from the speech recognition unit, issues an instruction to detect a start time and an end time of the content;
a content section detection unit that, in accordance with the instruction from the control unit, detects the start time and the end time of the content from the audio data acquired by the audio acquisition unit;
a video/audio recording unit that records the content in the content section between the start time and the end time detected by the content section detection unit; and
an information storage unit that stores at least the content recorded by the video/audio recording unit and the identification data of the content,
wherein the control unit stores the identification data of the content in the information storage unit in association with the content recorded by the video/audio recording unit.

The automatic recording apparatus according to claim 1, wherein
the data stored in the information storage unit includes the number of times the content has been acquired, and
the control unit refers to the data stored in the information storage unit and stores the identification data of the content in the information storage unit in association with the content only when the number of times the content has been acquired is equal to or greater than a predetermined number.

The automatic recording apparatus according to claim 1, wherein
the speech recognition unit outputs the recognition likelihood together with the recognition result, and
the control unit stores the identification data of the content in the information storage unit in association with the content only when the recognition likelihood is equal to or greater than a predetermined value.

The automatic recording apparatus according to claim 1, further comprising:
an input unit that receives operation input from a user; and
an output unit that presents data to the user,
wherein, when storing the identification data of the content in the information storage unit in association with the content, the control unit asks via the output unit whether storage is required, stores the identification data of the content in the information storage unit in association with the content when input indicating that storage is required is received via the input unit, and stores only the identification data of the content in the information storage unit when input indicating that storage is not required is received via the input unit.

The automatic recording apparatus according to claim 1, wherein the control unit determines, by referring to the data stored in the information storage unit, whether data identical to the extracted identification data of the content exists in the information storage unit, and, if it exists, compares the sound quality of the content recorded by the video/audio recording unit with that of the content stored in the information storage unit and, when the content recorded by the video/audio recording unit has the higher sound quality, overwrites the content stored in the information storage unit with the content recorded by the video/audio recording unit.

The automatic recording apparatus according to claim 1, wherein
the speech recognition unit has a recognition dictionary for each of a plurality of languages, performs speech recognition for each of the plurality of languages, and outputs the recognition likelihood together with the recognition result, and
the control unit identifies the language of the identification data of the content based on the recognition likelihood, and stores the identification data of the content extracted in the identified language in the information storage unit in association with the content.