JP2010266721A

JP2010266721A - Device and method for attaching label to sound data, and program

Info

Publication number: JP2010266721A
Application number: JP2009118463A
Authority: JP
Inventors: Atsushi Yoshimoto (善本 淳)
Original assignee: National Institute of Information and Communications Technology
Current assignee: National Institute of Information and Communications Technology
Priority date: 2009-05-15
Filing date: 2009-05-15
Publication date: 2010-11-25
Anticipated expiration: 2029-05-15
Also published as: JP5267994B2

Abstract

PROBLEM TO BE SOLVED: To provide a sound data labeling device that attaches a label to sound data. SOLUTION: The device includes: a sound data receiving unit 11 that receives sound data collected by a microphone at the position of a subject; a correspondence information storage unit 16 that stores two or more pieces of correspondence information, each associating a label with a label sound image, which is a sound image indicating the per-frequency intensity of the sound corresponding to that label; a sound image conversion unit 15 that converts the received sound data into a sound image indicating the intensity at each frequency; a comparison unit 17 that compares the converted sound image with the label sound images stored in the correspondence information storage unit 16; and a label attaching unit 18 that identifies the label corresponding to the label sound image most similar to the converted sound image and attaches the identified label to the sound data corresponding to that sound image. COPYRIGHT: (C)2011,JPO&INPIT

Description

The present invention relates to a sound data labeling device and the like that attaches a label to sound data collected by a microphone at the position of a subject.

Conventionally, research such as MyLifeBits and LifeLog has aimed to accumulate information related to individuals and make use of it later. As a related technique, a system that accumulates image data captured of an individual's surroundings has been developed (see, for example, Patent Document 1).

Patent Document 1: JP 2006-115011 A

It is also conceivable to accumulate sound data from an individual's surroundings as information related to that individual. Sound data, however, is harder to search than image data. For example, to find a desired scene in moving image data, one views a still image at a plausible time, judges whether the desired scene comes before or after it, and fast-forwards or rewinds accordingly. Each such judgment can be made almost instantly by looking at the still image. To find a desired sound in sound data, by contrast, one listens to the sound data at a plausible time, judges whether the desired sound comes before or after it, and fast-forwards or rewinds accordingly. Each judgment then requires listening to a certain length of sound data (for example, 5 or 10 seconds), which takes correspondingly longer. Moreover, if no sound meaningful to the listener (for example, a person's voice, a telephone ringtone, or the sound of a passing train) can be heard, the judgment takes even longer.

For sound data, therefore, it is preferable to attach search-aiding information in advance. Information on the position at which the sound data was recorded is effective for this purpose. Latitude and longitude obtained by GPS could serve as such position information. However, that approach requires a separate device for receiving radio waves from GPS satellites, which makes the equipment bulkier. GPS is also difficult to use indoors.
In general, there has been a demand for attaching search-aiding labels to sound data from a subject's surroundings without using extra devices.

The present invention has been made in view of the above circumstances. Its object is to provide a sound data labeling device and the like that can attach search-aiding labels to sound data collected at the position of a subject.

To achieve the above object, a sound data labeling device according to the present invention includes the following units:
- a sound data receiving unit that receives sound data collected by a microphone at the position of a subject;
- a correspondence information storage unit that stores two or more pieces of correspondence information, each associating a label with a label sound image, which is a sound image indicating the per-frequency intensity of the sound corresponding to that label;
- a sound image conversion unit that converts the sound data received by the sound data receiving unit into a sound image indicating the intensity at each frequency;
- a comparison unit that compares the sound image produced by the sound image conversion unit with the label sound images stored in the correspondence information storage unit; and
- a label attaching unit that uses the comparison result to identify the label corresponding to a label sound image highly similar to the converted sound image, and attaches the identified label to the sound data corresponding to that sound image.

With this configuration, labels can be attached to sound data. For example, when a label indicates the subject's position, the labeled sound data shows how the subject's position relates to the sound data, and no device such as GPS is needed. Position can therefore be determined even indoors, where GPS is difficult to use. In addition, converting sound data into sound images before comparison makes it easy to compare sound data containing chords, or sound data in which many more complex sounds are superimposed.
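The following Python code is a minimal sketch of the comparison and label attaching steps. It rests on two assumptions that the specification does not state: each sound image is flattened into an equal-length list of per-frequency intensities, and cosine similarity is the similarity measure. The labels and numbers are illustrative only.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length intensity vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def attach_label(sound_image, correspondence):
    """Return the label whose label sound image is most similar to sound_image.

    correspondence: dict mapping label -> label sound image, where each image is
    a list of per-frequency intensities of the same length as sound_image.
    """
    return max(correspondence,
               key=lambda label: cosine_similarity(sound_image, correspondence[label]))


# Hypothetical correspondence information with two labels.
correspondence = {
    "meeting room": [0.9, 0.1, 0.0, 0.0],
    "corridor": [0.1, 0.2, 0.8, 0.3],
}
print(attach_label([0.8, 0.2, 0.1, 0.0], correspondence))  # meeting room
```

Taking the single best match corresponds to the "highest similarity" rule in the abstract. A threshold on the best score could be added to leave unfamiliar sounds unlabeled.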

In the sound data labeling device according to the present invention, the sound data received by the sound data receiving unit may include environmental sound data, that is, sound data of the subject's environment. The device may further include a separation unit that separates the environmental sound data from the received sound data, and the sound image conversion unit may convert the separated environmental sound data into a sound image.

With this configuration, labels can be attached using the environmental sound data contained in the sound data collected at the subject's position. For example, when a label indicates the subject's position, labels based on the sound of the subject's environment are expected to be more appropriate.

In the sound data labeling device according to the present invention, the sound image conversion unit may convert the sound data of a period in which frequency peaks occur continuously into a single sound image.
With this configuration, sound data from a period of continuous frequency peaks, such as a station melody or a store's theme song, can be compared as a single sound image. This enables more accurate labeling.
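The Python code below sketches how periods of continuous frequency peaks could be grouped into single sound images. It assumes the sound data has already been turned into per-frame intensity lists. The peak criterion is hypothetical: the strongest bin must be at least a chosen multiple of the mean bin, and the specification does not define one.

```python
def has_peak(frame, ratio=4.0):
    """Hypothetical criterion: the strongest bin is at least `ratio` times the mean bin."""
    mean = sum(frame) / len(frame)
    return mean > 0 and max(frame) >= ratio * mean


def peak_segments(frames, ratio=4.0):
    """Return (start, end) index pairs, end exclusive, for each run of
    consecutive frames that all contain a frequency peak."""
    segments, start = [], None
    for i, frame in enumerate(frames):
        if has_peak(frame, ratio):
            if start is None:
                start = i          # a new run of peaked frames begins
        elif start is not None:
            segments.append((start, i))
            start = None           # the run has ended
    if start is not None:          # close a run that reaches the last frame
        segments.append((start, len(frames)))
    return segments


flat = [1.0, 1.0, 1.0, 1.0, 1.0]     # broadband noise: no peak
tonal = [0.0, 0.0, 10.0, 0.0, 0.0]   # one strong bin: a peak
frames = [flat, tonal, tonal, flat, tonal]
for start, end in peak_segments(frames):
    sound_image = frames[start:end]  # one sound image per continuous-peak period
    print(start, end, len(sound_image))
```

Each slice `frames[start:end]` is a small spectrogram that would then be compared as one unit against the label sound images.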

In the sound data labeling device according to the present invention, the comparison unit may calculate similarity information when it compares a sound image with a label sound image. Similarity information indicates how similar the two images are. The label attaching unit may then attach the one or more identified labels to the sound data, each associated with its corresponding similarity information.
With this configuration, when two or more labels are attached to sound data, for example, the validity of each label can be judged using the similarity information attached to the sound data.
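The Python code below is a minimal sketch of attaching several labels together with their similarity information. It assumes that sound images are flattened intensity vectors and that cosine similarity stands in for the unspecified similarity measure. The record layout (`time_code`, `labels`) is hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length intensity vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_labels(sound_image, correspondence, k=3):
    """Return the k best (label, similarity) pairs, most similar first."""
    scored = [(label, cosine_similarity(sound_image, image))
              for label, image in correspondence.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


correspondence = {
    "meeting room": [1.0, 0.0, 0.0],
    "corridor": [0.0, 1.0, 0.0],
    "break room": [0.0, 0.0, 1.0],
}
# Labeled record: the sound data's time code plus each label with its similarity.
record = {"time_code": "09:00:00", "labels": rank_labels([0.9, 0.4, 0.1], correspondence)}
print(record)
```

Storing the scores alongside the labels lets a later reader, or a later processing step, discount weakly matching labels.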

In the sound data labeling device according to the present invention, the label attaching unit may attach three or more labels to each piece of sound data, each associated with its corresponding similarity information. The device may then further include:
- a position correspondence information storage unit that stores position correspondence information associating each label with the coordinates of the sound output device that outputs the sound corresponding to that label;
- a coordinate calculation unit that calculates coordinates for the sound data using the three or more labels and the similarity information attached to the sound data, together with the coordinates corresponding to those labels; and
- a coordinate attaching unit that attaches the calculated coordinates to the corresponding sound data.
With this configuration, the sound output by the sound output devices can be used to determine the subject's position more accurately.
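This passage does not say how the coordinates are computed from the labels and similarities. The Python code below therefore uses a similarity-weighted centroid of the sound output devices' coordinates. This is a hypothetical choice made for illustration, not the patent's stated method. The device names and coordinates are also made up.

```python
def estimate_coordinates(labeled, position_correspondence):
    """Similarity-weighted centroid of the sound output devices' coordinates.

    labeled: (label, similarity) pairs attached to one piece of sound data.
    position_correspondence: label -> (x, y) of the device emitting that sound.
    """
    if len(labeled) < 3:
        raise ValueError("three or more labels are required")
    total = sum(similarity for _, similarity in labeled)
    if total <= 0:
        raise ValueError("similarities must sum to a positive value")
    x = sum(s * position_correspondence[label][0] for label, s in labeled) / total
    y = sum(s * position_correspondence[label][1] for label, s in labeled) / total
    return x, y


# Hypothetical position correspondence information for three sound output devices.
positions = {"speaker A": (0.0, 0.0), "speaker B": (10.0, 0.0), "speaker C": (0.0, 10.0)}
print(estimate_coordinates([("speaker A", 0.5), ("speaker B", 0.25), ("speaker C", 0.25)], positions))
```

A weighted centroid always falls inside the triangle spanned by the devices. If similarities could be calibrated into distances, trilateration would be an alternative.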

In the sound data labeling device according to the present invention, the sound output by the sound output device may be in the inaudible range.
With this configuration, the sound output by the sound output device does not become noise for people.
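Human hearing generally extends up to about 20 kHz. An inaudible beacon must therefore sit above that limit and also below the recording microphone's Nyquist frequency (half its sampling rate), or it will not be captured. The Python code below checks that constraint and confirms that a tone can be located in the recorded spectrum. The 48 kHz rate and 21 kHz tone are hypothetical values chosen for illustration.

```python
import cmath
import math

SAMPLE_RATE = 48_000  # assumed microphone sampling rate (Nyquist limit: 24 kHz)
BEACON_HZ = 21_000    # hypothetical beacon tone above the ~20 kHz limit of hearing


def beacon(n_samples, freq=BEACON_HZ, rate=SAMPLE_RATE):
    """Sine tone for a sound output device; it must stay below Nyquist to be recorded."""
    if freq >= rate / 2:
        raise ValueError("tone frequency must be below the Nyquist frequency")
    return [math.sin(2 * math.pi * freq * i / rate) for i in range(n_samples)]


def peak_frequency(samples, rate=SAMPLE_RATE):
    """Frequency of the strongest DFT bin (naive O(n^2) DFT, for illustration)."""
    n = len(samples)
    mags = [abs(sum(s * cmath.exp(-2j * math.pi * k * i / n) for i, s in enumerate(samples)))
            for k in range(n // 2 + 1)]
    k_peak = max(range(len(mags)), key=mags.__getitem__)
    return k_peak * rate / n


print(peak_frequency(beacon(480)))  # 21000.0
```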

The sound data labeling device according to the present invention may further include:
- a sound data storage unit that stores the sound data received by the sound data receiving unit;
- an erasure target label storage unit that stores label identification information identifying the labels whose sound data is to be erased; and
- an erasure unit that erases from the sound data storage unit the labeled sound data associated with the labels identified by the label identification information stored in the erasure target label storage unit.

With this configuration, for example, sound data that should not be kept can be erased from the sound data storage unit automatically. This is done by storing, in the erasure target label storage unit, the label identification information of the labels that correspond to that sound data. For instance, sound data that could infringe on privacy can be erased automatically.

The sound data labeling device according to the present invention may further include:
- a sound data storage unit that stores the sound data received by the sound data receiving unit;
- an extraction target label storage unit that stores label identification information identifying the labels whose sound data is to be extracted; and
- an extraction unit that extracts and accumulates the labeled sound data associated with the labels identified by the label identification information stored in the extraction target label storage unit.

With this configuration, for example, storing in the extraction target label storage unit the label identification information of the labels corresponding to the sound data to be extracted causes that sound data to be extracted automatically.
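The Python code below sketches the erasure and extraction units described above. It assumes that label identification information is simply the label string and that labeled sound data is held as a list of records. Both assumptions, and the record fields, are hypothetical.

```python
def erase_by_label(store, erase_targets):
    """Erasure unit sketch: drop, in place, records carrying any erasure-target label."""
    store[:] = [rec for rec in store if not erase_targets & set(rec["labels"])]


def extract_by_label(store, extract_targets):
    """Extraction unit sketch: return the records carrying any extraction-target label."""
    return [rec for rec in store if extract_targets & set(rec["labels"])]


store = [
    {"time_code": 0, "labels": ["corridor"]},
    {"time_code": 10, "labels": ["meeting room", "erase trigger"]},
    {"time_code": 20, "labels": ["meeting room", "important"]},
]
extracted = extract_by_label(store, {"important"})
erase_by_label(store, {"erase trigger"})
print([r["time_code"] for r in extracted], [r["time_code"] for r in store])
```

Both units use the same set-intersection test. Only the action differs: deleting records from the store versus copying them out of it.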

The sound data storage unit that stores the sound data to be erased or extracted is the same unit that stores the sound data received by the sound data receiving unit. It may be a unit in which the sound data is accumulated after the sound data receiving unit receives it. Alternatively, it may be a unit in which the sound data was already stored before the sound data receiving unit received it.

The sound data labeling device and the like according to the present invention can attach labels to sound data collected at the position of a subject. The labels make it possible, for example, to search the sound data, so that desired sound data can be accessed easily.

FIG. 1 is a block diagram showing the configuration of the sound data labeling device according to Embodiment 1 of the present invention.
FIG. 2 is a flowchart showing the operation of the sound data labeling device according to the embodiment.
FIG. 3 is a diagram showing an example of correspondence information in the embodiment.
FIG. 4 is a diagram showing an example of the information stored in the erasure target label storage unit in the embodiment.
FIG. 5 is a diagram showing an example of the information stored in the extraction target label storage unit in the embodiment.
FIG. 6 is a diagram showing an example of the sound data stored in the sound data storage unit in the embodiment.
FIG. 7 is a diagram showing an example of a sound image in the embodiment.
FIG. 8 is a diagram showing an example of the correspondence between time codes and labels in the embodiment.
FIG. 9 is a diagram showing another example of the correspondence between time codes and labels in the embodiment.
FIG. 10 is a diagram showing an example of a sound image containing peaks in the embodiment.
FIG. 11 is a block diagram showing the configuration of the sound data labeling device according to Embodiment 2 of the present invention.
FIG. 12 is a flowchart showing the operation of the sound data labeling device according to the embodiment.
FIG. 13 is a diagram showing an example of correspondence information in the embodiment.
FIG. 14 is a diagram showing an example of position correspondence information in the embodiment.
FIG. 15 is a diagram showing an example of the correspondence between time codes and labels in the embodiment.
FIG. 16 is a diagram showing an example of the correspondence among time codes, labels, and coordinates in the embodiment.
FIG. 17 is a schematic diagram showing an example of the external appearance of the computer system in each of the above embodiments.
FIG. 18 is a diagram showing an example of the configuration of the computer system in each of the above embodiments.

The sound data labeling device according to the present invention is described below using embodiments. In the following embodiments, components and steps denoted by the same reference numerals are identical or equivalent, and repeated description may be omitted.

(Embodiment 1)
A sound data labeling device according to Embodiment 1 of the present invention is described with reference to the drawings. The sound data labeling device according to this embodiment attaches labels to sound data collected at the position of a subject. A label may indicate, for example, the subject's position.

FIG. 1 is a block diagram showing the configuration of the sound data labeling device 1 according to this embodiment. The sound data labeling device 1 includes a sound data receiving unit 11, a separation unit 12, an accumulation unit 13, a sound data storage unit 14, a sound image conversion unit 15, a correspondence information storage unit 16, a comparison unit 17, a label attaching unit 18, an erasure target label storage unit 19, an erasure unit 20, an extraction target label storage unit 21, and an extraction unit 22.

The sound data receiving unit 11 receives sound data collected by a microphone at the position of the subject. The subject is the target whose surrounding sound data is collected. A person is mainly assumed, but the subject may also be, for example, an animal or a robot capable of acting autonomously. This embodiment describes the case in which the subject is a person. The microphone that picks up sounds around the subject may be worn by the subject (for example, a hands-free microphone, a headset microphone, a clip-on microphone, or a tie-pin microphone), or may be arranged to move as the subject moves. The sound data collected by this microphone generally includes environmental sound data and other sound data. Environmental sound data is sound data of the subject's environment. Other sound data includes, for example, sounds produced by the subject (such as the subject's speech, or operating sounds when the subject is a robot) and speech from the person the subject is talking with.
The sound data may be acquired with one microphone or with two or more. In the latter case, there may be, for example, one microphone intended to pick up the sounds produced by the subject, which may be placed at the subject's mouth or throat. A second microphone may be intended to pick up both the subject's sounds and environmental sounds, and may be placed, for example, at the subject's collar. A stereo microphone may also be used in the latter case. The sound data receiving unit 11 may receive the collected sound data in real time, or may receive recorded sound data in a single batch. The sound data received by the sound data receiving unit 11 is preferably digital data because, unlike analog data recorded on tape, it does not change due to tape stretching and the like. This embodiment describes the case in which the sound data is digital data.

As described above, the sound data receiving unit 11 may receive sound data in several ways: as input from a microphone, as a transmission over a wired or wireless communication line, or by reading it from a predetermined recording medium (for example, an optical disk, a magnetic disk, or a semiconductor memory). The sound data receiving unit 11 may or may not include a receiving device (for example, a modem or a network card). It may be implemented by hardware, or by software such as a driver that drives a predetermined device.

The separation unit 12 separates environmental sound data from the sound data received by the sound data receiving unit 11. Labels attached on the basis of environmental sound data are expected to describe the subject's position more appropriately. Several methods for separating environmental sound data are described below.

(1) Frequency cut
The human voice is generally known to lie in the range of 85 Hz to 8 kHz. The separation unit 12 may therefore obtain environmental sound data by cutting (removing) the frequency components corresponding to the human voice from the full frequency range of the sound data received by the sound data receiving unit 11. The frequency band treated as the human voice may, of course, be made freely configurable.
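The Python code below is a minimal sketch of method (1). It assumes the whole clip fits in memory and uses a naive DFT so that only the standard library is needed. A production version would more likely use an FFT (for example `numpy.fft.rfft` and `numpy.fft.irfft`) or a band-stop filter. The 85 Hz to 8 kHz band comes from the text. The test signal is hypothetical: a 50 Hz hum plus a 1 kHz tone standing in for speech.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (O(n^2)); fine for a short illustration."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]


def idft(spectrum):
    """Inverse of dft(), keeping only the real part of the result."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def frequency_cut(samples, rate, low_hz=85.0, high_hz=8000.0):
    """Zero every frequency bin whose frequency lies within [low_hz, high_hz]."""
    n = len(samples)
    spectrum = dft(samples)
    for k in range(n):
        freq = min(k, n - k) * rate / n  # bins above n/2 mirror the negative frequencies
        if low_hz <= freq <= high_hz:
            spectrum[k] = 0
    return idft(spectrum)


rate, n = 16_000, 320
hum = [math.sin(2 * math.pi * 50 * t / rate) for t in range(n)]          # 50 Hz: outside the band
voice_like = [math.sin(2 * math.pi * 1000 * t / rate) for t in range(n)]  # 1 kHz: inside the band
env = frequency_cut([a + b for a, b in zip(hum, voice_like)], rate)
print(max(abs(a - b) for a, b in zip(env, hum)))  # close to 0
```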

(2) Separation by sound pressure
When the microphone collecting the sound data is located, for example, near the subject's mouth, the subject's own speech is expected to have a high sound pressure level. The separation unit 12 may therefore remove sound data whose sound pressure exceeds a threshold, keeping as environmental sound data the sound data whose sound pressure is below the threshold. The threshold may be a preset value, or a value derived from the maximum sound pressure of the sound data received by the sound data receiving unit 11. In the latter case, the threshold may be, for example, the maximum sound pressure multiplied by a value less than 1 (such as 0.6 or 0.8).
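The Python code below sketches method (2) using the "maximum times a ratio" threshold described above. It assumes that sound pressure is measured as the RMS level of fixed-length frames. The frame length and the sample values are illustrative. Loud frames are muted rather than deleted, so the output stays aligned with the original time codes. That choice is a design assumption, not something the text requires.

```python
import math


def separate_by_pressure(samples, frame_len, ratio=0.8):
    """Mute frames whose RMS level reaches ratio x the loudest frame's RMS level.

    Loud frames are assumed to be the subject's own speech; muting them (rather
    than deleting them) keeps the output aligned with the original time codes.
    """
    levels = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        levels.append(math.sqrt(sum(s * s for s in frame) / len(frame)))
    threshold = ratio * max(levels)
    out = list(samples)
    for idx, level in enumerate(levels):
        if level >= threshold:
            for i in range(idx * frame_len, min((idx + 1) * frame_len, len(out))):
                out[i] = 0.0
    return out


samples = [0.1] * 4 + [1.0] * 4 + [0.2] * 4   # quiet, loud (own voice), quiet
print(separate_by_pressure(samples, frame_len=4))
```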

(3) Separation using two channels
As described above, the sound data may consist of two channels, each collected by its own microphone. Suppose one channel (the "first channel") carries the subject's speech, and the other (the "second channel") carries both the subject's speech and environmental sound. The separation unit 12 may then isolate the environmental sound data by subtracting the first channel's sound data from the second channel's sound data. During separation, the levels of the two channels may, of course, be matched as appropriate to achieve optimal separation.
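The Python code below sketches method (3), including the level matching mentioned above. The matching gain is the least-squares gain that best scales the first channel onto the second. The sketch assumes the two channels are time-aligned and that the environmental sound is uncorrelated with the voice. Neither assumption is stated in the text, and real recordings would also need delay compensation.

```python
def subtract_channels(first, second):
    """Estimate environmental sound as second - g * first.

    first:  channel picking up mainly the subject's voice.
    second: channel picking up the subject's voice plus environmental sound.
    g is the least-squares gain matching the first channel's level to the
    second's, assuming uncorrelated environmental sound and aligned channels.
    """
    energy = sum(x * x for x in first)
    gain = sum(x * y for x, y in zip(first, second)) / energy if energy else 0.0
    return [y - gain * x for x, y in zip(first, second)]


voice = [1.0, -1.0, 0.0, 0.0]
environment = [0.5, 0.5, 0.3, -0.2]                    # orthogonal to voice here
mixed = [0.5 * v + e for v, e in zip(voice, environment)]
print(subtract_channels(voice, mixed))
```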

(4) Removal of human voice
In recent years, techniques have been developed for removing only the human voice from music data that contains vocals. The separation unit 12 may use such a technique to obtain environmental sound data by removing the human voice components from the sound data received by the sound data receiving unit 11.

Several methods by which the separation unit 12 can separate environmental sound data have been described here. Environmental sound data may, of course, be separated by other methods as well.

The accumulation unit 13 accumulates the sound data received by the sound data receiving unit 11 in the sound data storage unit 14, described later. The accumulation unit 13 also accumulates the environmental sound data separated by the separation unit 12 in the sound data storage unit 14.

The sound data storage unit 14 stores the sound data received by the sound data receiving unit 11, as well as the environmental sound data separated by the separation unit 12. The stored sound data is preferably associated with time codes. The time codes may be set in advance in the sound data received by the sound data receiving unit 11, or set by the accumulation unit 13 when it accumulates the sound data. Time codes may cover the entire duration of the sound data, or only some points, such as the start and end points. Even in the latter case, the time code of any point can be determined by computing its time offset from a position that has a time code. A time code may indicate an absolute date and time or a relative time. In the former case, for example, the date and time obtained from a radio-controlled clock may be used.
The time codes of the separated sound data and of the sound data before separation are preferably synchronized, so that the same time code is associated with the same temporal position in both. This matters because, as described later, labels are selected using the separated sound data but attached to the sound data before separation. Storage in the sound data storage unit 14 may be temporary, such as in RAM, or long-term. The sound data storage unit 14 is implemented by a predetermined recording medium (for example, a semiconductor memory, a magnetic disk, or an optical disk).
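When a time code is set only at the start point, the time code of any other point can be derived from its offset, as described above. The Python code below sketches that calculation under two assumptions: the sampling rate is fixed, and the time code is an absolute date and time. The start time and the 48 kHz rate are hypothetical.

```python
from datetime import datetime, timedelta


def time_code_at(start_time_code, sample_index, sample_rate):
    """Absolute time code of a sample, from the start time code plus the elapsed offset."""
    return start_time_code + timedelta(seconds=sample_index / sample_rate)


start = datetime(2009, 5, 15, 9, 0, 0)   # hypothetical time code at the start point
rate = 48_000                            # assumed sampling rate
# If separated data keeps the original sample indices (no samples dropped),
# one index resolves to the same time code in both versions.
print(time_code_at(start, 90 * rate, rate))  # 2009-05-15 09:01:30
```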

The sound image conversion unit 15 converts sound data into a sound image indicating the intensity at each frequency. The sound data in question is the environmental sound data received by the sound data receiving unit 11 and separated by the separation unit 12. For example, the sound image conversion unit 15 may convert the sound data stored in the sound data storage unit 14 into sound images at certain time intervals, which may or may not be constant. The sound image may take any of the following forms:
- a spectrum image (power spectrum image) obtained by Fourier-transforming the sound data, with frequency on the horizontal axis and signal intensity on the vertical axis;
- a spectrogram with time on the horizontal axis and frequency on the vertical axis, in which intensity is shown by shading or color density;
- any other image showing sound intensity by frequency.
A spectrogram may be generated, for example, with a bank of band-pass filters or by a short-time Fourier transform. Methods for generating sound images from sound data are well known, so a detailed description is omitted.
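The Python code below is a minimal sketch of the short-time Fourier transform option. It produces a spectrogram as a list of frames (the time axis), each holding power per frequency bin (the frequency axis). The frame length, hop size, Hann window, and naive DFT are illustrative choices, not taken from the specification. A single frame of the output corresponds to the spectrum-image option.

```python
import cmath
import math


def spectrogram(samples, frame_len=64, hop=32):
    """Power spectrogram via a naive short-time Fourier transform (STFT).

    Returns one list per frame (the time axis); each lists the power in
    frequency bins 0..frame_len // 2 (the frequency axis).
    """
    # Periodic Hann window to reduce spectral leakage at frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame_len) for i in range(frame_len)]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        seg = [samples[start + i] * window[i] for i in range(frame_len)]
        frames.append([abs(sum(seg[t] * cmath.exp(-2j * math.pi * k * t / frame_len)
                               for t in range(frame_len))) ** 2
                       for k in range(frame_len // 2 + 1)])
    return frames


rate = 8_000
tone = [math.sin(2 * math.pi * 1000 * t / rate) for t in range(512)]
spec = spectrogram(tone)
# Bin width is rate / frame_len = 125 Hz, so a 1 kHz tone peaks in bin 8.
print(len(spec), {max(range(len(f)), key=f.__getitem__) for f in spec})  # 15 {8}
```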

The correspondence information storage unit 16 stores two or more pieces of correspondence information. Each piece associates a label with a label sound image, which is a sound image indicating the intensity at each frequency of the sound corresponding to that label. The label typically indicates a position (for example, "meeting room", "corridor", "workplace seat", or "break room"), but other labels may also be used. Here, a "position" is not a position in the strict sense given by longitude, latitude, coordinates, or the like. It can be regarded as an attribute relating to position.

Labels indicating something other than a position include the following:
- "erase trigger", indicating that the data is to be erased by the erasure unit 20 described later;
- "extraction trigger", indicating that the data is to be extracted by the extraction unit 22 described later;
- "important", indicating an important portion.

Because the label sound images are compared with the sound images produced by the sound image conversion unit 15, they are preferably of the same type. Thus, when the sound image conversion unit 15 converts sound data into spectrum images, the label sound images are preferably spectrum images too. When it converts sound data into spectrograms, the label sound images are preferably spectrograms too. Each label sound image is generated according to its label. For example, the label sound image for the label "meeting room" can be generated by recording sound in the meeting room and converting the recording into a sound image. Similarly, the label sound image for a non-position label such as "erase trigger" can be generated by recording the sound assigned to that label (for example, a predetermined electronic tone, a finger snap, or a hand clap) and converting the recording into a sound image.

The process by which the two or more pieces of correspondence information come to be stored in the correspondence information storage unit 16 does not matter. For example, they may be stored via a recording medium, received over a communication line or the like, or entered through an input device. The correspondence information storage unit 16 may store them temporarily, for example in RAM, or for the long term. It can be implemented with a given recording medium (for example, a semiconductor memory, a magnetic disk, or an optical disk).

The comparison unit 17 compares the sound image produced by the sound image conversion unit 15 with the label sound images stored in the correspondence information storage unit 16. The purpose of this comparison is to identify label sound images that resemble the sound image. The comparison unit 17 may therefore calculate similarity information, meaning information on the similarity between the sound image and each label sound image. Similarity information may take any form as long as it indicates the degree of similarity. For example, it may take a larger value the higher the similarity, or a larger value the lower the similarity (as a distance does). Methods for calculating image similarity are well known, so a detailed description is omitted. In the present embodiment, the comparison unit 17 is assumed to calculate similarity information between the sound image and each label sound image.
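The patent does not name a similarity measure. As an illustration, here are two common choices that match the two conventions just described. Cosine similarity grows with similarity, and Euclidean distance grows with dissimilarity. Both assume that the sound image and the label sound image are the same type and shape (lists of rows of intensities). Using either measure here is an assumption, not the patent's stated method.

```python
import math

Image = list[list[float]]  # rows of intensities, e.g. time frames x frequency bins


def _flatten_pair(a: Image, b: Image) -> tuple[list[float], list[float]]:
    x = [v for row in a for v in row]
    y = [v for row in b for v in row]
    if len(x) != len(y):
        raise ValueError("sound image and label sound image must have the same shape")
    return x, y


def cosine_similarity(a: Image, b: Image) -> float:
    """Larger value = more similar; 1.0 when one image is a positive multiple of the other."""
    x, y = _flatten_pair(a, b)
    dot = sum(p * q for p, q in zip(x, y))
    norm = math.sqrt(sum(p * p for p in x)) * math.sqrt(sum(q * q for q in y))
    return dot / norm if norm else 0.0


def euclidean_distance(a: Image, b: Image) -> float:
    """Larger value = less similar; 0.0 for identical images."""
    x, y = _flatten_pair(a, b)
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(x, y)))
```

Cosine similarity ignores overall scale. As a result, the same room recorded at a different loudness can still score close to 1.0, which may or may not be desirable in practice.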

Using the comparison results from the comparison unit 17, the label attaching unit 18 identifies the label corresponding to a label sound image that is highly similar to the sound image produced by the sound image conversion unit 15. It then attaches that label to the sound data corresponding to the sound image. The label is attached to the sound data as received by the sound data reception unit 11, that is, before separation by the separation unit 12.

A "label sound image highly similar to the sound image" may be, for example:
- the single label sound image most similar to the sound image;
- the two or more label sound images ranked highest in similarity; or
- every label sound image whose similarity to the sound image exceeds a threshold.

The threshold may be preset, or it may be derived from the maximum similarity calculated by the comparison unit 17. In the latter case, it may be, for example, that maximum value multiplied by a factor smaller than 1, such as 0.6 or 0.8. Accordingly, the label attaching unit 18 may identify one label or several.

"Attaching a label to sound data" means only that the correspondence between the label and a temporal position in the sound data becomes known. It may therefore take any of the following forms:
- attaching the label directly to the relevant portion of the sound data (for example, when the sound data has a label channel, setting the label in that channel);
- setting labels for time codes (for example, setting the label "meeting room" for time codes 00:00:00 to 00:05:35 and the label "corridor" for 00:05:35 to 00:06:12);
- setting time codes for labels (for example, setting the time codes 00:00:00 to 00:05:35 and 01:47:56 to 02:43:45 for the label "meeting room").

In the present embodiment, the labels attached by the label attaching unit 18 are stored in the sound data storage unit 14, but they may instead be stored on another recording medium. When attaching one or more labels to the sound data, the label attaching unit 18 may also attach each label together with its similarity information, as calculated by the comparison unit 17, but it need not. If no label sound image is highly similar to the sound image, that is, if none has a similarity above a preset threshold, the label attaching unit 18 may attach a label such as "unknown", or a label indicating that no label could be identified.
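The selection options above can be sketched as one function. This sketch assumes similarity scores where a larger value means more similar and scores are non-negative (as with cosine similarity of power spectra). Because FIG. 3 allows several label sound images per label, each label keeps its best score. The function name, the keyword parameters, and the label string "unknown" are illustrative choices, not taken from the patent.

```python
from typing import Optional

UNKNOWN = "unknown"  # label attached when nothing is similar enough


def select_labels(
    scores: list[tuple[str, float]],
    relative: Optional[float] = None,
    absolute: Optional[float] = None,
) -> list[tuple[str, float]]:
    """Pick label(s) from (label, similarity) pairs, where larger = more similar.

    A label may appear several times (one entry per label sound image); its
    best score is used. With neither option set, only the single best label is
    returned. `relative` keeps every label scoring at least relative * best.
    `absolute` yields UNKNOWN when even the best score does not exceed it.
    """
    best: dict[str, float] = {}
    for label, score in scores:
        if label not in best or score > best[label]:
            best[label] = score
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    if not ranked:
        return [(UNKNOWN, 0.0)]
    top_score = ranked[0][1]
    if absolute is not None and top_score <= absolute:
        return [(UNKNOWN, top_score)]
    if relative is None:
        return ranked[:1]
    cutoff = top_score * relative
    return [(label, score) for label, score in ranked if score >= cutoff]
```

For example, with scores of 0.91 for "meeting room", 0.75 for "corridor", and 0.40 for "break room", `relative=0.8` gives a cutoff of 0.728 and keeps the first two labels. Returning the scores with the labels also supports the option of storing each label together with its similarity information.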

The erasure target label storage unit 19 stores label identification information identifying the labels whose sound data is to be erased. Sound data corresponding to a label identified in this way is erased by the erasure unit 20 described later. Thus, some sound data may need to be erased for confidentiality, privacy protection, or similar reasons. If the label identification information for the corresponding labels is stored in the erasure target label storage unit 19, that sound data can be erased automatically. The label identification information may take any form as long as it identifies a label. For example, it may be the label itself, or a number or symbol identifying the label.

The process by which the label identification information comes to be stored in the erasure target label storage unit 19 does not matter. For example, it may be stored via a recording medium, received over a communication line or the like, or entered through an input device. The erasure target label storage unit 19 may store it temporarily, for example in RAM, or for the long term. It can be implemented with a given recording medium (for example, a semiconductor memory, a magnetic disk, or an optical disk).

Among the labeled sound data, the erasure unit 20 erases from the sound data storage unit 14 the sound data related to the labels identified by the label identification information stored in the erasure target label storage unit 19. The sound data related to a label may be the sound data in the range corresponding to the label. Alternatively, it may be the sound data in a range obtained by moving the start point, the end point, or both. For example, suppose the label "toilet" is attached to time codes 00:10:00 to 00:15:35 and "toilet" is stored in the erasure target label storage unit 19. In the first case, the sound data from 00:10:00 to 00:15:35 is erased. In the second case, sound data in an extended range is erased, for example 00:09:00 to 00:16:35 when the range is extended by one minute on each side. In the second case, the amount by which the start and/or end point is moved may be the same for all label identification information stored in the erasure target label storage unit 19, or it may differ for each item. That amount may be stored in the erasure target label storage unit 19 or on another recording medium. Erasing sound data may mean replacing it with silence or deleting the sound data itself from the sound data storage unit 14. Even when the data is deleted, however, the deletion is preferably done without altering the time codes.
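One way to honor the "time codes stay unchanged" preference is to erase by silencing rather than by cutting samples out. The sketch below assumes mono floating-point samples at a known sample rate, with ranges given in seconds. It also includes a small helper for the hh:mm:ss time codes used in the example. Both function names are hypothetical.

```python
def parse_timecode(tc: str) -> int:
    """'hh:mm:ss' -> seconds, e.g. '00:15:35' -> 935."""
    h, m, s = (int(part) for part in tc.split(":"))
    return h * 3600 + m * 60 + s


def silence_ranges(samples: list[float], rate: int, ranges: list[tuple[float, float]]) -> list[float]:
    """Return a copy in which every [start, end) range, in seconds, is silent.

    The sample count is unchanged, so time codes after an erased range still
    point at the same audio; ranges running past the end are clamped.
    """
    out = list(samples)
    for start_s, end_s in ranges:
        lo = max(0, round(start_s * rate))
        hi = min(len(out), round(end_s * rate))
        for i in range(lo, hi):
            out[i] = 0.0
    return out
```

For the "toilet" example, the call would be `silence_ranges(samples, rate, [(parse_timecode("00:10:00"), parse_timecode("00:15:35"))])`. Physically deleting the samples would instead require keeping a separate map from original time codes to stored positions.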

The extraction target label storage unit 21 stores label identification information identifying the labels whose sound data is to be extracted. Sound data corresponding to a label identified in this way is extracted by the extraction unit 22 described later. Thus, by storing in advance the label identification information for the labels whose sound data should be extracted, that sound data can be extracted automatically.

The process by which the label identification information comes to be stored in the extraction target label storage unit 21 does not matter. For example, it may be stored via a recording medium, received over a communication line or the like, or entered through an input device. The extraction target label storage unit 21 may store it temporarily, for example in RAM, or for the long term. It can be implemented with a given recording medium (for example, a semiconductor memory, a magnetic disk, or an optical disk).

Among the labeled sound data, the extraction unit 22 extracts from the sound data storage unit 14 the sound data related to the labels identified by the label identification information stored in the extraction target label storage unit 21. It then accumulates that data on a recording medium. The recording medium is, for example, a semiconductor memory, an optical disk, or a magnetic disk, and may be part of the extraction unit 22 or external to it. It may or may not hold the sound data only temporarily. "Sound data related to a label" has the same meaning as in the description of the erasure unit 20, with the extraction target label storage unit 21 in place of the erasure target label storage unit 19.
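As an illustration of "extract and accumulate on a recording medium", the sketch below writes each matching labeled range to its own 16-bit mono WAV file using Python's standard `wave` module. Several details are assumptions rather than the patent's method: the segment tuples (label, start seconds, end seconds), the output directory, and the file-naming scheme.

```python
import os
import struct
import wave


def write_clip(path: str, samples: list[float], rate: int) -> None:
    """Write mono float samples in [-1, 1] as 16-bit PCM WAV."""
    pcm = b"".join(struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767)) for s in samples)
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes = 16-bit samples
        w.setframerate(rate)
        w.writeframes(pcm)


def extract_labeled(
    samples: list[float],
    rate: int,
    segments: list[tuple[str, float, float]],
    targets: set[str],
    out_dir: str,
) -> list[str]:
    """Save each segment whose label is in `targets` as its own WAV file; return the paths."""
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for label, start_s, end_s in segments:
        if label not in targets:
            continue
        lo = max(0, int(start_s * rate))
        hi = min(len(samples), int(end_s * rate))
        if lo >= hi:
            continue
        name = f"{label}_{start_s:.0f}-{end_s:.0f}.wav"  # hypothetical naming scheme
        path = os.path.join(out_dir, name)
        write_clip(path, samples[lo:hi], rate)
        written.append(path)
    return written
```

The source samples are left untouched, so the same segment list can later be reused for erasure.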

Any two or more of the sound data storage unit 14, the correspondence information storage unit 16, the erasure target label storage unit 19, and the extraction target label storage unit 21 may be implemented on the same recording medium or on separate recording media. In the former case, for example, the area storing the sound data may serve as the sound data storage unit 14, and the area storing the correspondence information may serve as the correspondence information storage unit 16.

Each meeting room, corridor, and so on has its own characteristic sounds. Examples include the cooling fans of personal computers, the steady hum of monitors that are switched on, air-conditioning noise, and the reflections of these sounds. The sound data label attaching device 1 uses these sounds to label sound data, but the environmental sounds used for labeling may also be emitted intentionally. For example, a sound output device that emits the sound assigned to a label may be installed at the position corresponding to that label. Sound data acquired while the subject is at that position then contains the emitted sound, which allows labels to be attached more accurately.

The emitted sound preferably has frequency characteristics different from those of office equipment, air conditioning, and the like, so that it can be distinguished from other environmental sounds. It may or may not be inaudible. An inaudible sound has the advantage of not disturbing the subject or other people. An inaudible sound is preferably above the audible range rather than below it: high-frequency sound is known to carry less far than low-frequency sound, which makes it better suited to locating a position.

The sound output device may emit sound through a loudspeaker or by other means, such as an electromagnetic bell or a motor that strikes a bell. It may be installed, for example, on the ceiling. In that case, it may draw power by electromagnetic induction from a fluorescent lamp, or from a solar cell, so that no dedicated power supply is needed. A human presence sensor may also be connected to the sound output device so that it emits no sound when nobody is present.
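The patent identifies positions by comparing sound images. As a separate, swapped-in technique, an emitted high-frequency beacon could also be checked directly with the Goertzel algorithm, which measures the power of a single DFT bin. In the sketch below, the 19 kHz beacon, the 48 kHz sample rate, the 10 ms frames, and the 0.01 energy-fraction threshold are all hypothetical values chosen for illustration. Whether a given frequency is inaudible depends on the listener.

```python
import math


def goertzel_power(frame: list[float], rate: int, freq: float) -> float:
    """Power |X[k]|^2 of the single DFT bin nearest `freq` (Goertzel algorithm)."""
    n = len(frame)
    k = round(n * freq / rate)
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev = s_prev2 = 0.0
    for x in frame:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2


def beacon_present(frame: list[float], rate: int, freq: float, min_fraction: float = 0.01) -> bool:
    """True if the beacon bin holds at least `min_fraction` of the frame's spectral energy.

    By Parseval, the DFT bins sum to N * sum(x^2); a pure tone exactly on bin k
    puts half of that total in bin k and half in its mirror bin N - k.
    """
    energy = sum(x * x for x in frame)
    if energy == 0.0:
        return False
    return goertzel_power(frame, rate, freq) / (len(frame) * energy) >= min_fraction


if __name__ == "__main__":
    rate, n = 48000, 480  # 10 ms frames; 19 kHz falls exactly on bin 190
    hum = [math.sin(2 * math.pi * 100 * t / rate) for t in range(n)]  # stand-in room noise
    with_beacon = [h + 0.2 * math.sin(2 * math.pi * 19000 * t / rate) for t, h in enumerate(hum)]
    print(beacon_present(with_beacon, rate, 19000), beacon_present(hum, rate, 19000))  # → True False
```

Goertzel costs O(N) per frequency, so checking a handful of beacon frequencies per frame is cheap compared with computing a full spectrum.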

Next, the operation of the sound data label attaching device 1 according to the present embodiment is described with reference to the flowchart of FIG. 2.

(Step S101) The sound data reception unit 11 determines whether sound data has been received. If so, the process proceeds to step S102; otherwise, it proceeds to step S104.

(Step S102) The separation unit 12 separates environmental sound data from the sound data received by the sound data reception unit 11.

(Step S103) The accumulation unit 13 stores in the sound data storage unit 14 the environmental sound data separated out by the separation unit 12. The accumulation unit 13 also stores there the unseparated sound data received by the sound data reception unit 11. The process then returns to step S101.

(Step S104) The label attaching unit 18 determines whether to attach labels. For example, it may decide to do so once a series of sound data has been accumulated, when the sound data label attaching device 1 receives an instruction to attach labels, or at some other time. If labels are to be attached, the process proceeds to step S105; otherwise, it proceeds to step S108.

(Step S105) The sound image conversion unit 15 reads the sound data to be converted from the sound data storage unit 14 and converts it into a sound image. The sound image may be stored temporarily on a recording medium (not shown).

(Step S106) The comparison unit 17 compares the label sound images stored in the correspondence information storage unit 16 with the sound image obtained in step S105. Specifically, it calculates similarity information between each label sound image and the sound image.

(Step S107) Using the comparison results from the comparison unit 17, the label attaching unit 18 takes the label corresponding to a label sound image highly similar to the sound image. It attaches that label to the sound data corresponding to the sound image. The process then returns to step S101.

The processing of steps S105 to S107 may be repeated over all of the sound data stored in the sound data storage unit 14, from beginning to end.
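When steps S105 to S107 are repeated over fixed-length windows, each window receives one label. Consecutive windows with the same label can then be merged into the time-code ranges described for the label attaching unit 18. The sketch below assumes one label per window and a fixed window length in seconds (5 s in the usage line). It produces both the "label per time-code range" form and the "time-code ranges per label" form. The function names are hypothetical.

```python
def to_timecode(seconds: float) -> str:
    """Whole seconds -> 'hh:mm:ss'."""
    h, rem = divmod(int(seconds), 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"


def label_segments(window_labels: list[str], window_sec: float) -> list[tuple[str, str, str]]:
    """Merge runs of consecutive windows with the same label into (label, start, end)."""
    segments = []
    start = 0
    for i in range(1, len(window_labels) + 1):
        # Close the current run at the end of the list or when the label changes.
        if i == len(window_labels) or window_labels[i] != window_labels[start]:
            segments.append((window_labels[start], to_timecode(start * window_sec), to_timecode(i * window_sec)))
            start = i
    return segments


def by_label(segments: list[tuple[str, str, str]]) -> dict[str, list[tuple[str, str]]]:
    """Invert to label -> list of (start, end) time-code ranges."""
    index: dict[str, list[tuple[str, str]]] = {}
    for label, start, end in segments:
        index.setdefault(label, []).append((start, end))
    return index


if __name__ == "__main__":
    labels = ["meeting room"] * 3 + ["corridor"] * 2 + ["meeting room"]
    print(label_segments(labels, 5.0))
    print(by_label(label_segments(labels, 5.0)))
```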

(Step S108) The erasure unit 20 determines whether to erase sound data. For example, it may decide to do so once a series of labels has been attached to the sound data, when the sound data label attaching device 1 receives an instruction to erase sound data, or at some other time. If sound data is to be erased, the process proceeds to step S109; otherwise, it proceeds to step S112.

(Step S109) The erasure unit 20 uses the label identification information stored in the erasure target label storage unit 19 to identify the sound data related to the labels that information identifies.

(Step S110) The erasure unit 20 determines whether it identified at least some sound data in step S109. If so, the process proceeds to step S111; otherwise, it returns to step S101.

(Step S111) The erasure unit 20 erases the sound data identified in step S109. If two or more ranges of sound data were identified, each of them is erased. The process then returns to step S101.

(Step S112) The extraction unit 22 determines whether to extract sound data. For example, it may decide to do so once a series of labels has been attached to the sound data, when the sound data label attaching device 1 receives an instruction to extract sound data, or at some other time. If sound data is to be extracted, the process proceeds to step S113; otherwise, it returns to step S101.

(Step S113) The extraction unit 22 uses the label identification information stored in the extraction target label storage unit 21 to identify the sound data related to the labels that information identifies.

(Step S114) The extraction unit 22 determines whether it identified at least some sound data in step S113. If so, the process proceeds to step S115; otherwise, it returns to step S101.

(Step S115) The extraction unit 22 extracts the sound data identified in step S113 and accumulates it on the recording medium. If two or more ranges of sound data were identified, each of them is extracted and accumulated. The process then returns to step S101.

In the flowchart of FIG. 2, processing ends when the power is turned off or an interrupt to end processing occurs.
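The branching of FIG. 2 can be sketched as a function that runs one pass from step S101 until control returns to S101. The units themselves are represented by caller-supplied callbacks. The `FlowHooks` class, its field names, and the returned branch strings are hypothetical, and this is only one possible arrangement of the flowchart.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class FlowHooks:
    """Hypothetical callbacks standing in for the units in FIG. 2."""
    receive: Callable[[], Optional[list]]  # S101: newly received sound data, or None
    separate: Callable[[list], list]       # S102: extract environmental sound data
    store: Callable[[list, list], None]    # S103: store raw and separated data
    should_label: Callable[[], bool]       # S104
    label: Callable[[], None]              # S105-S107 (possibly over all stored data)
    should_erase: Callable[[], bool]       # S108
    find_erase: Callable[[], list]         # S109: ranges related to erasure-target labels
    erase: Callable[[list], None]          # S111
    should_extract: Callable[[], bool]     # S112
    find_extract: Callable[[], list]       # S113: ranges related to extraction-target labels
    extract: Callable[[list], None]        # S115


def one_pass(h: FlowHooks) -> str:
    """Follow FIG. 2 from S101 until control returns to S101; name the branch taken."""
    raw = h.receive()                          # S101
    if raw is not None:
        h.store(raw, h.separate(raw))          # S102-S103
        return "stored"
    if h.should_label():                       # S104
        h.label()                              # S105-S107
        return "labeled"
    if h.should_erase():                       # S108
        ranges = h.find_erase()                # S109
        if not ranges:                         # S110
            return "nothing to erase"
        h.erase(ranges)                        # S111
        return "erased"
    if h.should_extract():                     # S112
        ranges = h.find_extract()              # S113
        if not ranges:                         # S114
            return "nothing to extract"
        h.extract(ranges)                      # S115
        return "extracted"
    return "idle"


def run(h: FlowHooks, stop: Callable[[], bool]) -> None:
    """Repeat passes until `stop` reports power-off or a termination interrupt."""
    while not stop():
        one_pass(h)
```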

Next, the operation of the sound data label attaching device 1 according to the present embodiment is described using a specific example.

In this specific example, the correspondence information storage unit 16 is assumed to store the correspondence information shown in FIG. 3. In FIG. 3, some labels indicate positions, such as "workplace seat" and "meeting room", while others, such as "extraction trigger", are unrelated to position. Each label sound image entry, such as "SP001", is the file name of that label sound image's image data. The image data so identified is a spectrum image of sound data corresponding to the label. FIG. 3 shows one label sound image per label, but two or more label sound images may be associated with one label. For example, the characteristic sound in corridors and meeting rooms may vary with location. In that case, several characteristic sounds may be recorded, a label sound image generated for each recording, and each registered in the correspondence information.

In this specific example, the erasure target label storage unit 19 is also assumed to store the label identification information shown in FIG. 4. In FIG. 4, the label identification information is the label itself, and each item is associated with extension information. The extension information specifies how far the start and end points of the corresponding sound data are extended for erasure. For example, for the label identification information "toilet", the start and end points of the sound data labeled "toilet" are each extended by 5 seconds before erasure. As a result, 10 seconds more than the labeled sound data is erased. Although FIG. 4 shows only positive extension values, negative values are also possible. A negative value moves the start and end points inward by its absolute value, shrinking the erased range.
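The extension arithmetic can be sketched as follows, assuming ranges in seconds and a known total recording length. A positive extension widens the range on both sides and a negative one narrows it. The result is clamped to the recording, and a range that shrinks to nothing is dropped. Only the "toilet: 5 s" entry comes from the text; the function names are hypothetical. The same logic would apply to the extraction table of FIG. 5.

```python
from typing import Optional


def adjust_range(start: float, end: float, extension: float, duration: float) -> Optional[tuple[float, float]]:
    """Move start earlier and end later by `extension` seconds (inward if negative)."""
    new_start = max(0.0, start - extension)
    new_end = min(duration, end + extension)
    return (new_start, new_end) if new_start < new_end else None


def ranges_to_erase(
    segments: list[tuple[str, float, float]],
    extensions: dict[str, float],
    duration: float,
) -> list[tuple[float, float]]:
    """Adjusted ranges for every labeled segment whose label has an extension entry."""
    out = []
    for label, start, end in segments:
        if label in extensions:
            adjusted = adjust_range(start, end, extensions[label], duration)
            if adjusted is not None:
                out.append(adjusted)
    return out
```

As a worked example, "toilet" at 00:10:00 to 00:15:35 covers 600 to 935 s. With the 5 s extension this becomes 595 to 940 s, so 345 s are erased instead of 335 s, which is the 10 s difference stated above.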

In this specific example, the extraction target label storage unit 21 is likewise assumed to store the label identification information shown in FIG. 5. In FIG. 5, too, the label identification information is the label itself, and each item is associated with extension information. The extension information works as described for FIG. 4, with extraction in place of erasure, so its description is omitted.

Assume that, after arriving at the office, the subject wearing the microphone starts recording by operating the switch of the recorder connected to the microphone. When the subject intentionally wants sound data to be extracted, the subject produces the sound corresponding to the extraction trigger so that the microphone picks it up. Likewise, when the subject intentionally wants sound data to be erased, the subject produces the sound corresponding to the erasure trigger so that the microphone picks it up. In this specific example, only the sound corresponding to the extraction trigger is produced. As described above, this sound may be produced with part of the subject's body, for example by snapping the fingers or clapping the hands, or by clicking a specific retractable ballpoint pen. Assume further that, after recording for the whole day, the memory is removed from the recorder and inserted into a reader connected to the sound data label attaching apparatus 1. The sound data receiving unit 11 then reads the recorded sound data from the memory and passes it to the separation unit 12 and the accumulation unit 13 (step S101). The separation unit 12 separates the environmental sound data from the received sound data and passes it to the accumulation unit 13 (step S102). The accumulation unit 13 stores both the sound data received from the sound data receiving unit 11 and the environmental sound data received from the separation unit 12 in the sound data storage unit 14 (step S103). FIG. 6 shows an example of the separated sound data stored in this way. The horizontal axis represents time, and the vertical axis represents sound pressure.

When the label attaching unit 18 detects that a series of separated sound data has been stored, it determines that labeling is to be performed. It then instructs the sound image conversion unit 15, via a path not shown, to convert the data into sound images (step S104). The sound image conversion unit 15 accesses the separated sound data stored in the sound data storage unit 14 and converts it into a sound image every 50 milliseconds. Specifically, the sound image conversion unit 15 first Fourier-transforms the sound data from time code 00:00:00.00 to 00:00:00.05 and converts the result into a sound image (step S105). As shown in FIG. 7, this sound image is, for example, an image whose horizontal axis is frequency and whose vertical axis is intensity. FIG. 7 shows labels such as "frequency" and "intensity" for convenience of explanation, but the sound image itself consists only of the waveform plot in FIG. 7.
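
The 50-millisecond conversion step can be sketched as follows. This is a minimal illustration, assuming mono samples given as a list of floats and a known sample rate. It represents a "sound image" as the list of DFT magnitudes for the non-negative frequencies. The naive DFT keeps the sketch dependency-free, and a real system would more likely use an FFT routine. The function names are hypothetical. A trailing partial frame is simply dropped here.

```python
import cmath
import math

FRAME_SECONDS = 0.05  # one sound image per 50 ms, as in the example


def dft_magnitudes(frame):
    """Naive DFT; returns |X[k]| for k = 0 .. N/2 (non-negative frequencies)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(acc))
    return mags


def to_sound_images(samples, sample_rate):
    """Split samples into consecutive 50 ms frames; return one spectrum per frame."""
    frame_len = int(round(sample_rate * FRAME_SECONDS))
    images = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        images.append(dft_magnitudes(samples[start:start + frame_len]))
    return images
```

At 8 kHz, each frame has 400 samples and adjacent bins are 20 Hz apart. A 1000 Hz tone therefore peaks at bin 50 of every sound image.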

Next, the comparison unit 17 calculates similarity information between the sound image produced by the sound image conversion unit 15 and each label sound image included in the correspondence information shown in FIG. 3. It passes the results to the label attaching unit 18 (step S106). The label attaching unit 18 selects, from the similarity information calculated by the comparison unit 17, the entry indicating the highest similarity and obtains the corresponding label. Assume here that this label is "corridor". The label attaching unit 18 then attaches the label "corridor" to the sound data of time code 00:00:00.00 to 00:00:00.05 (step S107). Thereafter, conversion into a sound image, comparison of sound images, and labeling are repeated for each successive 50-millisecond segment (steps S105 to S107). FIG. 8 shows an example of the labels attached to the sound data in this way, with each time code associated with a label. The label attaching unit 18 stores the information shown in FIG. 8 in the sound data storage unit 14. After labeling all time codes of the sound data, the label attaching unit 18 also merges the time codes in FIG. 8 label by label to generate the information shown in FIG. 9. This information is likewise stored in the sound data storage unit 14.
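
Steps S106 and S107 and the merge from FIG. 8 to FIG. 9 can be pictured with the sketch below. The specification does not say how similarity is measured, so this example assumes cosine similarity between spectra. It also represents merged entries as `(start_s, end_s, label)` tuples in seconds. All names are hypothetical.

```python
import math

FRAME_SECONDS = 0.05


def cosine_similarity(a, b):
    """Assumed similarity measure; the source leaves the measure open."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def label_frames(sound_images, label_images):
    """Return the most similar label for each 50 ms sound image (cf. FIG. 8)."""
    return [
        max(label_images, key=lambda lbl: cosine_similarity(image, label_images[lbl]))
        for image in sound_images
    ]


def merge_runs(frame_labels):
    """Merge consecutive frames that share a label into (start_s, end_s, label) (cf. FIG. 9)."""
    runs = []
    for i, label in enumerate(frame_labels):
        start = round(i * FRAME_SECONDS, 2)
        end = round((i + 1) * FRAME_SECONDS, 2)
        if runs and runs[-1][2] == label:
            runs[-1] = (runs[-1][0], end, label)  # extend the current run
        else:
            runs.append((start, end, label))
    return runs
```

Because times are rounded to hundredths of a second, floating-point drift does not accumulate in the merged intervals.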

When the erasure unit 20 detects that the series of labeling operations has finished, it determines that sound data is to be erased (step S108). It then identifies, among the labels in FIG. 9, the time codes corresponding to the labels identified by the label identification information in FIG. 4 (step S109). Assume that the time codes 00:40:25.20 to 00:41:38.10, … corresponding to the label "toilet" are identified. The erasure unit 20 then determines that an erasure target exists (step S110) and reads the extended information for the erasure target label "toilet". It modifies the identified time codes 00:40:25.20 to 00:41:38.10, … according to that information. Specifically, it moves the start point of each time code 5 seconds earlier and the end point 5 seconds later, producing the extended time codes 00:40:20.20 to 00:41:43.10, …. It then erases the sound data corresponding to those time codes (this is the sound data before separation) (step S111).
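
The time-code arithmetic in this step can be checked with a small helper. The sketch assumes time codes of the form `HH:MM:SS.ss` with exactly two fractional digits. It works in integer hundredths of a second to avoid floating-point rounding. The helper names are hypothetical.

```python
def parse_timecode(tc):
    """'HH:MM:SS.ss' -> integer hundredths of a second."""
    hh, mm, rest = tc.split(":")
    ss, cs = rest.split(".")
    return ((int(hh) * 60 + int(mm)) * 60 + int(ss)) * 100 + int(cs)


def format_timecode(cs_total):
    """Integer hundredths of a second -> 'HH:MM:SS.ss'."""
    ss_total, cs = divmod(cs_total, 100)
    mm_total, ss = divmod(ss_total, 60)
    hh, mm = divmod(mm_total, 60)
    return f"{hh:02d}:{mm:02d}:{ss:02d}.{cs:02d}"


def extend_span(start_tc, end_tc, start_ext_s, end_ext_s):
    """Move the start earlier and the end later by the given seconds (not below 0)."""
    start = max(0, parse_timecode(start_tc) - round(start_ext_s * 100))
    end = parse_timecode(end_tc) + round(end_ext_s * 100)
    return format_timecode(start), format_timecode(end)
```

`extend_span("00:40:25.20", "00:41:38.10", 5, 5)` reproduces the erasure example, giving `("00:40:20.20", "00:41:43.10")`. With a start extension of 0 and an end extension of 60 seconds, the same helper also reproduces the extraction example for the label "extraction trigger".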

Next, when the extraction unit 22 detects that erasure of the sound data has finished, it determines that sound data is to be extracted (step S112). It then identifies, among the labels in FIG. 9, the time codes corresponding to the labels identified by the label identification information in FIG. 5 (step S113). Assume that the time code 00:41:56.55 to 00:41:57.50 corresponding to the label "extraction trigger" is identified. The extraction unit 22 then determines that an extraction target exists (step S114) and reads the extended information for the extraction target label "extraction trigger". It modifies the identified time code 00:41:56.55 to 00:41:57.50 according to that information. The resulting extended time code is 00:41:56.55 to 00:42:57.50: the start point is unchanged, and the end point is moved 60 seconds later. The extraction unit 22 then extracts the sound data corresponding to the extended time code (this is the sound data before separation) and stores it on a recording medium (not shown) (step S115).

This specific example uses time codes that indicate relative time, but this is not required. As described above, time codes indicating absolute dates and times may be used instead. Even when relative time codes are used, as in this example, it is preferable to record the date and time of some reference point of the time code (for example, its start point, its end point, or another point).

This specific example erases or extracts sound data using a single erasure trigger or extraction trigger. However, each kind of trigger may come in two forms: one marking the start point and one marking the end point of the erasure or extraction target. In that case, the erasure unit 20 may erase the sound data from the time of the erasure trigger indicating the start point to the time of the erasure trigger indicating the end point. Likewise, the extraction unit 22 may extract the sound data from the time of the extraction trigger indicating the start point to the time of the extraction trigger indicating the end point. Different labels are then assigned to the start-point trigger and the end-point trigger.
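
One way to pair start-point and end-point triggers is sketched below. It operates on merged label runs such as those in FIG. 9, given as `(start_s, end_s, label)` tuples. The label names `"extract_start"` and `"extract_end"` are hypothetical. The sketch also assumes that a span runs from the beginning of the start trigger to the end of the matching end trigger. It ignores an end trigger with no preceding start trigger, and it ignores a start trigger that is never closed.

```python
def paired_spans(runs, start_label, end_label):
    """Pair each start-trigger run with the next end-trigger run.

    runs: time-ordered (start_s, end_s, label) tuples.
    Returns a list of (span_start_s, span_end_s).
    """
    spans = []
    open_start = None
    for start_s, end_s, label in runs:
        if label == start_label and open_start is None:
            open_start = start_s              # a span opens at the start trigger
        elif label == end_label and open_start is not None:
            spans.append((open_start, end_s))  # and closes after the end trigger
            open_start = None
    return spans
```

Because the triggers carry distinct labels, the pairing needs only a single pass over the label runs.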

Noise may also be removed when sound data is converted into a sound image, for example by time integration or by other techniques. Such noise-removal methods, including time integration, are well known, so their description is omitted.

As described above, the sound data label attaching apparatus 1 of this embodiment can attach labels indicating locations and the like to sound data. These labels can then be used to search the sound data, which makes the desired sound data easier to access. For example, a user who wants to access the sound data of a conversation with someone in a corridor may not remember when that conversation took place. Because the data is labeled as in this embodiment, the user only needs to examine the sound data labeled "corridor". This makes the desired sound data easier to reach than examining all of the sound data.

In addition, sound data corresponding to labels specified in advance can be erased or extracted automatically. For example, privacy-sensitive sound data can be erased automatically, and sound data that the user wants to keep can be extracted automatically.

In the specific example of this embodiment, conversion into sound images is performed at a fixed interval (50 milliseconds), but this is not required. For example, the sound image conversion unit 15 may convert the sound data of a period during which a frequency peak persists into a single sound image. As shown in FIG. 10, a sound image may contain a peak (in FIG. 10, near 1700 Hz). In that case, the sound data of the whole period over which that peak continues may be converted into one sound image. While the peak persists, its frequency may stay constant or may change. For example, the electronic melody played when a train departs may differ from station to station. The peak then continues from the beginning to the end of the melody, so the whole melody is converted into one sound image, and that sound image can be used to attach a label corresponding to the station. In this case, the label sound images associated with labels in the correspondence information are also preferably sound images of sound data spanning a period of continuous peaks. Specifically, the sound image conversion unit 15 converts the sound data into sound images every 50 milliseconds as described above. When a sound image contains a peak, the unit keeps converting until a subsequent sound image no longer contains a peak. At that point, it converts the entire sound data of the period over which the peak continued into one sound image. The comparison unit 17 then compares the label sound images with the sound image of that whole period, not with the individual 50-millisecond sound images that contain the peak.
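
The grouping of consecutive peak-containing frames can be sketched as below. The sketch assumes that a per-frame peak flag has already been computed, for example with the peak test described next. It returns half-open frame-index ranges, and `peak_segments` is a hypothetical name.

```python
def peak_segments(has_peak):
    """Group consecutive 50 ms frames that contain a peak.

    has_peak: list of booleans, one per 50 ms frame.
    Returns (first_frame, end_frame_exclusive) for each run of peak frames.
    """
    segments = []
    start = None
    for i, flag in enumerate(has_peak):
        if flag and start is None:
            start = i                      # a peak run begins
        elif not flag and start is not None:
            segments.append((start, i))    # the peak has disappeared: close the run
            start = None
    if start is not None:
        segments.append((start, len(has_peak)))  # the run lasts to the end of the data
    return segments
```

Each segment `(a, b)` corresponds to samples `a * frame_len` through `b * frame_len`. Those samples would then be Fourier-transformed as one block to form the single sound image used for comparison.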

A method for detecting the presence of a peak is briefly described here. Let f(x) be the intensity (power) of the spectrum obtained by Fourier-transforming the sound data, where x is the frequency. The frequency x is swept from the lowest frequency to the highest. A peak may be judged to exist if some x satisfies either of the following relations 1 and 2.

Relation 1: $f(x) \times S < f(x + \Delta x)$
Relation 2: $f(x) > f(x + \Delta x) \times S$

Here, S is a threshold that serves as the criterion for peak detection; for example, S may be 1.5 or 3. Δx is the frequency increment (for example, 5 Hz). Following the same idea, the presence of a peak may also be judged by whether some frequency x exists at which the absolute value of f′(x), the derivative of the Fourier-transformed intensity f(x), exceeds S. If such an x exists, a peak exists; otherwise, no peak exists.
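
A direct reading of relations 1 and 2 is sketched below. It assumes the spectrum is a list of intensities at evenly spaced frequencies, so that Δx corresponds to a whole number of bins (`step`). The default S = 1.5 is one of the example values in the text. In practice, near-zero bins can satisfy relation 1 trivially, so a noise floor may be needed; that refinement is an assumption and is not included here.

```python
def has_peak(spectrum, s=1.5, step=1):
    """Return True if some x satisfies relation 1 or relation 2.

    spectrum[i] is f(x) at the i-th frequency; `step` bins correspond to Δx.
    """
    for i in range(len(spectrum) - step):
        fx, fnext = spectrum[i], spectrum[i + step]
        # Relation 1: sharp rise f(x)*S < f(x+Δx); relation 2: sharp fall f(x) > f(x+Δx)*S.
        if fx * s < fnext or fx > fnext * s:
            return True
    return False
```

A flat or gently sloping spectrum, such as `[1.0, 1.1, 1.2, 1.1, 1.0]` with S = 1.5, yields no peak. A spectrum with a fivefold spike, such as `[1.0, 1.0, 1.0, 5.0, 1.0]`, does.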

(Embodiment 2)
A sound data label attaching apparatus 2 according to Embodiment 2 of the present invention is described with reference to the drawings. The apparatus 2 of this embodiment uses the sounds output by a plurality of sound output devices to attach the subject's coordinates to sound data as well.

FIG. 11 is a block diagram showing the configuration of the sound data label attaching apparatus 2 according to this embodiment. The apparatus 2 includes the following components:
- the sound data receiving unit 11, the separation unit 12, the accumulation unit 13, and the sound data storage unit 14;
- the sound image conversion unit 15, the correspondence information storage unit 16, the comparison unit 17, and the label attaching unit 18;
- the erasure target label storage unit 19, the erasure unit 20, the extraction target label storage unit 21, and the extraction unit 22;
- a position correspondence information storage unit 31, a coordinate calculation unit 32, and a coordinate attaching unit 33.

All components other than units 31, 32, and 33 are configured and operate as in Embodiment 1, with one exception. When the label attaching unit 18 attaches labels for coordinate calculation, it attaches three labels together with the similarity information for each. The description of these components is therefore omitted.

The position correspondence information storage unit 31 stores three or more pieces of position correspondence information. Position correspondence information associates a label with the coordinates of the sound output device that outputs the sound corresponding to that label. In other words, it associates the coordinates of a sound output device with a label that identifies that device. The sound output devices are the same as those described in Embodiment 1. Their coordinates may be local (for example, coordinates within a conference room or an office) or global (for example, latitude and longitude).

It does not matter how the three or more pieces of position correspondence information come to be stored in the position correspondence information storage unit 31. They may be stored via a recording medium, transmitted over a communication line or the like, or entered through an input device. The storage may be temporary, for example in RAM, or long-term. The position correspondence information storage unit 31 can be implemented with a given recording medium (for example, a semiconductor memory, a magnetic disk, or an optical disk).

Here, "associating a label with coordinates" means only that the coordinates can be obtained from the label. The position correspondence information may therefore contain the label and the coordinates as a pair, or it may be information that links the label to the coordinates. In the latter case, it may, for example, associate the label with a pointer or address indicating where the coordinates are stored. This embodiment describes the former case. The label and the coordinates also need not be associated directly; for example, the label may correspond to a third piece of information, which in turn corresponds to the coordinates.

The coordinate calculation unit 32 calculates the coordinates corresponding to sound data. It uses the three labels and the similarity information attached to that sound data, together with the coordinates corresponding to each of the three labels. This calculation can be performed in a manner similar to three-point surveying. Japanese Patent Laid-Open No. 6-241883 discloses a technique for locating a sound source in three-dimensional space by collecting the sound of one source with four microphones (four because the space is three-dimensional; three suffice in a plane). The coordinate calculation unit 32 does the converse. It locates one microphone in a two-dimensional plane from sounds of equal sound pressure output by three point sound sources and collected by that single microphone. For example, suppose the similarity information gives a similarity of X% for sound source A, Y% for sound source B, and Z% for sound source C. The coordinates to be calculated are then the position that satisfies all of the following ratios, where SQR(M) denotes the square root of M:
- distance to A : distance to B = SQR(1/X) : SQR(1/Y);
- distance to B : distance to C = SQR(1/Y) : SQR(1/Z);
- distance to C : distance to A = SQR(1/Z) : SQR(1/X).

These conditions, together with the coordinates of sound sources A, B, and C, determine the coordinates corresponding to the sound data.
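
The distance-ratio conditions can be solved in closed form. The sketch below is one possible approach and is not necessarily the method intended by the specification. Write the unknown position as (x, y), let w_i = 1/similarity_i, and let K be an unknown scale. Each condition then becomes (x − x_i)² + (y − y_i)² = K·w_i. Substituting u = x² + y² turns the three equations into a 3×3 linear system in (u, x, y) whose solution is affine in K. The constraint u = x² + y² then yields a quadratic in K. Ratios alone generally admit two positions, so both candidates are returned, and the one inside the room would be chosen. The function names are hypothetical, and the three sources must not be collinear.

```python
import math


def _det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def _solve3(a, b):
    """Solve a 3x3 linear system by Cramer's rule."""
    d = _det3(a)
    if abs(d) < 1e-12:
        raise ValueError("sound sources must not be collinear")
    out = []
    for col in range(3):
        m = [row[:] for row in a]
        for r in range(3):
            m[r][col] = b[r]
        out.append(_det3(m) / d)
    return out


def locate(sources, similarities):
    """sources: three (x, y) pairs; similarities: three positive values (e.g. X, Y, Z)."""
    a = [[1.0, -2.0 * x, -2.0 * y] for x, y in sources]   # unknowns ordered (u, x, y)
    c = [x * x + y * y for x, y in sources]
    w = [1.0 / s for s in similarities]                    # squared distance is K * w_i
    u0, x0, y0 = _solve3(a, [-ci for ci in c])             # part independent of K
    u1, x1, y1 = _solve3(a, w)                             # part proportional to K
    qa = x1 * x1 + y1 * y1
    qb = 2.0 * (x0 * x1 + y0 * y1) - u1
    qc = x0 * x0 + y0 * y0 - u0
    if abs(qa) < 1e-12:
        roots = [-qc / qb] if abs(qb) > 1e-12 else []
    else:
        disc = qb * qb - 4.0 * qa * qc
        if disc < 0:
            return []
        sq = math.sqrt(disc)
        roots = [(-qb + sq) / (2.0 * qa), (-qb - sq) / (2.0 * qa)]
    return [(x0 + x1 * k, y0 + y1 * k) for k in roots if k > 0]
```

For example, place sources at (0, 0), (10, 0), and (0, 10), and give them similarities proportional to the inverse squared distances from (3, 4). The function then returns (3, 4) and a second, mirror candidate at (−15, −5).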

Suppose that each sound output device corresponding to the three sound sources outputs a sound of a single frequency. The sound image of sound data containing the sounds of all three sources then has three peaks, and the label sound image for each source has one peak corresponding to that source. The similarity information may therefore be calculated from the ratio of peak heights. For example, suppose that, at the frequency of a given sound source, the peak in the sound image is 70% as high as the peak in that source's label sound image. The comparison unit 17 may then calculate similarity information indicating a similarity of 70%.
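
The peak-height similarity can be written in a few lines. The sketch assumes that the sound image and the label sound image share the same frequency bins. It also assumes that a source's frequency is the highest bin of its label sound image. The result is not capped at 100%, and `peak_ratio_similarity` is a hypothetical name.

```python
def peak_ratio_similarity(sound_image, label_sound_image):
    """Similarity (%) = observed peak height / reference peak height at the source's frequency."""
    peak_bin = max(range(len(label_sound_image)), key=lambda k: label_sound_image[k])
    reference = label_sound_image[peak_bin]
    if reference <= 0:
        raise ValueError("label sound image has no positive peak")
    return 100.0 * sound_image[peak_bin] / reference
```

With a reference peak of height 10 and an observed height of 7 in the same bin, the function returns 70.0, matching the example in the text.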

The coordinate attaching unit 33 attaches the coordinates calculated by the coordinate calculation unit 32 to the sound data corresponding to those coordinates. This is the sound data carrying the similarity information that was used to calculate them. Coordinates are attached to sound data in the same way as labels, so that description is omitted.

As noted above, in this embodiment the comparison unit 17 calculates similarity information when it compares a sound image with a label sound image. This information indicates how similar the two are.

When attaching labels for coordinate calculation, the label attaching unit 18 attaches three labels to each piece of sound data, together with the similarity information for each of those labels. These three labels are the three with the highest similarity.

Any two or more of the following storage units may be implemented on the same recording medium or on separate recording media: the sound data storage unit 14, the correspondence information storage unit 16, the erasure target label storage unit 19, the extraction target label storage unit 21, and the position correspondence information storage unit 31.

Next, the operation of the sound data label attaching apparatus 2 according to this embodiment is described with reference to the flowchart in FIG. 12. Processing other than steps S201 to S205 is the same as in the flowchart of FIG. 2 in Embodiment 1, so its description is omitted.

(Step S201) The coordinate attaching unit 33 determines whether to attach coordinates. For example, it may decide to do so after a series of labels has been attached to the sound data, when the sound data label attaching apparatus 2 receives an instruction to attach coordinates, or at some other timing. If coordinates are to be attached, the process proceeds to step S202; otherwise, it returns to step S101.

(Step S202) The coordinate calculation unit 32 identifies the sound data to which coordinates are to be attached. Sound data is such a target if a label for coordinate calculation has been attached to it.

(Step S203) The coordinate calculation unit 32 determines whether at least some sound data was identified in step S202. If so, the process proceeds to step S204; otherwise, it returns to step S101.

(Step S204) The coordinate calculation unit 32 calculates coordinates for the sound data identified in step S202. Specifically, it looks up the coordinates corresponding to the labels attached to that sound data in the position correspondence information stored in the position correspondence information storage unit 31. It then calculates the coordinates corresponding to the sound data from those coordinates and their associated similarity information.

(Step S205) The coordinate attaching unit 33 attaches the coordinates calculated by the coordinate calculation unit 32 to the corresponding sound data. The process then returns to step S101.

When two or more sets of coordinates are calculated, steps S204 and S205 may be repeated. In the flowchart of FIG. 12, processing ends when the power is turned off or when an interrupt to end processing occurs.

Next, the operation of the sound data label attaching apparatus 2 according to this embodiment is described using a specific example.

In this specific example, it is assumed that the correspondence information storage unit 16 stores the correspondence information shown in FIG. 13. As in FIG. 3, this correspondence information stores labels in association with label sound images. Labels such as "sound source 101" identify sound output devices, so labels containing "sound source" are labels for coordinate calculation. Label sound images such as SP101 correspond to sound sources such as sound source 101. They are the sound images of the sounds those sources output.

In this specific example, it is further assumed that the position correspondence information storage unit 31 stores the position correspondence information shown in FIG. 14, which associates labels with coordinates. These coordinates indicate positions within a particular room.

As in the specific example of Embodiment 1, the erasure target label storage unit 19 stores the information in FIG. 4, and the extraction target label storage unit 21 stores the information in FIG. 5. In this specific example, the subject is also assumed to move among the sound output devices placed at the positions given in FIG. 14.

In this specific example, the processes of receiving and storing sound data, assigning labels, erasing sound data, and extracting sound data are the same as in the specific example of the first embodiment, and their description is omitted. However, when the label sound image with the highest similarity to the converted sound image is a coordinate calculation label (that is, when its label contains "sound source"), the label assigning unit 18 assigns to the sound data the three labels with the highest similarity, in descending order, together with the similarity information for each of those labels. When the label sound image with the highest similarity is not a coordinate calculation label, labels are assigned as in the specific example of the first embodiment.

Specifically, the result of the label assigning unit 18 assigning labels every 50 milliseconds is, for example, as shown in FIG. 15. In FIG. 15, three coordinate calculation labels and the similarity information for each label have been assigned to time codes such as 01:18:40.15 to 01:18:40.20. In this specific example, the similarity information is a value indicating the degree of similarity.
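As an illustration only, the three-label rule described above could be sketched in Python as follows. This is not the implementation disclosed in this document: the `LabelRecord` type, the `assign_labels` function, and the representation of similarity as a value between 0 and 1 are assumptions, and the non-coordinate branch assumes that the first embodiment attaches only the single most similar label.

```python
from dataclasses import dataclass, field

# Labels containing this text are coordinate calculation labels, as in FIG. 13.
SOURCE_MARKER = "sound source"
FRAME_SECONDS = 0.05  # labels are assigned every 50 milliseconds


@dataclass
class LabelRecord:
    start: float  # start of the frame, in seconds
    end: float    # end of the frame, in seconds
    labels: list[tuple[str, float]] = field(default_factory=list)  # (label, similarity)


def assign_labels(start: float, end: float,
                  similarities: dict[str, float], k: int = 3) -> LabelRecord:
    """Attach labels to one frame, given its similarity to every label sound image."""
    # sorted() is stable, so labels with equal similarity keep their input order.
    ranked = sorted(similarities.items(), key=lambda item: item[1], reverse=True)
    if not ranked:
        return LabelRecord(start, end)
    best_label, _ = ranked[0]
    if SOURCE_MARKER in best_label:
        # Coordinate calculation case: keep the k most similar labels with their similarities.
        return LabelRecord(start, end, ranked[:k])
    # Otherwise keep only the most similar label (assumed first-embodiment behaviour).
    return LabelRecord(start, end, ranked[:1])


record = assign_labels(0.0, FRAME_SECONDS, {
    "office": 0.20,
    "sound source 101": 0.68,
    "sound source 102": 0.43,
    "sound source 103": 0.43,
})
print(record.labels)
# → [('sound source 101', 0.68), ('sound source 102', 0.43), ('sound source 103', 0.43)]
```

The similarity values in the usage lines are the ones quoted for time code 01:18:40.15 to 01:18:40.20; the "office" entry is a hypothetical non-source label added to show that it is ranked out.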

When the coordinate assigning unit 33 detects that the data extraction described in the specific example of the first embodiment has finished, it determines that coordinates are to be assigned (step S201) and instructs the coordinate calculation unit 32 to calculate coordinates. The coordinate calculation unit 32 then refers to the information shown in FIG. 15 and identifies the sound data, such as that for time code 01:18:40.15 to 01:18:40.20, to which coordinate calculation labels have been assigned (step S202). Because sound data to which coordinates are to be assigned exists (step S203), the coordinate calculation unit 32 calculates coordinates for each identified record in FIG. 15. For example, for the sound data at time code 01:18:40.15 to 01:18:40.20, the similarity for sound source 101 is 68%, the similarity for sound source 102 is 43%, and the similarity for sound source 103 is 43%; from these similarities the coordinates (3, 3) are calculated as described earlier and passed to the coordinate assigning unit 33 (step S204). The coordinate assigning unit 33 assigns the coordinates to the corresponding sound data (step S205). As a result, as shown in FIG. 16, coordinates are assigned to time code 01:18:40.15 to 01:18:40.20.

The coordinate calculation unit 32 calculates coordinates for every time code subject to coordinate calculation, and the coordinate assigning unit 33 assigns the calculated coordinates. However, when a time code carries the same labels and similarity information as the immediately preceding time code, as with time code 01:18:40.20 to 01:18:40.25 in FIG. 15, the calculation may be skipped and the coordinates already assigned to the preceding time code may be assigned again.
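Steps S202 to S205, including the reuse of coordinates for a repeated record, might be sketched as follows. The formula that turns similarities into coordinates is described in an earlier part of this document that is not reproduced here, so this sketch assumes a similarity-weighted centroid of the source positions; the `POSITIONS` values are hypothetical stand-ins for FIG. 14 and are not claimed to reproduce the coordinates (3, 3).

```python
# Hypothetical position correspondence information; the actual FIG. 14 values
# are not reproduced in this section.
POSITIONS = {
    "sound source 101": (0.0, 0.0),
    "sound source 102": (6.0, 0.0),
    "sound source 103": (0.0, 6.0),
}


def weighted_centroid(labels, positions):
    """Average the source positions, weighting each by its similarity (assumed formula)."""
    total = sum(similarity for _, similarity in labels)
    if total <= 0:
        raise ValueError("similarities must sum to a positive value")
    x = sum(s * positions[label][0] for label, s in labels) / total
    y = sum(s * positions[label][1] for label, s in labels) / total
    return (x, y)


def assign_coordinates(records, positions):
    """Steps S202 to S205 over (time_code, labels) records given in time order.

    Returns a dict mapping each time code to its assigned (x, y) coordinates.
    """
    assigned = {}
    prev_labels, prev_xy = None, None
    for time_code, labels in records:
        if not labels or any(label not in positions for label, _ in labels):
            # Not a coordinate calculation record (S202): nothing to assign.
            prev_labels, prev_xy = None, None
            continue
        if labels == prev_labels:
            xy = prev_xy  # same labels and similarities as the preceding time code: reuse
        else:
            xy = weighted_centroid(labels, positions)  # S204
        assigned[time_code] = xy  # S205
        prev_labels, prev_xy = labels, xy
    return assigned


frame = [("sound source 101", 0.68), ("sound source 102", 0.43), ("sound source 103", 0.43)]
records = [
    ("01:18:40.15-01:18:40.20", frame),
    ("01:18:40.20-01:18:40.25", frame),
    ("01:18:40.25-01:18:40.30", [("office", 0.90)]),
]
print(assign_coordinates(records, POSITIONS))
```

The reuse check compares only with the immediately preceding record, matching the condition stated above; any other record simply triggers a fresh calculation.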

As described above, the sound data label assigning device 2 according to this embodiment can determine coordinates indicating the position of the subject (strictly speaking, the position of the microphone worn by the subject) by using the position correspondence information and related data. It can therefore attach finer-grained position information to sound data than is possible when only a label indicating a location is attached.

In this embodiment, the case where three labels are assigned when coordinate calculation labels and similarity information are assigned to sound data has been described, but this is not limiting. For example, the label assigning unit 18 may assign to each piece of sound data three or more labels, each associated with its corresponding similarity information. The coordinate calculation unit 32 may then calculate the coordinates corresponding to the sound data using the three or more labels and similarity information assigned to it and the coordinates corresponding to those labels. For example, when the coordinate calculation unit 32 calculates coordinates in a three-dimensional spatial coordinate system, four labels and their corresponding similarity information may be assigned to each piece of sound data, and the coordinate calculation unit 32 may calculate the three-dimensional coordinates using those labels, the similarity information, and the position correspondence information. In this case, the coordinates associated with the labels in the position correspondence information are three-dimensional spatial coordinates rather than the two-dimensional plane coordinates used in the specific example of this embodiment. When calculating plane coordinates, four or more labels and their similarity information may likewise be assigned to each piece of sound data and used for the calculation; when calculating spatial coordinates, five or more labels and their similarity information may be used.
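Under the same weighted-centroid assumption (not confirmed by this section), the generalization to any number of labels and to three-dimensional coordinates could look like the sketch below. It enforces the minimum label counts stated above, three for plane coordinates and four for spatial coordinates; the function name and the fourth source, "sound source 104", are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def estimate_position(labels: List[Tuple[str, float]],
                      positions: Dict[str, Sequence[float]]) -> Tuple[float, ...]:
    """Similarity-weighted centroid of the labelled sound output devices in any dimension.

    Requires at least dim + 1 labels: 3 for plane (2-D) and 4 for spatial (3-D) coordinates.
    """
    if not labels:
        raise ValueError("no labels assigned")
    dim = len(positions[labels[0][0]])
    if any(len(positions[label]) != dim for label, _ in labels):
        raise ValueError("all coordinates must have the same dimension")
    if len(labels) < dim + 1:
        raise ValueError(f"{dim}-D coordinates need at least {dim + 1} labels")
    total = sum(similarity for _, similarity in labels)
    if total <= 0:
        raise ValueError("similarities must sum to a positive value")
    return tuple(
        sum(s * positions[label][axis] for label, s in labels) / total
        for axis in range(dim)
    )


# Hypothetical three-dimensional position correspondence information.
SPACE = {
    "sound source 101": (0.0, 0.0, 0.0),
    "sound source 102": (4.0, 0.0, 0.0),
    "sound source 103": (0.0, 4.0, 0.0),
    "sound source 104": (0.0, 0.0, 4.0),
}
print(estimate_position([(name, 1.0) for name in SPACE], SPACE))  # → (1.0, 1.0, 1.0)
```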

In each of the above embodiments, the sound data stored in the sound data storage unit 14 may be retained long-term as the target of labeling, erasure, extraction, and so on, or it may be stored only temporarily for conversion into sound images by the sound image conversion unit 15. In the latter case, the sound data to be labeled, erased, extracted, and so on may be stored in a sound data storage unit other than the sound data storage unit 14. For example, when the sound data receiving unit 11 receives sound data from a recording medium, labeling, erasure, extraction, and other processing may be performed on the sound data stored on that recording medium.

In each of the above embodiments, the sound images associated with labels in the correspondence information stored in the correspondence information storage unit 16 were described as being stored in advance, but this is not required. For example, if, when the comparison unit 17 performs a comparison, no label sound image has a similarity above a preset threshold, a new label may be issued for the sound image being compared, and correspondence information associating that label with that sound image (as its label sound image) may be accumulated in the correspondence information storage unit 16. In this case, the newly issued label is assigned to the sound data. In this way, when a new sound is received (that is, a sound whose similarity to every sound image in the correspondence information does not exceed the threshold), a new label is issued and assigned to the sound data, and correspondence information associating the issued label with the sound image of the new sound is accumulated. When a sound similar to one received in the past is received, the label previously issued for that sound is assigned using the correspondence information, as described in the above embodiments. Issued labels may take any form as long as they can be distinguished from one another: for example, sequentially incremented numbers such as "1", "2", "3", and so on, letters of the alphabet, or other symbols. Afterwards, a person may listen to the sound corresponding to a label and replace the label with one indicating a location or the content of the sound, such as "office", "railroad crossing", or "break room". The issuance of new labels and the accumulation of correspondence information may be performed by the comparison unit 17 or by another component (for example, a label issuing unit, not shown). When labels are issued and assigned in this way, the same sequence of labels may recur. For example, when the sequence of labels "10", "7", "15", "16" appears repeatedly, a new label (for example, "123") may be issued for that sequence, and each occurrence of "10", "7", "15", "16" may be replaced with the newly issued label "123".
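The label issuance and sequence replacement described above can be sketched as follows. The `LabelIssuer` class and `replace_sequence` function are hypothetical names; the similarity measure is supplied by the caller because this section does not define it, and the usage treats each sound image as a single number purely to keep the example small. Detecting which sequences repeat is not shown; the sketch only performs the replacement once a sequence has been chosen.

```python
import itertools
from typing import Callable, Dict, List


class LabelIssuer:
    """Return an existing label, or issue a new one and grow the correspondence information."""

    def __init__(self, similarity: Callable[[object, object], float], threshold: float):
        self.similarity = similarity                 # caller-supplied measure (assumption)
        self.threshold = threshold                   # preset threshold
        self.correspondence: Dict[str, object] = {}  # label -> label sound image
        self._numbers = itertools.count(1)           # issues "1", "2", "3", ...

    def label_for(self, sound_image: object) -> str:
        best_label, best_score = None, float("-inf")
        for label, label_image in self.correspondence.items():
            score = self.similarity(sound_image, label_image)
            if score > best_score:
                best_label, best_score = label, score
        if best_label is not None and best_score > self.threshold:
            return best_label  # a similar sound was received before
        new_label = str(next(self._numbers))
        self.correspondence[new_label] = sound_image  # accumulate correspondence information
        return new_label


def replace_sequence(labels: List[str], pattern: List[str], new_label: str) -> List[str]:
    """Replace each non-overlapping occurrence of pattern with new_label, left to right."""
    if not pattern:
        raise ValueError("pattern must not be empty")
    out, i, n = [], 0, len(pattern)
    while i < len(labels):
        if labels[i:i + n] == pattern:
            out.append(new_label)
            i += n
        else:
            out.append(labels[i])
            i += 1
    return out


issuer = LabelIssuer(lambda a, b: 1.0 / (1.0 + abs(a - b)), threshold=0.5)
print([issuer.label_for(x) for x in (10.0, 10.2, 50.0, 9.9)])  # → ['1', '1', '2', '1']
print(replace_sequence(["10", "7", "15", "16", "3", "10", "7", "15", "16"],
                       ["10", "7", "15", "16"], "123"))  # → ['123', '3', '123']
```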

Although each of the above embodiments describes the case where the sound data label assigning devices 1 and 2 include the separation unit 12, this is not required. When the sound data label assigning devices 1 and 2 do not include the separation unit 12, the accumulation unit 13 accumulates the sound data received by the sound data receiving unit 11 in the sound data storage unit 14 as is. As a result, the sound data that the sound image conversion unit 15 converts into sound images is the same as the sound data to which the label assigning unit 18 assigns labels.

Each of the above embodiments describes the case where the erasure target label storage unit 19 and the erasure unit 20 are provided. However, if automatic erasure of sound data according to labels is not needed, the sound data label assigning devices 1 and 2 need not include the erasure target label storage unit 19 and the erasure unit 20.

Each of the above embodiments describes the case where the extraction target label storage unit 21 and the extraction unit 22 are provided. However, if automatic extraction of sound data according to labels is not needed, the sound data label assigning devices 1 and 2 need not include the extraction target label storage unit 21 and the extraction unit 22.

In each of the above embodiments, labels may also be interpolated. For example, suppose the label "corridor" is assigned from time code 00.00 s to 10.00 s, the label "workplace seat" from 10.00 s to 10.10 s, and the label "corridor" again from 10.10 s to 50.00 s. A different label lasting only 0.1 seconds is implausible, so the label for 10.00 s to 10.10 s may also be changed to "corridor". More generally, when the sound data of interest is no longer than a predetermined length (for example, 1 second), the sound data immediately before and after it, each at least a predetermined length (for example, 5 seconds), carry the same label ("corridor" in the example above), and the sound data of interest carries a different label ("workplace seat" in the example above), the label of the sound data of interest may be changed to the label of the surrounding sound data. Likewise, for sections removed through separation by the separation unit 12 and sections to which no label could be assigned, if the sound data of at least the predetermined length before and after the section carries the same label, that label may also be assigned to the section.
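The interpolation rule above might be sketched as follows, assuming the labeled sound data is held as a time-ordered list of contiguous segments. The segment representation, the function name, and the use of `None` for removed or unlabeled sections are assumptions; the 1-second and 5-second defaults are the example values given in the text. Following the text, unlabeled sections are filled regardless of their own length, whereas a labeled segment must also be short.

```python
from typing import List, Optional, Tuple

# (start_seconds, end_seconds, label); label is None for sections removed by the
# separation unit or to which no label could be assigned.
Segment = Tuple[float, float, Optional[str]]


def interpolate_labels(segments: List[Segment],
                       max_short: float = 1.0,
                       min_context: float = 5.0) -> List[Segment]:
    """Relabel short or unlabeled segments enclosed by two long segments with the same label.

    Decisions use the original labels, so one relabeling never triggers another.
    """
    result = list(segments)
    for i in range(1, len(segments) - 1):
        start, end, label = segments[i]
        p_start, p_end, p_label = segments[i - 1]
        n_start, n_end, n_label = segments[i + 1]
        if p_label is None or p_label != n_label or label == p_label:
            continue
        if p_end - p_start < min_context or n_end - n_start < min_context:
            continue
        if label is not None and end - start > max_short:
            continue  # a labeled segment must be short to count as a spurious label
        result[i] = (start, end, p_label)
    return result


segments = [(0.0, 10.0, "corridor"), (10.0, 10.1, "workplace seat"), (10.1, 50.0, "corridor")]
print(interpolate_labels(segments)[1])  # → (10.0, 10.1, 'corridor')
```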

In each of the above embodiments, an output unit (not shown) may be provided that outputs the labeling results of the label assigning unit 18, the sound data remaining after the erasure unit 20 has erased sound data, and the sound data extracted by the extraction unit 22. When the output target is sound data, the output may be, for example, transmission to a predetermined device via a communication line, audio output through a speaker, or storage on a recording medium. When the output target is a labeling result, the output may be, for example, display on a display device (such as a CRT or a liquid crystal display), transmission to a predetermined device via a communication line, printing by a printer, audio output through a speaker, or storage on a recording medium. The output unit may or may not include the output device itself (such as a display device or a printer). The output unit may be implemented in hardware, or in software such as a driver that drives those devices.

Each of the above embodiments describes the case where the sound data label assigning devices 1 and 2 are stand-alone devices, but a sound data label assigning device may be either a stand-alone device or a server device in a server-client system. In the latter case, the receiving units and output units may receive and output information via a communication line.

In each of the above embodiments, each process or function may be implemented by centralized processing on a single device or system, or by distributed processing across multiple devices or systems.

In each of the above embodiments, information related to the processing executed by each component (for example, information that each component receives, acquires, selects, generates, transmits, or receives, and information such as thresholds, formulas, and addresses that each component uses in processing) may be held temporarily or long-term on a recording medium (not shown), even where this is not stated explicitly above. Information may be stored on that recording medium by each component or by a storage unit (not shown), and read from it by each component or by a reading unit (not shown).

In each of the above embodiments, where information used by each component (such as thresholds, addresses, and various setting values) could be changed by the user, the user may or may not be allowed to change that information as appropriate, even where this is not stated explicitly above. When the user can change the information, the change may be implemented by, for example, a receiving unit (not shown) that accepts a change instruction from the user and a changing unit (not shown) that changes the information according to that instruction. The receiving unit may accept the change instruction, for example, from an input device, by receiving information transmitted via a communication line, or by reading information from a predetermined recording medium.

In each of the above embodiments, when two or more components of the sound data label assigning devices 1 and 2 have a communication device, an input device, or the like, those components may share a single physical device or may each have a separate device.

In each of the above embodiments, each component may be implemented as dedicated hardware, or components that can be realized in software may be implemented by executing a program. For example, each component can be realized by a program execution unit such as a CPU reading and executing a software program recorded on a recording medium such as a hard disk or semiconductor memory. The software that implements the sound data label assigning device 1 in the above embodiment is the following program. This program causes a computer to function as: a sound data receiving unit that receives sound data collected by a microphone at the position of a subject; a sound image conversion unit that converts the sound data received by the sound data receiving unit into a sound image indicating the intensity at each frequency; a comparison unit that compares the sound image converted by the sound image conversion unit with the label sound images stored in a correspondence information storage unit, which stores two or more pieces of correspondence information each associating a label with a label sound image, that is, a sound image indicating the intensity at each frequency of the sound corresponding to that label; and a label assigning unit that, using the comparison result of the comparison unit, identifies the label corresponding to a label sound image highly similar to the sound image converted by the sound image conversion unit, and assigns the identified label to the sound data corresponding to that sound image.
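The four functions of the program can be illustrated with a minimal, self-contained Python sketch, where the `samples` argument plays the role of the sound data accepted by the receiving unit. The choice of a mean DFT magnitude spectrum over non-overlapping 64-sample frames as the "sound image", cosine similarity as the comparison measure, and all function names are assumptions made for illustration; this section does not specify them.

```python
import cmath
import math
from typing import Dict, List, Sequence

FRAME = 64  # samples per analysis frame (assumption)


def to_sound_image(samples: Sequence[float], frame: int = FRAME) -> List[float]:
    """Sound image conversion: mean DFT magnitude per frequency bin over
    non-overlapping frames (a stand-in for the device's actual sound image)."""
    n_frames = len(samples) // frame
    if n_frames == 0:
        raise ValueError("need at least one full frame of samples")
    image = [0.0] * (frame // 2 + 1)
    for f in range(n_frames):
        chunk = samples[f * frame:(f + 1) * frame]
        for k in range(len(image)):
            coeff = sum(x * cmath.exp(-2j * math.pi * k * t / frame)
                        for t, x in enumerate(chunk))
            image[k] += abs(coeff) / n_frames
    return image


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Comparison: cosine similarity between two sound images (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def label_sound_data(samples: Sequence[float],
                     correspondence: Dict[str, List[float]]) -> str:
    """Convert, compare against every label sound image, and return the label of
    the most similar one, i.e. the label to assign to this sound data."""
    image = to_sound_image(samples)
    scores = {label: cosine(image, label_image)
              for label, label_image in correspondence.items()}
    return max(scores, key=scores.get)


def tone(k: int, amp: float, n: int = 256) -> List[float]:
    """Synthetic test signal: a sine wave falling exactly on DFT bin k."""
    return [amp * math.sin(2 * math.pi * k * t / FRAME) for t in range(n)]


refs = {"railroad crossing": to_sound_image(tone(4, 1.0)),
        "office": to_sound_image(tone(12, 1.0))}
print(label_sound_data(tone(4, 0.5), refs))  # → railroad crossing
```

The naive DFT keeps the sketch dependency-free; a practical implementation would more likely use an FFT library and keep the time axis of the sound image rather than averaging it away.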

The functions realized by the above program do not include functions that can be realized only in hardware. For example, functions that can be realized only by hardware, such as a modem or interface card in a receiving unit that receives information, are not included in the functions realized by the program.

This program may be executed by being downloaded from a server or the like, or by reading a program recorded on a predetermined recording medium (for example, an optical disc such as a CD-ROM, a magnetic disk, or a semiconductor memory). The program may also be used as a program constituting a program product.

The program may be executed by a single computer or by multiple computers; that is, either centralized or distributed processing may be performed.

FIG. 17 is a schematic diagram showing an example of the external appearance of a computer that executes the above program to realize the sound data label assigning devices 1 and 2 according to the above embodiments. Each of the above embodiments can be realized by computer hardware and a computer program executed on it.

In FIG. 17, a computer system 900 includes a computer 901 having a CD-ROM (Compact Disc Read-Only Memory) drive 905 and an FD (Floppy (registered trademark) Disk) drive 906, a keyboard 902, a mouse 903, and a monitor 904.

FIG. 18 shows the internal configuration of the computer system 900. In FIG. 18, the computer 901 includes, in addition to the CD-ROM drive 905 and the FD drive 906: an MPU (Micro Processing Unit) 911; a ROM 912 that stores programs such as a boot-up program; a RAM (Random Access Memory) 913 that is connected to the MPU 911, temporarily stores application program instructions, and provides temporary storage space; a hard disk 914 that stores application programs, system programs, and data; and a bus 915 that interconnects the MPU 911, the ROM 912, and the other components. The computer 901 may also include a network card (not shown) that provides a connection to a LAN.

The program that causes the computer system 900 to execute the functions of the sound data label assigning devices 1 and 2 according to the above embodiments may be stored on a CD-ROM 921 or an FD 922, inserted into the CD-ROM drive 905 or the FD drive 906, and transferred to the hard disk 914. Alternatively, the program may be transmitted to the computer 901 via a network (not shown) and stored on the hard disk 914. The program is loaded into the RAM 913 at execution time. The program may also be loaded directly from the CD-ROM 921, the FD 922, or the network.

The program does not necessarily have to include an operating system (OS), third-party programs, or the like for causing the computer 901 to execute the functions of the sound data label assigning devices 1 and 2 according to the above embodiments. The program may include only the instructions that call appropriate functions (modules) in a controlled manner to obtain the desired results. How the computer system 900 operates is well known, and a detailed description is omitted.

The present invention is not limited to the above embodiments; various modifications are possible, and these are of course also included within the scope of the present invention.

As described above, the sound data label assigning device and related aspects of the present invention can assign labels to sound data collected at the position of a subject, and are useful as a device or the like for assigning labels to sound data.

DESCRIPTION OF SYMBOLS
1, 2 Sound data label assigning device
11 Sound data receiving unit
12 Separation unit
13 Accumulation unit
14 Sound data storage unit
15 Sound image conversion unit
16 Correspondence information storage unit
17 Comparison unit
18 Label assigning unit
19 Erasure target label storage unit
20 Erasure unit
21 Extraction target label storage unit
22 Extraction unit
31 Position correspondence information storage unit
32 Coordinate calculation unit
33 Coordinate assigning unit

Claims

A sound data label assigning device comprising:
A sound data receiving unit that receives sound data collected by a microphone at the position of a subject;
A correspondence information storage unit that stores two or more pieces of correspondence information, each associating a label with a label sound image, which is a sound image indicating the intensity at each frequency of the sound corresponding to that label;
A sound image conversion unit that converts the sound data received by the sound data receiving unit into a sound image indicating the intensity at each frequency;
A comparison unit that compares the sound image converted by the sound image conversion unit with the label sound images stored in the correspondence information storage unit; and
A label assigning unit that, using the comparison result of the comparison unit, identifies the label corresponding to a label sound image highly similar to the sound image converted by the sound image conversion unit, and assigns the identified label to the sound data corresponding to that sound image.

The sound data label assigning device according to claim 1, wherein the sound data received by the sound data receiving unit includes environmental sound data, which is sound data of the subject's surroundings,
The device further comprising a separation unit that separates the environmental sound data from the sound data received by the sound data receiving unit,
Wherein the sound image conversion unit converts the environmental sound data separated by the separation unit into a sound image.

The sound data label assigning device according to claim 1, wherein the sound image conversion unit converts the sound data of a period in which a frequency peak continues into a single sound image.

The sound data label assigning device according to any one of the preceding claims, wherein the comparison unit calculates similarity information, which is information on the similarity between the sound image and each label sound image,
And the label assigning unit assigns to the sound data the one or more identified labels, each associated with its corresponding similarity information.

The sound data label attaching device according to claim 4, wherein the label attaching unit attaches to each piece of sound data three or more labels, each associated with its corresponding similarity information, the device further comprising:
A position correspondence information storage unit that stores position correspondence information, which is information associating each label with the coordinates of a sound output device that outputs the sound corresponding to that label;
A coordinate calculation unit that calculates coordinates corresponding to the sound data using the three or more labels and similarity information attached to the sound data and the coordinates corresponding to those three or more labels; and
A coordinate assigning unit that assigns the coordinates calculated by the coordinate calculation unit to the sound data corresponding to those coordinates.
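Claim 5 derives coordinates for a sound from three or more attached labels, their similarity information, and the coordinates of the sound output devices tied to those labels, but it gives no formula. One simple possibility, assumed here rather than taken from the patent, is a similarity-weighted centroid of the device coordinates in a 2-D plane. With three or more non-collinear devices the estimate can fall anywhere inside the area they enclose, not only on the line between two devices.

```python
def estimate_coordinates(label_similarities, device_coords):
    """Similarity-weighted centroid of sound output device coordinates (assumed method).

    label_similarities: [(label, similarity), ...] attached to one sound.
    device_coords: hypothetical position correspondence table, label -> (x, y).
    """
    if len(label_similarities) < 3:
        raise ValueError("need three or more labels")
    total = sum(sim for _, sim in label_similarities)
    if total <= 0:
        raise ValueError("similarities must sum to a positive value")
    # Each device pulls the estimate toward itself in proportion to its similarity.
    x = sum(sim * device_coords[label][0] for label, sim in label_similarities) / total
    y = sum(sim * device_coords[label][1] for label, sim in label_similarities) / total
    return (x, y)


coords = {"A": (0.0, 0.0), "B": (4.0, 0.0), "C": (0.0, 4.0)}
print(estimate_coordinates([("A", 1.0), ("B", 1.0), ("C", 2.0)], coords))  # (1.0, 2.0)
```

The sound most similar to device C's output is placed closest to C, as one would expect if the similarity falls off with distance from each device.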

The sound data label attaching device according to claim 5, wherein the sound output from the sound output device is in an inaudible frequency range.

The sound data label attaching device according to any one of claims 1 to 6, further comprising:
A sound data storage unit that stores the sound data received by the sound data receiving unit;
An erasure target label storage unit that stores label identification information identifying the labels whose corresponding sound data is to be erased; and
An erasure unit that erases from the sound data storage unit, among the labeled sound data, the sound data associated with a label identified by the label identification information stored in the erasure target label storage unit.

The sound data label attaching device according to any one of claims 1 to 6, further comprising:
A sound data storage unit that stores the sound data received by the sound data receiving unit;
An extraction target label storage unit that stores label identification information identifying the labels whose corresponding sound data is to be extracted; and
An extraction unit that extracts and accumulates, from among the labeled sound data, the sound data associated with a label identified by the label identification information stored in the extraction target label storage unit.
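Claims 7 and 8 both filter stored sound data by label: one deletes matching entries, the other copies them out. A minimal sketch, assuming the sound data storage unit is modeled as a dictionary from a hypothetical sound ID to the list of labels attached to it (the audio itself is omitted), and that a sound matches when any of its labels is a target label:

```python
def erase_by_label(store, erase_labels):
    """Erasure unit (claim 7): delete stored sounds carrying any erasure-target label.

    store: hypothetical sound data storage, sound_id -> list of attached labels.
    Returns the IDs that were erased.
    """
    targets = set(erase_labels)
    doomed = [sid for sid, labels in store.items() if targets & set(labels)]
    for sid in doomed:  # collect first, then delete, to avoid mutating during iteration
        del store[sid]
    return doomed


def extract_by_label(store, extract_labels, archive):
    """Extraction unit (claim 8): copy sounds carrying any extraction-target label into archive."""
    targets = set(extract_labels)
    for sid, labels in store.items():
        if targets & set(labels):
            archive[sid] = list(labels)  # accumulate a copy; the store is left unchanged
    return archive


store = {"s1": ["conversation"], "s2": ["door"], "s3": ["conversation", "tv"]}
print(extract_by_label(store, ["tv", "door"], {}))  # {'s2': ['door'], 's3': ['conversation', 'tv']}
print(erase_by_label(store, ["conversation"]))      # ['s1', 's3']
```

Because the target lists are held separately from the data, the same labeled store can serve both purposes: for example, discarding sounds under some labels while archiving those under others.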

A sound data labeling method performed using a sound data receiving unit; a correspondence information storage unit that stores two or more pieces of correspondence information, each including a label and a label sound image, which is a sound image indicating the intensity at each frequency of the sound corresponding to the label; a sound image conversion unit; a comparison unit; and a label attaching unit, the method comprising:
A sound data receiving step in which the sound data receiving unit receives sound data collected by a microphone at the position of the subject;
A sound image conversion step in which the sound image conversion unit converts the sound data received in the sound data receiving step into a sound image indicating the intensity at each frequency;
A comparison step in which the comparison unit compares the sound image converted in the sound image conversion step with the label sound images stored in the correspondence information storage unit; and
A label attaching step in which the label attaching unit, using the comparison result of the comparison step, identifies the label corresponding to a label sound image having high similarity to the sound image converted in the sound image conversion step, and attaches the identified label to the sound data corresponding to that sound image.

A program for causing a computer to function as:
A sound data receiving unit that receives sound data collected by a microphone at the position of the subject;
A sound image conversion unit that converts the sound data received by the sound data receiving unit into a sound image indicating the intensity at each frequency;
A comparison unit that compares the sound image converted by the sound image conversion unit with the label sound images stored in a correspondence information storage unit, which stores two or more pieces of correspondence information, each including a label and a label sound image, which is a sound image indicating the intensity at each frequency of the sound corresponding to the label; and
A label attaching unit that, using the comparison result from the comparison unit, identifies the label corresponding to a label sound image having high similarity to the sound image converted by the sound image conversion unit, and attaches the identified label to the sound data corresponding to that sound image.