JP2013118488A

JP2013118488A - Audio data utilization system

Info

Publication number: JP2013118488A
Application number: JP2011264725A
Authority: JP
Inventors: Yuta Satake (佐竹 佑太); Shin Mizonishi (溝西 慎)
Original assignee: Nomura Research Institute Ltd
Current assignee: Nomura Research Institute Ltd
Priority date: 2011-12-02
Filing date: 2011-12-02
Publication date: 2013-06-13

Abstract

PROBLEM TO BE SOLVED: To provide an audio data utilization system in which each user's conversations and the like are continuously recorded and accumulated, and in which information processing, including communication with other users, is performed based on the accumulated audio data. SOLUTION: An audio data utilization system 1 comprises a recorder 20, which continuously records voices as digital data while dividing them into units of a predetermined length, and an audio data accumulation server 10, which accumulates the audio data transmitted from the recorder 20 in an audio data DB 16 for each user. The recorder 20 transmits audio data to the audio data accumulation server 10 as needed, and the audio data accumulation server 10 converts the audio data accumulated in the audio data DB 16 into text data to obtain text-converted audio data. According to the content of conversations in the text-converted audio data of a first user, a predetermined range of the first user's audio data and/or text-converted audio data can be shared with another, second user.

Description

The present invention relates to a technique for using audio data with an information processing apparatus, and more particularly to a technique that is effective when applied to an audio data utilization system that uses audio data accumulated through continuous recording.

In recent years, mechanisms and services such as so-called life logs have become available. These record and accumulate, for example, an individual's activities as digital data over a long period, so that the data can later be searched to review past activities in the manner of a diary, or analyzed to investigate behavioral tendencies.

Methods for recording activities in a life log generally fall into two types: manual recording, in which the user actively records whatever information he or she wants to keep, and automatic recording, in which a recording device or the like records activities automatically regardless of the user's intention. Manual recording can reflect the user's subjective view in the recorded content, but it places a heavier burden on the user, such as the operations needed for recording. Automatic recording, by contrast, can record only objective data, but it usually lets the user record activity data continuously with little burden.

One kind of activity data that can be recorded relatively easily, even manually, is audio data including the user's conversations. In recent years many individuals carry portable devices capable of recording audio as digital data, such as mobile phones, so-called smartphones, and IC recorders, so a recording environment is easy to set up. Such recording devices also need no particular user operation during recording (that is, automatic recording is possible) and, unlike video cameras and the like, have little directivity, so they can capture sound from a wide area around the user. (Video, in contrast, can be recorded only within a limited range in the direction the camera is facing.)

Mechanisms that use recorded audio data mainly analyze it with speech recognition technology or the like and use the result as text data. For example, Japanese Unexamined Patent Application Publication No. 2009-170953 (Patent Document 1) describes a call center apparatus having a call recording device, which analyzes recorded data obtained by recording the content of a call and transmits the recorded data when it determines that the data contains a specific word, and an administrator terminal, which receives and plays back the recorded data from the call recording device. With this technique, the content of a call can be verified without burdening the administrator when trouble with a customer or a communication failure occurs.

Similarly, Japanese Unexamined Patent Application Publication No. H07-177237 (Patent Document 2) describes a computer-linked voice mail system in which an electronic exchange has an incoming trunk, a speech path switch, a voice mail device, a speech recognition device, an incoming register, a central processing unit connected to a computer, and a text data storage device. Voice mail is analyzed by speech recognition and converted into text data, so that even a hearing-impaired person can understand its content.

Patent Document 1: JP 2009-170953 A
Patent Document 2: JP H07-177237 A

As described above, recording and accumulating audio data allows it to be used not only as audio but also, once converted into text data and analyzed, for information processing and information transfer according to the content of a conversation. These approaches, however, all presuppose that only the conversations needed for processing are designated, manually or automatically, and recorded.

Meanwhile, storage media and storage devices such as hard disks have in recent years grown greatly in capacity and fallen in price, and it has become relatively easy to build a large-capacity recording device. For example, if conversational audio is stored at a low bit rate of about 9.6 kbps, recording continuously 24 hours a day, 365 days a year for 100 years yields only about 3.8 TB of data, and a hard disk drive of that capacity can currently be purchased for about 10,000 yen. In other words, from the standpoint of storage capacity, continuously recording a person's lifetime of conversations is entirely feasible.
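The storage estimate above can be checked with a short calculation. The following is only a sanity-check sketch: the function name and the choice to ignore leap days are assumptions made for illustration, while the bit rate and duration are the figures given in the text.

```python
# Back-of-the-envelope check of the storage estimate in the text.
BITRATE_BPS = 9_600                     # 9.6 kbps, as stated in the text
SECONDS_PER_YEAR = 365 * 24 * 60 * 60   # 365-day years; leap days ignored (assumption)
YEARS = 100


def lifetime_bytes(bitrate_bps: int = BITRATE_BPS, years: int = YEARS) -> int:
    """Bytes needed to store continuous audio at the given bit rate."""
    return bitrate_bps * SECONDS_PER_YEAR * years // 8


total = lifetime_bytes()
print(f"{total / 1e12:.2f} TB")  # prints "3.78 TB", i.e. about 3.8 TB
```

The result, roughly 3.78 × 10^12 bytes, agrees with the figure of about 3.8 TB quoted above.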

Such low-cost, large-capacity storage devices are already put to effective use in devices such as security cameras, voice recorders, and drive recorders, making it possible to record audio and video continuously (automatically) regardless of content. These devices, however, are mainly intended for direct use of the data itself, for example for after-the-fact verification or as evidence; performing some kind of information processing based on the content of the data is not envisaged.

In particular, once an environment exists in which roughly a lifetime of conversations can be continuously recorded and accumulated for each individual, it should also be possible to realize new ways for multiple users to communicate, such as sharing and conveying information, using this audio data. Moreover, because recording can run continuously without concern for storage capacity, audio data can be accumulated easily, without cumbersome steps such as selecting and designating only the conversations to be processed.

Accordingly, an object of the present invention is to provide an audio data utilization system that continuously records and accumulates each user's conversations and the like 24 hours a day, 365 days a year, and performs information processing, including communication with other users, based on the accumulated audio data. The above and other objects and novel features of the present invention will become apparent from the description in this specification and the accompanying drawings.

A brief outline of a representative one of the inventions disclosed in this application is as follows.

An audio data utilization system according to a representative embodiment of the present invention has recording devices, each held by one of a plurality of users, that continuously record the sound around the user as digital data and store the recorded audio data divided into segments of a predetermined length, and an audio data storage server that accumulates the audio data transmitted from the recording devices in an audio data recording unit for each user. The system has the following features.

That is, the recording device transmits the audio data to the audio data storage server as needed. The audio data storage server converts the audio data accumulated in the audio data recording unit into text data to obtain text-converted audio data and, according to the content of a conversation in the text-converted audio data of a first user, makes a predetermined range of the first user's audio data and/or text-converted audio data shareable with another, second user.

The effects obtained by a representative one of the inventions disclosed in this application are briefly as follows.

That is, according to a representative embodiment of the present invention, each user's conversations and the like can easily be recorded and accumulated continuously, 24 hours a day, 365 days a year, without cumbersome steps such as selecting and designating only the conversations to be processed, and information processing, including communication with other users, can be performed based on the accumulated audio data.

FIG. 1 is a diagram outlining a configuration example of an audio data utilization system according to an embodiment of the present invention.
FIG. 2 is a diagram outlining an example of a service made possible by using continuously recorded audio data in an embodiment of the present invention.
FIG. 3 is a diagram outlining another example of a service made possible by using continuously recorded audio data in an embodiment of the present invention.
FIG. 4 is a diagram showing an example of the data structure of the user DB in an embodiment of the present invention.
FIG. 5 is a diagram showing an example of the data structure of the audio data DB in an embodiment of the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings. In all the drawings for describing the embodiments, identical parts are in principle denoted by the same reference numerals, and repeated description of them is omitted.

An audio data utilization system according to an embodiment of the present invention records audio data such as each user's conversations using a portable terminal or the like, and transmits the data over a network to a server, where it is accumulated. By using storage devices that have become larger and cheaper, the server secures enough storage capacity for each user to record continuously, 24 hours a day, 365 days a year. The audio data accumulated in the storage device for each user is converted into text data by speech recognition technology, and the text-converted audio data of the users is collated and analyzed. The system can thereby provide services involving information processing, such as synchronous and asynchronous communication between users unconstrained by time and space.

<System configuration>
FIG. 1 is a diagram outlining a configuration example of an audio data utilization system according to an embodiment of the present invention. The audio data utilization system 1 has, for example, an audio data storage server 10 connected to a network 40 such as the Internet or an intranet, and a recording device 20 and a user terminal 30 that can connect to the audio data storage server 10 via the network 40.

The recording device 20 is an information processing apparatus or device that each user holds and uses, and can record surrounding sound as digital data. Given that it is used to record the user's conversations continuously, it is preferably a portable device such as a mobile phone, smartphone, tablet terminal, notebook PC (Personal Computer), or IC recorder. It may, however, be a stationary device such as a desktop PC, and continuous recording may also be achieved by having the user switch among several different types of recording device 20 depending on the situation.

The recording device 20 has, for example, a data acquisition unit 21, which is a device that acquires external sound such as the user's conversations together with various other data, as well as a hashing unit 26, a mode management unit 27, a communication unit 29, and other units, each implemented as a software program running on an OS (Operating System), middleware, or the like (not shown), or as a circuit or the like.

The data acquisition unit 21 continuously acquires audio data and, for example, divides it into segments of a predetermined length and stores them as files or the like. It also continuously acquires various other external environment data, and records the audio data together with this data in a storage device (not shown) such as a hard disk or memory.
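The division into fixed-length segments described above might look like the following minimal sketch. The `Segment` type, the `split_segments` function, and the representation of audio as raw PCM bytes are all hypothetical choices made for illustration; the specification does not fix a file format or segment length.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass
class Segment:
    index: int        # sequence number of the segment within the recording
    start_sec: float  # offset of the segment's first byte from the start
    pcm: bytes        # raw audio for this segment (hypothetical representation)


def split_segments(pcm: bytes, bytes_per_sec: int, segment_sec: int) -> Iterator[Segment]:
    """Yield consecutive segments of at most segment_sec seconds each."""
    size = bytes_per_sec * segment_sec
    for index, offset in enumerate(range(0, len(pcm), size)):
        yield Segment(index, offset / bytes_per_sec, pcm[offset:offset + size])
```

Keeping the index and start offset with each segment is a design choice that lets later stages check ordering and continuity; the final segment may be shorter than the others.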

The data acquisition unit 21 has a microphone 22 as a device for acquiring audio data. As devices for acquiring other external environment data, it may also have, for example, a short-range wireless communication unit 23 that communicates wirelessly and directly with other nearby recording devices 20, using Bluetooth (registered trademark), infrared communication, WiFi (registered trademark), or the like, and can thereby detect their presence; a GPS (Global Positioning System) sensor 24 that receives radio waves from satellites to obtain the position of the recording device 20; and various sensors 25 that acquire other environmental information. The various sensors 25 include, for example, sensors that measure temperature and atmospheric pressure, and a gyro sensor, geomagnetic sensor, and acceleration sensor that acquire information on the orientation and movement of the recording device 20. In the case of a stationary device such as a desktop PC, fixed position information may be set manually in advance instead of providing the GPS sensor 24.

When the surrounding sound picked up by the microphone 22 is recorded as audio data, silent portions may be deleted (or simply not recorded) as necessary. Whether this processing is performed may also be made changeable according to the mode described later.
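One simple way to drop silent portions is an amplitude threshold applied frame by frame. The sketch below is an assumption for illustration only: the specification does not say how silence is detected, and the function name, frame length, and threshold are hypothetical.

```python
from array import array


def drop_silence(samples: array, frame_len: int, threshold: int) -> array:
    """Keep only frames whose peak absolute amplitude reaches the threshold."""
    kept = array(samples.typecode)
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        if max(abs(s) for s in frame) >= threshold:
            kept.extend(frame)  # frame contains sound, so it is retained
    return kept
```

A practical detector would more likely use short-term energy with some hangover time so that pauses between words are not cut; the peak test here only illustrates the idea.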

The hashing unit 26 generates a hash value for each fixed-length segment of audio data acquired by the data acquisition unit 21. The generated hash value is recorded in the storage device of the data acquisition unit 21 in association with the corresponding audio data. When the audio data and other data acquired by the data acquisition unit 21 are transmitted to the audio data storage server 10, the corresponding hash value is transmitted with them.

As described later, the audio data storage server 10 that receives the audio data generates a hash value by the same method as the recording device 20 and compares it with the hash value sent from the recording device 20, and can thereby verify whether the transmitted audio data has been tampered with. The hashing unit 26 has, for example, a secret key or the like, and generates and returns a hash value for the target data using a known hash algorithm. The hashing unit 26 may also be implemented as hardware, for example as a microdevice having the above functions.
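A keyed hash over each segment, computed on the recording device and recomputed on the server, can be sketched as follows. The specification only says that a secret key and a known hash algorithm are used; choosing HMAC-SHA256 and assuming that both sides share the same key are illustrative assumptions, and the function names are hypothetical.

```python
import hashlib
import hmac


def segment_tag(secret_key: bytes, segment: bytes) -> str:
    """Recorder side: keyed hash of one audio segment (HMAC-SHA256 is an assumption)."""
    return hmac.new(secret_key, segment, hashlib.sha256).hexdigest()


def verify_segment(secret_key: bytes, segment: bytes, received_tag: str) -> bool:
    """Server side: recompute the tag and compare it in constant time."""
    expected = segment_tag(secret_key, segment)
    return hmac.compare_digest(expected, received_tag)
```

For example, a segment uploaded together with `segment_tag(key, data)` passes `verify_segment(key, data, tag)` on the server, while any change to the data in transit causes the check to fail.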

The mode management unit 27 switches, automatically or manually, the mode of the recording device 20 used when acquiring audio data and the like. Here, a mode defines the recording conditions under which audio data is recorded and the conditions of use for the acquired audio data.

The conversations and other content in the audio data naturally include material that should be protected as private, so within continuous recording 24 hours a day, 365 days a year, there will be content that the user wants known only to the minimum necessary range of users. On the other hand, there may also be cases where a third party should have mandatory access to the content of the user's conversations, for example a manager such as a superior in the organization to which the user belongs.

Therefore, modes of the recording device 20 are defined in advance, for example an office mode for when the user is at the office (which may be further subdivided by location, such as multiple workplaces or conference rooms) and a private mode for when the user is at home or elsewhere outside the office, and the mode is switched automatically or manually according to the location of the recording device 20 (and of the user holding it), the time of day, and so on. For example, when arrival at office A is detected from position information acquired by the GPS sensor 24 or from network information of the in-house LAN (Local Area Network) detected by the short-range wireless communication unit 23, the device can switch to an office A mode. Even while the user is in the same office A, the device may switch to the private mode during the lunch break.
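The automatic switching described above can be illustrated with a small rule function. This is a sketch under stated assumptions: the location is taken to be already resolved to a label such as "office_a" (from GPS or LAN information), the lunch break is assumed to run from 12:00 to 13:00, and the function and mode names are hypothetical.

```python
from datetime import time
from typing import Optional

LUNCH_START, LUNCH_END = time(12, 0), time(13, 0)  # assumed lunch break


def select_mode(location: Optional[str], now: time) -> str:
    """Pick a recording mode from a resolved location label and the local time."""
    if location is not None and location.startswith("office"):
        if LUNCH_START <= now < LUNCH_END:
            return "private"  # lunch break overrides the office mode
        return location       # e.g. "office_a" selects the office A mode
    return "private"          # home, unknown, or any non-office location
```

Falling back to the private mode when the location is unknown is a conservative design choice made here for privacy; the specification also allows the user to switch modes manually.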

Depending on the mode in effect when audio data and the like were recorded, it is possible to set, for example, the owner of the data once it has been transmitted to the audio data storage server 10, as well as its disclosure range, access rights, and so on. The mode can also control, for example, the sensitivity and quality with which the microphone 22 records conversations, or how often the other sensors acquire data. Whether silent portions are excluded when the microphone 22 records conversations may likewise be changed according to the mode.

The types of modes that can be set and the settings for each mode are held in the recording device 20 as, for example, mode profiles 28. The mode profiles 28 that are needed can, for example, be downloaded from the audio data storage server 10 and stored.
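A mode profile could be represented as a small immutable record holding the recording conditions and conditions of use mentioned above. The field names, default values, and role labels below are hypothetical; the specification lists the kinds of settings but not their format.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class ModeProfile:
    name: str
    owner: str                     # who owns data recorded in this mode
    readers: FrozenSet[str]        # disclosure range (roles allowed to read)
    sample_rate_hz: int = 8000     # recording quality
    skip_silence: bool = True      # whether silent portions are excluded
    sensor_interval_sec: int = 60  # how often other sensors are sampled


# Illustrative profiles only; real values would come from the server.
PROFILES: Dict[str, ModeProfile] = {
    "office_a": ModeProfile("office_a", owner="organization",
                            readers=frozenset({"self", "manager"}),
                            sample_rate_hz=16000, skip_silence=False),
    "private": ModeProfile("private", owner="user",
                           readers=frozenset({"self"})),
}


def can_read(mode: str, role: str) -> bool:
    """True if the given role falls within the mode's disclosure range."""
    return role in PROFILES[mode].readers
```

With these example profiles, a manager can read data recorded in the office A mode but not data recorded in the private mode.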

The communication unit 29 performs the communication processing for connecting to the audio data storage server 10 via the network 40. Ordinary wireless or wired communication can be used as appropriate for the device serving as the recording device 20, and the means of communication may be one that can be shared with the short-range wireless communication unit 23, such as WiFi. The communication unit 29 transmits (uploads) to the audio data storage server 10, as needed, the audio data and other data that the data acquisition unit 21 has recorded as files or the like for each fixed-length segment, together with the hash value data corresponding to each piece of audio data.

The audio data storage server 10 is a server machine that collects, accumulates, and analyzes the audio data and other data recorded by the recording devices 20, and thereby provides information processing services including communication between users. The audio data storage server 10 has, for example, an authentication unit 11, a data collection unit 12, a hashing unit 13, an analysis unit 14, and other units, each implemented as a software program running on an OS and middleware (not shown) such as a database management system and a Web server program. It also has a user database (DB) 15 and an audio data database (DB) 16, implemented as databases, file tables, or the like.

The authentication unit 11 performs user authentication using the user DB 15, for example when a recording device 20 uploads audio data and the like, or when a user accesses the audio data and the like accumulated in the audio data DB 16 via the user terminal 30 or the like. The data collection unit 12 accepts uploads of audio data and the like from the recording devices 20, and records and accumulates the data in the audio data DB 16 for each user.

The hashing unit 13 generates and returns a hash value for the audio data acquired by the data collection unit 12. Like the hashing unit 26 in the recording device 20, it has, for example, a secret key or the like, and generates and returns a hash value for the target audio data using a hash algorithm equivalent to that of the hashing unit 26. The data collection unit 12, having obtained this hash value, compares it with the hash value received from the recording device 20 along with the audio data, and can thereby verify whether the audio data has been tampered with, for example on the communication path. Tampering can also be checked by verifying the continuity of the audio data and other data divided into fixed-length segments.
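The continuity check mentioned at the end of the paragraph above could, under simple assumptions, compare each segment's start time with the previous segment's end time. The sketch below assumes that segments arrive sorted, that each carries (start, end) timestamps in seconds, and that silence trimming is not leaving legitimate gaps; the function name and tolerance are hypothetical.

```python
from typing import List, Tuple


def find_gaps(segments: List[Tuple[float, float]],
              tolerance_sec: float = 0.5) -> List[int]:
    """Return indexes of segments that do not start where the previous one ended."""
    gaps = []
    for i in range(1, len(segments)):
        prev_end = segments[i - 1][1]
        start = segments[i][0]
        if abs(start - prev_end) > tolerance_sec:
            gaps.append(i)  # a missing, overlapping, or reordered segment
    return gaps
```

An empty result means the stored segments form an unbroken sequence; a non-empty result flags positions where a segment may have been removed or inserted and merits closer inspection.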

The analysis unit 14 analyzes the audio data accumulated in the audio data DB 16 with, for example, known speech recognition technology to obtain text data and, as necessary, further analyzes the structure of the conversation using known text structure analysis techniques such as morphological analysis, in order to identify speakers, extract topics and keywords, and so on. The text data obtained is recorded in the audio data DB 16. This processing may be performed asynchronously with the recording of audio data and the like into the audio data DB 16 by the data collection unit 12.

The analysis unit 14 also collates the text-converted audio data of the users and relates the data by time period, location, and so on, in order to analyze the state of a "place" in which multiple users are working together (for example, a meeting). Because the number of combinations of users' text-converted audio data to collate can become enormous as the number of users grows, the data to be collated is narrowed down based on the position and time information of the audio data, its relationship to other nearby recording devices 20, and so on, which makes analysis possible within a realistic processing time.
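The narrowing step described above can be sketched as a cheap metadata filter applied before any text comparison. Everything here is an illustrative assumption: the `SegmentMeta` fields, the use of planar coordinates in metres already derived from GPS, and the 30 m distance limit are not taken from the specification.

```python
from dataclasses import dataclass
from itertools import combinations
from math import hypot
from typing import List, Tuple


@dataclass(frozen=True)
class SegmentMeta:
    user: str
    start: float  # recording start, seconds since epoch
    end: float    # recording end, seconds since epoch
    x: float      # planar position in metres (assumed already projected from GPS)
    y: float


def candidate_pairs(segments: List[SegmentMeta],
                    max_dist_m: float = 30.0) -> List[Tuple[SegmentMeta, SegmentMeta]]:
    """Keep only pairs from different users that overlap in time and were close together."""
    pairs = []
    for a, b in combinations(segments, 2):
        if a.user == b.user:
            continue
        if min(a.end, b.end) <= max(a.start, b.start):
            continue  # no overlap in time
        if hypot(a.x - b.x, a.y - b.y) > max_dist_m:
            continue  # too far apart to share a conversation
        pairs.append((a, b))
    return pairs
```

This sketch still enumerates all pairs of metadata records, which is cheap compared with comparing transcripts; a larger deployment would more likely index segments by time bucket and grid cell, or use the short-range wireless detections of nearby devices directly.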

In analyzing the state of a "place", if schedule information for each user is held, for example in the form of a schedule database (DB) 17, in the audio data storage server 10 or in another system, the schedule information can be associated with the "place" information to provide services such as automatically creating minutes and distributing them to the participants.

In addition, by analyzing the text-converted audio data, as described later, when a word corresponding to a specific keyword or command appears in a conversation, for example, the text data (or audio data) of the conversation is forwarded to, or shared with, the users who need it so that they can refer to it. This makes it possible to convey and share information and to communicate based on explicit or implicit spoken instructions.
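The keyword-triggered sharing described above can be illustrated with a simple rule table that maps keywords to the users who should receive the conversation. The rule format, the function name, and the use of plain substring matching are assumptions for illustration; Japanese transcripts in particular would normally be tokenized by morphological analysis first.

```python
from typing import Dict, Set


def recipients_for(transcript: str, rules: Dict[str, Set[str]]) -> Set[str]:
    """Collect the users to share with, based on keywords found in the transcript."""
    text = transcript.lower()
    targets: Set[str] = set()
    for keyword, users in rules.items():
        if keyword.lower() in text:
            targets |= users  # every matching rule adds its recipients
    return targets
```

For instance, with rules mapping "budget" to a manager and "share with sales" to a sales team, a transcript containing both phrases would be shared with both recipients, and a transcript containing neither would be shared with no one.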

The user terminal 30 is a client terminal such as a PC that each user holds and uses. It can access the audio data storage server 10 using, for example, a Web browser (not shown) to analyze, search, query, and otherwise process the data accumulated in the audio data DB 16. The user terminal 30 is not needed if the recording device 20 is an information processing apparatus having these functions.

<Usage forms>
In the basic form of use, a user can, for example, search the text-converted audio data of each user accumulated in the audio data DB 16 of the audio data storage server 10 and retrieve and refer to it as needed. Using this, a user can, for example, record an idea the moment it occurs simply by murmuring it aloud, thereby accumulating information in a very simple way and retrieving it later.

The data accumulated in the audio data DB 16 is disclosed not only to the user but also to the disclosure range set by the mode. Accordingly, a manager such as a superior in the organization to which the user belongs can, by agreement with the user, refer to and analyze the audio data and text-converted audio data that the user has made available. This lets the manager trace the routes by which information travels within the organization, for example to track the leak route when confidential information has been leaked inside the company, or to understand the state of communication among users in the organization and work to invigorate it.

Because the voice data stored in the voice data DB 16 is guaranteed not to have been tampered with, it can also be used to prove, for example, that a user made a specific statement at a specific date and time or was present at a specific "place".
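The tamper-evidence guarantee rests on the hash comparison recited in the claims (hashing units 13 and 26): the recording device computes a first hash value for each segment, and the server recomputes a second hash value on receipt and compares the two. A minimal sketch of that check follows. SHA-256 is an assumption, since the document does not name a hash function, and the function names are hypothetical.

```python
import hashlib


def first_hash(segment: bytes) -> str:
    """Recording device side: hash of a segment before upload (SHA-256 is an assumption)."""
    return hashlib.sha256(segment).hexdigest()


def is_untampered(segment: bytes, claimed_hash: str) -> bool:
    """Server side: recompute the hash of the received segment and compare."""
    second_hash = hashlib.sha256(segment).hexdigest()
    return second_hash == claimed_hash
```

A segment whose recomputed hash differs from the uploaded one would be flagged rather than stored as trustworthy evidence.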

FIG. 2 outlines an example of a service made possible by using the continuously recorded voice data of the present embodiment. Because each user continuously and individually records voice data such as conversations, collating these data makes it possible to provide, for example, a service that proves the users were located in the same space.

The example of FIG. 2 shows user A and user B located in the same space and conversing in a meeting or the like. It shows the voice data recorded by user A's recording device 20a and uploaded to the voice data storage server 10, and the voice data recorded by user B's recording device 20b and uploaded to the voice data storage server 10.

Here, because user A and user B were in the same space, such as a conference room, the voice data of both contains the same conversation, in which user A and user B are the speakers. User A's voice data alone already suggests that the two were in the same space, since it records a conversation in which both are speakers. Collating the two sets of voice data and verifying that the same conversation is recorded in each further strengthens the proof that they were in the same space.
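The collation described here can be sketched as follows. This is only an illustration under stated assumptions: utterances are already text-converted and attributed to a speaker, and two recordings of the same utterance are matched by speaker, time proximity, and fuzzy text similarity. The thresholds and all names are hypothetical and are not taken from the document.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Utterance:
    time_s: int    # seconds since midnight
    speaker: str   # speaker user ID identified by analysis
    text: str      # text-converted speech


def same_utterance(u: Utterance, v: Utterance,
                   tol_s: int = 5, min_ratio: float = 0.8) -> bool:
    """Same speaker, close in time, and similar text (recognition output may differ slightly)."""
    return (
        u.speaker == v.speaker
        and abs(u.time_s - v.time_s) <= tol_s
        and SequenceMatcher(None, u.text, v.text).ratio() >= min_ratio
    )


def were_colocated(a: list[Utterance], b: list[Utterance], min_matches: int = 2) -> bool:
    """True if enough utterances in user A's data also appear in user B's data."""
    matches = sum(1 for u in a if any(same_utterance(u, v) for v in b))
    return matches >= min_matches
```

Requiring several matching utterances, rather than one, reduces the chance that a short common phrase is mistaken for a shared conversation.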

As described above, collating this with the schedule information of user A and user B in the schedule DB 17, using the time period as a key, makes it possible to determine, for example, that the gathering in the same space was a meeting. A service can then, for example, automatically summarize the text-converted voice data with a known sentence structure analysis technique to generate the minutes of the meeting, and deliver them to, or notify, the attendees identified from the conversation content and the schedule information.
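The schedule collation keyed on the time period might look like the sketch below. The layout of the schedule DB 17 is not given in this section, so the `ScheduleEntry` fields and the rule of matching entries by title are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ScheduleEntry:
    user_id: str
    title: str
    start_min: int  # minutes since midnight
    end_min: int


def meeting_for(schedule: list[ScheduleEntry], users: set[str],
                start_min: int, end_min: int) -> Optional[str]:
    """Return the title of an entry that all users share and that overlaps the period."""
    members_by_title: dict[str, set[str]] = {}
    for entry in schedule:
        # Two half-open intervals overlap when each starts before the other ends.
        if entry.start_min < end_min and start_min < entry.end_min:
            members_by_title.setdefault(entry.title, set()).add(entry.user_id)
    for title, members in members_by_title.items():
        if users <= members:
            return title
    return None
```

When a title is returned, the text-converted voice data for that period is a candidate input for minutes generation.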

FIG. 3 outlines another example of a service made possible by using the continuously recorded voice data of the present embodiment. Here, voice data such as conversations is converted to text by speech recognition, and the text is further analyzed as needed by sentence structure analysis or the like to determine whether the conversation contains words corresponding to specific keywords or commands. The figure shows an example in which, when a target word is found, it acts as a trigger for conveying or sharing information stored in the voice data DB 16, or for communication that uses that information.

For example, at site α, user A and user B are holding a meeting or the like, and, as in the example of FIG. 2, their conversation is recorded, uploaded to the voice data storage server 10, and stored. When user B speaks a command aloud so that it is recorded, as in the utterance at "10:16", the analysis unit 14 of the voice data storage server 10 recognizes the utterance as a command, and processing according to the command can be executed. In the example of FIG. 3, the content of the past conversation between user A and user B from "10:00" to "10:15" is transferred, as the command instructs, to user C at the remote site β. Although the example of FIG. 3 specifies the conversation to be processed by time period, the most recent conversation or conversations with a specific speaker may also be made specifiable as conditions.
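The command handling in this example can be sketched as follows. The document does not specify how commands are worded, so the grammar `send HH:MM-HH:MM to USER` is purely hypothetical, as are the function and type names. The sketch only shows the two steps described: recognizing an utterance as a command, then selecting the conversation in the commanded time period.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical command grammar; the document does not define the wording of commands.
COMMAND = re.compile(r"^send (\d{1,2}):(\d{2})-(\d{1,2}):(\d{2}) to (\w+)$")


class Transfer(NamedTuple):
    start_min: int   # start of the period, minutes since midnight
    end_min: int     # end of the period, minutes since midnight
    recipient: str


def parse_command(text: str) -> Optional[Transfer]:
    """Return a Transfer if the utterance is a command, else None (ordinary speech)."""
    m = COMMAND.match(text.strip().lower())
    if m is None:
        return None
    h1, m1, h2, m2, who = m.groups()
    return Transfer(int(h1) * 60 + int(m1), int(h2) * 60 + int(m2), who)


def select(utterances: list[tuple[int, str]], t: Transfer) -> list[str]:
    """Pick the (minute, text) utterances that fall inside the commanded period."""
    return [text for minute, text in utterances if t.start_min <= minute <= t.end_min]
```

The selected texts would then be handed to whatever transfer means is in use, such as e-mail or a record associated with the recipient.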

The means of transfer is not particularly limited. It may be a known means such as e-mail, or the data may be recorded in the voice data DB 16 as voice data associated with user C, with a notification sent to the user terminal 30c or the like used by user C so that user C can view it. The transferred data may be voice data, text-converted data, or both.

User C, having received the voice data and/or text-converted voice data, can play it back or display it and reply to it. To reply, user C can speak the content and record it with the recording device 20, returning it as voice data (or as data converted from it to text), or use a known means such as e-mail.

In this way, a user such as user B at site α can start talking (communicating) with user C at site β without an explicit calling operation such as placing a telephone call or sending an e-mail. In the example of FIG. 3, the command to transfer data is specified by a spoken utterance, but a corresponding process (conveying or sharing information, etc.) may instead be triggered automatically by a specific keyword in the conversation. Alternatively, although this falls short of an explicit call, utterances recorded while a predetermined button or the like on the recording device 20 is held down may be treated as commands.

Furthermore, the content of past conversations can be traced back and shared with a specific user, realizing an asynchronous, conversation-like means of communication. With such a means of communication, even remote sites can work as a virtual team integrated through shared voice data (or text-converted data).

For example, just as when a team works together at one site, merely saying "C, what was that thing again?" automatically sends an inquiry to user C at a remote site, and user C answers it, searching past conversations as needed. Such communication can be started and carried out like an ordinary conversation, without explicit user operations such as recording the conversation or entering a command.

<Data structure>
FIG. 4 shows an example of the data structure of the user DB 15 in the voice data storage server 10. The user DB 15 is a table that holds account information, attribute information, per-user setting information, and the like for users who can receive services from the voice data utilization system 1. The user DB 15 has items such as user ID, name, department, e-mail address, voiceprint information, and mode setting. The key item is the user ID.

The user ID item holds an ID that uniquely identifies each user in the voice data utilization system 1. The name, department, and e-mail address items hold, as attribute information of the user, the user's name, the department to which the user belongs (or another collective such as a group or project), and the e-mail address used as contact information.

The voiceprint information item holds, for example, voiceprint data for identifying the user's voice. When the analysis unit 14 analyzes voice data in the voice data DB 16, it can refer to this information to identify each speaker in a conversation. The mode setting item holds the per-user settings (preferences) for each mode the user can specify. By referring to this information, the disclosure range, processing, and so on of the user's voice data stored in the voice data DB 16 can be controlled or restricted according to the mode specified for that data. For the office mode, for example, users may be barred from changing settings such as the disclosure range, so that data is always disclosed to the range determined by the organization.
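The user DB 15 items and the office-mode restriction can be sketched as a record type. This is an illustrative model only: the field types, the representation of a disclosure range as a set of names, and the mode name "private" are assumptions; only the item names and the office mode come from the description.

```python
from dataclasses import dataclass, field

# Modes whose settings are fixed by the organization (assumption: only "office").
LOCKED_MODES = {"office"}


@dataclass
class UserRecord:
    user_id: str      # key item
    name: str
    department: str
    email: str
    voiceprint: bytes  # opaque voiceprint data used for speaker identification
    # mode name -> disclosure range (who may see data recorded in that mode)
    mode_settings: dict[str, set[str]] = field(default_factory=dict)

    def set_disclosure(self, mode: str, audience: set[str]) -> None:
        """Change the disclosure range for a mode, unless the mode is locked."""
        if mode in LOCKED_MODES:
            raise PermissionError(f"settings for mode {mode!r} are fixed by the organization")
        self.mode_settings[mode] = set(audience)
```

Rejecting the change at the point of update keeps the organization's disclosure range in force regardless of what a client sends.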

FIG. 5 shows an example of the data structure of the voice data DB 16 in the voice data storage server 10. The voice data DB 16 is a table that holds, for each user, the voice data recorded by the recording device 20 and related environment information. The voice data DB 16 has items such as voice data ID, voice data, text data, user ID, acquisition time, mode, access right information, speaker user ID, position information, atmospheric pressure information, temperature information, and geomagnetic information. The key item is the voice data ID.

The voice data ID item holds an ID that uniquely identifies each piece of voice data in the voice data utilization system 1. The voice data item holds the voice data uploaded from the recording device 20. Because the recording device 20 uploads voice data divided into segments of fixed length, the data collection unit 12 may assign voice data IDs as, for example, sequential numbers per user so that the order of the segments is easy to determine. The voice data item may hold the digital audio data itself, or the audio may be kept as a file on the voice data storage server 10 with index information such as the file name held in the voice data item.

The text data item holds the content of the voice data as converted to text by the analysis unit 14. The results of sentence structure analysis by the analysis unit 14 may be held with it. The user ID item holds the user ID of the user who recorded the voice data; voice data is managed collectively per user ID. The acquisition time item holds the timestamp at which the voice data was recorded. This information can also be used to determine the order of each user's voice data.

The mode item holds information identifying the mode of the recording device 20 when the voice data was recorded. According to this value, the disclosure range and the like of the data are set following the settings (preferences) for that mode in the mode setting item of the user DB 15. The access right information item holds access rights to the voice data, such as reference and copy. Default access rights may be set according to the mode, and the user may additionally be allowed to specify access rights for individual voice data in the access right information item, adding to or overwriting the defaults.
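The layering of per-record access rights over mode defaults can be sketched as below. The default table, the operation names "read" and "copy", the mode name "private", and the rule that the recording user keeps full access are all assumptions for illustration; the document states only that defaults follow the mode and that per-record entries may add to or overwrite them.

```python
from typing import Optional

# Hypothetical defaults: mode -> {user or group: permitted operations}.
DEFAULT_RIGHTS: dict[str, dict[str, set[str]]] = {
    "office": {"dept": {"read"}},
    "private": {},
}


def effective_rights(mode: str, owner: str,
                     overrides: Optional[dict[str, set[str]]] = None) -> dict[str, set[str]]:
    """Start from the mode's defaults, then apply the record's own entries."""
    rights = {who: set(ops) for who, ops in DEFAULT_RIGHTS.get(mode, {}).items()}
    rights[owner] = {"read", "copy"}  # assumption: the recording user keeps full access
    for who, ops in (overrides or {}).items():
        rights[who] = set(ops)        # per-record entries add to or overwrite defaults
    return rights
```

An empty set in `overrides` withdraws a default grant for that record without changing the mode's settings.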

The speaker user ID item holds the user ID of the speaker identified by the analysis unit 14's analysis of the voice data. For a conversation, information on the multiple speakers who appear may be held. The position information, atmospheric pressure information, temperature information, and geomagnetic information items hold the environment information acquired by the sensors of the data acquisition unit 21 of the recording device 20 when the voice data was recorded. The set of items is therefore chosen to suit the information the sensors of the data acquisition unit 21 can acquire.
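The voice data DB 16 items, together with per-user sequential numbering of voice data IDs, can be modeled as in the following sketch. The field types, the choice to store an index to the audio rather than the audio itself, and the ID format `user-00000001` are assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class VoiceDataRecord:
    voice_data_id: str          # key item
    user_id: str                # user who recorded the segment
    acquired_at: datetime       # timestamp of the recording
    mode: str                   # mode of the recording device at recording time
    audio_ref: str              # file name or other index to the stored audio
    text: Optional[str] = None  # filled in after speech recognition
    speaker_user_ids: list[str] = field(default_factory=list)
    access_rights: dict[str, set[str]] = field(default_factory=dict)
    # Sensor readings such as latitude, longitude, pressure, temperature.
    environment: dict[str, float] = field(default_factory=dict)


class IdAllocator:
    """Assigns sequential numbers per user so that segment order is easy to recover."""

    def __init__(self) -> None:
        self._last: defaultdict[str, int] = defaultdict(int)

    def allocate(self, user_id: str) -> str:
        self._last[user_id] += 1
        return f"{user_id}-{self._last[user_id]:08d}"
```

Keeping `environment` as an open mapping reflects the point that the set of items depends on which sensors the recording device has.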

The data structures (items) of the tables shown in FIGS. 4 and 5 are only examples; other table or data structures may be used as long as they can hold and manage similar data.

As described above, in the voice data utilization system 1 according to an embodiment of the present invention, voice data such as each user's conversations is recorded with the recording device 20 and transmitted over the network 40 to the voice data storage server 10, where it is stored. The voice data storage server 10 uses large-capacity, low-cost storage devices to secure enough capacity for continuous recording 24 hours a day, 365 days a year for each user, and this capacity forms the voice data DB 16. The voice data stored per user in the voice data DB 16 is then converted to text data by speech recognition, and each user's text-converted voice data is collated and analyzed.

This makes it possible to provide services that involve information processing, such as synchronous and asynchronous communication among users unconstrained by time and space, in a form that requires no active operation by the user either when recording or when using the voice data.

The invention made by the present inventors has been described above specifically on the basis of an embodiment. Needless to say, the present invention is not limited to that embodiment and can be modified in various ways without departing from its gist.

The present invention is applicable to a voice data utilization system that uses voice data accumulated by continuous recording.

Description of symbols
1 ... voice data utilization system,
10 ... voice data storage server, 11 ... authentication unit, 12 ... data collection unit, 13 ... hashing unit, 14 ... analysis unit, 15 ... user database (DB), 16 ... voice data database (DB), 17 ... schedule database (DB),
20, 20a, 20b ... recording device, 21 ... data acquisition unit, 22 ... microphone, 23 ... short-range wireless communication unit, 24 ... GPS sensor, 25 ... various sensors, 26 ... hashing unit, 27 ... mode management unit, 28 ... mode profile, 29 ... communication unit,
30, 30c ... user terminal,
40 ... network.

Claims

A voice data utilization system comprising: a recording device that is held by each of a plurality of users, continuously records the voices around the user as digital data, and records the recorded voice data divided into segments of a predetermined length; and
a voice data storage server that stores the voice data transmitted from the recording device in a voice data recording unit for each user, wherein
the recording device
transmits the voice data to the voice data storage server as needed, and
the voice data storage server
converts the voice data stored in the voice data recording unit into text data to obtain text-converted voice data and, according to the content of a conversation in the text-converted voice data of a first user, makes the content of the first user's voice data and/or text-converted voice data within a predetermined range shareable with another, second user.

The voice data utilization system according to claim 1, wherein
the content of the first user's voice data and/or text-converted voice data stored in the voice data recording unit is disclosed to a third user within a predetermined range and can be referred to by the third user.

The voice data utilization system according to claim 2, wherein
the recording device,
when recording the voice data, switches a mode that specifies recording conditions and/or usage conditions of the voice data according to the location and/or time period of the recording device, and records information on the mode together with the voice data, and
the voice data storage server
sets the disclosure range of the content of the voice data and/or the text-converted voice data according to the mode of the voice data.

The voice data utilization system according to any one of claims 1 to 3, wherein
the recording device
acquires, along with recording the voice data, environment information at the time of recording, including the position information of the recording device and the time, and records it together with the voice data.

The voice data utilization system according to any one of claims 1 to 4, wherein
the recording device
generates a first hash value for the voice data and transmits the first hash value together with the voice data to the voice data storage server, and
the voice data storage server
generates a second hash value for the voice data acquired from the recording device and compares the first hash value with the second hash value to detect whether the voice data has been tampered with.

The voice data utilization system according to any one of claims 1 to 5, wherein,
when the text-converted voice data of a fourth user stored in the voice data recording unit contains a conversation between the fourth user and a fifth user, and the text-converted voice data of the fifth user stored in the voice data recording unit for the same time period as the conversation contains that conversation between the fourth user and the fifth user, the fourth user and the fifth user are treated as having been located in the same place during that time period.

The voice data utilization system according to claim 6, wherein
the voice data storage server
acquires the schedule information of the fourth user or the fifth user for the time period and, when a meeting including the fourth user and the fifth user was held at the same place in the time period, generates the minutes of the meeting based on the content of the conversation in the text-converted voice data of the fourth user or the fifth user for the time period.