JP7048113B2

JP7048113B2 - Information processing equipment, information processing systems, and programs

Info

Publication number: JP7048113B2
Application number: JP2020155497A
Authority: JP
Inventors: 善久橋本; 秀之春日; 優一林
Original assignee: Ziku Technologies Co Ltd
Current assignee: Ziku Technologies Co Ltd
Priority date: 2020-09-16
Filing date: 2020-09-16
Publication date: 2022-04-05
Anticipated expiration: 2040-02-10
Also published as: JP2021128744A

Description

The present invention relates to an information processing apparatus, an information processing system, and a program.

An IC (Integrated Circuit) recorder performs analog-to-digital conversion on an analog signal from a microphone to generate digital sound data, and stores the sound data in a storage medium. The IC recorder may also compress the generated sound data. IC recorders are used, for example, to prepare the minutes of a meeting or to keep a record of a discussion.


There is a strong need to execute various kinds of data processing on sound data generated by, for example, an IC recorder, so as to produce data that is useful and highly convenient for the user.

The present invention has been made in view of the above circumstances, and an object of the invention is to provide an information processing apparatus, an information processing system, and a program that generate data highly convenient for the user.

The information processing apparatus of the present embodiment includes a storage device and a processor. The storage device stores a plurality of sound segments. The processor can communicate with a user terminal, reads data from the storage device, and stores data in the storage device. The processor stores, in the storage device, a plurality of character segments obtained by transcription processing of the sound segments stored in the storage device. The processor causes the user terminal to display the character segments together with file information indicating files into which the character segments can be incorporated. The processor receives from the user terminal (a) a plurality of segment designations, each indicating one of the character segments that the user has selected from among those displayed, and (b) specific file information that the user has selected from among the displayed file information. The processor then incorporates, all at once, the character segments indicated by the segment designations (or their data bodies) into the file that is stored in the storage device and indicated by the specific file information, causes the user terminal to display that file, and edits the file stored in the storage device based on instructions from the user terminal.
Here, incorporating the character segments indicated by the segment designations (or their data bodies) into a file means adding those character segments (or their data bodies) to the file.
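The incorporation step can be illustrated with a minimal Python sketch. It is not the claimed implementation: the class names, the `segment_id` string used as a segment designation, the in-memory `documents` dictionary standing in for the storage device, and the choice to insert the selected segments in recorded time order are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class CharacterSegment:
    segment_id: str    # hypothetical identifier used as the "segment designation"
    start_time: float  # time information from the segment metadata (seconds)
    body: str          # data body (transcribed text)


@dataclass
class Document:
    file_id: str                                   # hypothetical "file information"
    contents: list = field(default_factory=list)   # items already in the file


def incorporate(segments, designations, documents, file_id, bodies_only=True):
    """Add every designated character segment (or only its data body) to the
    file selected by file_id in one operation, in recorded time order."""
    by_id = {s.segment_id: s for s in segments}
    selected = sorted((by_id[d] for d in designations), key=lambda s: s.start_time)
    target = documents[file_id]
    target.contents.extend(s.body if bodies_only else s for s in selected)
    return target
```

Sorting by `start_time` means the user can pick segments in any order and still get a chronologically consistent file; a real system might instead honor the selection order.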

According to the present invention, it is possible to provide an information processing apparatus, an information processing system, and a program that generate data highly convenient for the user.

FIG. 1 is a block diagram showing an example of the configuration of the recorder according to the first embodiment.
FIG. 2 is a block diagram showing an example of the data configuration according to the first embodiment.
FIG. 3 is a block diagram showing an example of the configuration of the first server according to the first embodiment.
FIG. 4 shows an example of a screen on which data downloaded from the first server is displayed in a browser of a user terminal.
FIG. 5 is a block diagram showing an example of the configuration of the recorder according to the second embodiment.
FIG. 6 is a front view showing the external appearance of the recorder according to the second embodiment.
FIG. 7 is a block diagram showing an example of the configuration of the first server according to the third embodiment.

Embodiments will be described below with reference to the drawings. In the drawings, identical functions and components are denoted by the same reference numerals, and their description is omitted or kept brief.

(First Embodiment)
The first embodiment describes a recorder provided with a plurality of connectors (connection terminals) for connecting microphones, and a first server (information processing apparatus) that processes the sound data (for example, voice data) generated by the recorder.

FIG. 1 is a block diagram showing an example of the recorder 1 according to the first embodiment.

The recorder 1 includes a plurality of connectors C1 to Cn to which a plurality of external microphones M1 to Mn (n is an integer of 2 or more) can be connected, a built-in microphone M, an operating device (user interface device) 2, a display device 3, an analog-to-digital converter (hereinafter, ADC) 4, and a controller 5. The controller 5 includes, for example, a processor 6, a storage device 7, and a communication device 8. The recorder 1 may be, for example, a portable IC recorder.

The microphones M1 to Mn can be attached to and detached from the connectors C1 to Cn, respectively. The connectors C1 to Cn are connected to the ADC 4.

The microphone M is built into the recorder 1; it picks up sound and transmits an analog signal to the ADC 4.

The operating device 2 accepts operations by the user. For example, it receives an instruction from the user and transmits the instruction to, for example, the processor 6 of the controller 5. The operating device 2 is, for example, a button or a touch panel.

In the first embodiment, the operating device 2, for example, periodically receives a mode inquiry from the processor 6 of the controller 5 and, in response, returns the mode specified by the user to the processor 6. Alternatively, when the operating device 2 accepts a mode designation from the user, it notifies the processor 6 of the designated mode.

In the first embodiment, a mode indicates a type or manner of operation of the recorder 1. The recorder 1 selectively operates in one of at least two modes.

The first mode is a non-standard mode (for example, a one-shot voice mode) used for voice input of items such as titles, headings, summaries, memos, management information, bibliographic details, explanations, and notes.

The second mode is a standard mode used for ordinary sound input such as meeting minutes and the content of discussions.

The display device 3 displays various data stored in, for example, the storage device 7 under the control of, for example, the processor 6 of the controller 5. The display device 3 is, for example, a liquid crystal display or an organic EL (Electro-Luminescence) display.

The ADC 4 can receive a plurality of analog signals from the microphones M1 to Mn via the connectors C1 to Cn. The ADC 4 can also receive an analog signal from the microphone M. In the first embodiment, the analog signals received by the ADC 4 are assumed to be stereo signals.

The ADC 4 performs analog-to-digital conversion on at least one received analog signal and transmits the resulting digital signal to the controller 5. For example, when the ADC 4 receives a plurality of analog signals from the microphones M1 to Mn via the connectors C1 to Cn, it generates one digital signal from them (for example, by combining them) and transmits the generated digital signal to, for example, the processor 6 of the controller 5.
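One simple way to combine several channels into a single digital signal is to average them sample by sample and quantize the result. The sketch below models that step in Python under stated assumptions: each channel is a list of already-sampled mono values normalized to [-1.0, 1.0] (the source says the inputs are stereo; each stereo side could be treated as its own list), the output is signed 16-bit PCM, and averaging is only one possible form of the "combining" the text mentions. The function names are hypothetical.

```python
def to_pcm16(samples):
    """Quantize float samples in [-1.0, 1.0] to signed 16-bit integers,
    clipping anything outside the representable range."""
    return [max(-32768, min(32767, round(x * 32767))) for x in samples]


def mix_channels(channels):
    """Combine equally long sampled channels into one 16-bit digital signal
    by averaging them sample by sample."""
    if not channels:
        return []
    length = len(channels[0])
    if any(len(c) != length for c in channels):
        raise ValueError("all channels must have the same number of samples")
    mixed = [sum(c[i] for c in channels) / len(channels) for i in range(length)]
    return to_pcm16(mixed)
```

Averaging keeps the mix within full scale without extra gain logic, at the cost of lowering each individual microphone's contribution as more microphones are connected.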

In addition, when the ADC 4 receives a plurality of analog signals from the microphones M1 to Mn via the connectors C1 to Cn, it performs an analysis that includes acquiring the level of each analog signal, determining whether each analog signal is valid or invalid, and acquiring the gain (volume) value of each analog signal. The ADC 4 then transmits analysis information 9 indicating the result of the analysis to the processor 6 of the controller 5. The analysis information 9 includes, for example, the levels and the gain values of the analog signals.
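The per-channel analysis can be sketched as follows. This is a software model of a step the source assigns to the ADC 4, and several details are assumptions: the level is taken as the RMS value in dBFS, a channel is treated as "valid" when its level reaches a hypothetical threshold of -60 dBFS, and the gain values are simply reported settings passed in by the caller.

```python
import math
from dataclasses import dataclass


@dataclass
class ChannelAnalysis:
    channel: int       # 1-based channel number (connector C1..Cn)
    level_dbfs: float  # RMS level relative to full scale
    valid: bool        # True if the channel appears to carry a usable signal
    gain_db: float     # current gain (volume) setting reported for the channel


def analyze(channels, gains_db, threshold_dbfs=-60.0):
    """Build per-channel analysis information from samples normalized to [-1, 1]."""
    result = []
    for idx, (samples, gain) in enumerate(zip(channels, gains_db), start=1):
        rms = math.sqrt(sum(x * x for x in samples) / len(samples)) if samples else 0.0
        level = 20 * math.log10(rms) if rms > 0 else float("-inf")
        result.append(ChannelAnalysis(idx, level, level >= threshold_dbfs, gain))
    return result
```

A silent or unplugged channel yields a level of negative infinity and is therefore marked invalid.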

The ADC 4 may compress, for example, the digital signal or the analysis information 9 that it transmits to the processor 6 of the controller 5.

Further, the ADC 4 adjusts the levels or gain values of the analog signals in accordance with a control command 10 received from, for example, the processor 6 of the controller 5. This improves the quality of the digital signal.

The storage device 7 includes a nonvolatile memory such as a NAND flash memory and a volatile memory such as a DRAM (Dynamic Random Access Memory).

The storage device 7 stores various data such as an operating system (hereinafter, OS) 11, software 12, metadata 13, sound data 14, analysis data 15, character data 16 and translation data 17 corresponding to the sound data 14, and speaker recognition data 18 for the sound data 14. The metadata 13, sound data 14, analysis data 15, character data 16, translation data 17, and speaker recognition data 18 may basically be managed by the first server 19; in that case, the required parts are partially downloaded from the first server 19 to the storage device 7 as needed, temporarily stored there, and used by the recorder 1. This reduces the storage capacity required of the storage device 7 of the recorder 1. Alternatively, some of the metadata 13, sound data 14, analysis data 15, character data 16, translation data 17, and speaker recognition data 18 may be stored in the storage device 7 while the rest is managed by the first server 19.

The metadata 13 includes meta information about the sound data 14, analysis data 15, character data 16, translation data 17, and speaker recognition data 18. For example, the metadata 13 associates the sound data 14, analysis data 15, character data 16, translation data 17, and speaker recognition data 18 with one another as appropriate. The metadata 13 also includes information on the storage locations of the sound data 14, analysis data 15, character data 16, translation data 17, and speaker recognition data 18.

The sound data 14 is generated based on the digital signal received from the ADC 4.

The analysis data 15 corresponds to the sound data 14 and includes the analysis information 9 received from the ADC 4.

The character data 16 corresponds to the sound data 14 and includes, for example, text data generated by transcription processing of the sound data 14.

The translation data 17 corresponds to the sound data 14 and includes text data generated by translation processing of the character data 16.

The speaker recognition data 18 corresponds to the sound data 14, is generated by speaker recognition processing executed based on the sound data 14 and the analysis data 15, and includes speaker identification information.

The metadata 13, sound data 14, analysis data 15, character data 16, translation data 17, and speaker recognition data 18 stored in the storage device 7 will be described in detail later with reference to FIG. 2.

Under the control of, for example, the processor 6, the communication device 8 transmits and receives data, information, signals, requests, commands, instructions, notifications, calls, or responses, wirelessly or by wire, to and from other devices such as the first server 19 or the second server 20.

By executing the OS 11 and the software 12 stored in the storage device 7, the processor 6 functions as, for example, a control unit 21, a data generation unit 22, a determination unit 23, a transmission control unit 24, a reception control unit 25, a display data generation unit 26, and a display control unit 27.

The control unit 21, data generation unit 22, determination unit 23, transmission control unit 24, reception control unit 25, display data generation unit 26, and display control unit 27 may be combined or divided as appropriate. For example, the transmission control unit 24 and the reception control unit 25 may be combined into a communication control unit, and the display data generation unit 26 and the display control unit 27 may be combined.

The control unit 21 controls the various components of the recorder 1, for example, the microphone M, the operating device 2, the display device 3, and the ADC 4.

For example, based on the analysis information 9 received from the ADC 4, the control unit 21 determines a control command 10 for bringing each level or gain value of the analog signals received from the microphones M1 to Mn via the connectors C1 to Cn into a predetermined range, and transmits the control command 10 to the ADC 4. This improves the quality of the digital signal.
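The determination of a control command can be sketched as a small function that returns new gain settings for channels whose level falls outside a target range. Everything concrete here is an assumption, not taken from the source: the range of -30 to -12 dBFS, the gain limits of plus or minus 30 dB, the idea that a gain change in dB shifts the measured level by the same amount, and representing the command as a `{channel: new_gain_db}` dictionary.

```python
def control_commands(levels_dbfs, gains_db, low=-30.0, high=-12.0,
                     min_gain=-30.0, max_gain=30.0):
    """Return {channel: new_gain_db} for channels whose level lies outside
    [low, high], steering each level toward the middle of that range."""
    target = (low + high) / 2
    commands = {}
    for ch, (level, gain) in enumerate(zip(levels_dbfs, gains_db), start=1):
        if level == float("-inf"):
            continue  # silent or invalid channel: nothing meaningful to adjust
        if low <= level <= high:
            continue  # already inside the predetermined range
        new_gain = gain + (target - level)
        commands[ch] = max(min_gain, min(max_gain, new_gain))
    return commands
```

Only out-of-range channels receive a command, which avoids constantly nudging gains that are already acceptable.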

The control unit 21 detects, for example, which of the connectors C1 to Cn are connected to a microphone.

The control unit 21 decodes, for example, the digital signal or the analysis information 9 received from the ADC 4.

The determination unit 23 sends a mode inquiry to the operating device 2, for example periodically, and receives a mode notification from the operating device 2. The determination unit 23 then determines whether the user has specified the first mode or the second mode. According to the mode determination result, the processor 6 can switch, for example, the communication method, control, processing, functions, and server to be used. In the first embodiment, the processor 6 switches the API (Application Programming Interface) to be used according to the mode determination result.
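The mode determination and API switching can be sketched as a lookup table keyed by mode. The endpoint names below are hypothetical placeholders (the source does not say which API serves which mode), and falling back to the standard mode for an unrecognized report is an assumption.

```python
from enum import Enum


class Mode(Enum):
    ONE_SHOT = 1  # first mode: titles, headings, memos, notes, etc.
    STANDARD = 2  # second mode: minutes, meeting content


# Hypothetical API table; the source only states that the API depends on the mode.
API_BY_MODE = {
    Mode.ONE_SHOT: "one_shot_endpoint",
    Mode.STANDARD: "standard_endpoint",
}


def determine_mode(reported):
    """Map the value reported by the operating device to a Mode,
    defaulting to the standard mode for unknown values."""
    try:
        return Mode(reported)
    except ValueError:
        return Mode.STANDARD


def select_api(reported):
    """Choose the API to use for subsequent requests from the reported mode."""
    return API_BY_MODE[determine_mode(reported)]
```

Keeping the mapping in a table means switching servers or functions per mode only requires changing data, not control flow.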

The data generation unit 22 generates the metadata 13, the sound data 14, and the analysis data 15 based on, for example, the digital signal and the analysis information 9 received from the ADC 4 and the mode determination result from the determination unit 23. The data generation unit 22 divides the sound data 14 based on, for example, elapsed time or increases and decreases in the gain value of the sound. Each piece of the divided data is called a sound segment.
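The division into sound segments can be sketched with two cut rules: close a segment after a pause (a run of low-amplitude samples following some sound) or when it reaches a maximum duration. The thresholds, the use of raw sample amplitude as a stand-in for "gain value", and returning `(start, end)` index pairs are assumptions made for this illustration.

```python
def segment(samples, sample_rate, max_seconds=30.0, silence=0.02, min_pause_seconds=0.5):
    """Split samples into (start, end) index ranges, cutting after a pause of
    min_pause_seconds below the silence amplitude, or at max_seconds."""
    max_len = int(max_seconds * sample_rate)
    min_pause = int(min_pause_seconds * sample_rate)
    bounds, start, quiet, voiced = [], 0, 0, False
    for i, x in enumerate(samples):
        if abs(x) < silence:
            quiet += 1
        else:
            quiet, voiced = 0, True
        # Cut after a pause that follows sound, or when the segment is too long.
        if (voiced and quiet >= min_pause) or (i + 1 - start) >= max_len:
            bounds.append((start, i + 1))
            start, quiet, voiced = i + 1, 0, False
    if start < len(samples):
        bounds.append((start, len(samples)))  # trailing partial segment
    return bounds
```

Requiring some sound before a pause can close a segment prevents long silences from producing a stream of empty segments.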

The data generation unit 22 then stores the metadata 13, the sound data 14, and the analysis data 15 in the storage device 7.

According to the mode determination result from the determination unit 23, the transmission control unit 24 decides which of the APIs (or functions) provided by the first server 19 or the second server 20 to use. Using the selected API, it transmits the metadata 13, sound data 14, and analysis data 15 stored in the storage device 7, together with a transcription request, a translation request, and a speaker recognition request, to the first server 19 or the second server 20 via the communication device 8.

The transmission control unit 24 may omit sending requests such as the transcription request, translation request, and speaker recognition request. In this case, for example, the transmission of the metadata 13, sound data 14, and analysis data 15 from the transmission control unit 24 to the first server 19 or the second server 20 is regarded as the transmission of the transcription, translation, and speaker recognition requests. Likewise, in the description below, explicit issuance of a request may be omitted and the transmission of data may be regarded as issuance of the request.

In the first embodiment, the recorder 1 can use the functions provided by the first server 19 through the API 19a, and the functions provided by the second server 20 through the API 20a.

In the first embodiment, the transmission control unit 24 may, for example, stream the metadata 13, sound data 14, or analysis data 15 to the first server 19 or the second server 20 via the communication device 8. Alternatively, instead of streaming, the transmission control unit 24 may send these data in batches at intervals (for example, every predetermined amount of data or every predetermined time). As a further alternative, the transmission control unit 24 may stream the data for a predetermined period from the start of transmission and, once that period has elapsed, switch to sending them in batches at intervals.
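The hybrid option (stream first, then batch) can be sketched as a function that plans which chunks go out together. This is an illustration under assumptions: data arrives as `(timestamp_seconds, payload)` chunks, batching is by a fixed chunk count rather than by byte size or time, and any remainder is flushed at the end. The function name is hypothetical.

```python
def plan_transmission(chunks, stream_period, batch_size):
    """Return a list of sends, each a list of payloads. Chunks within
    stream_period of the first timestamp are sent one by one (streaming);
    later chunks are grouped into batches of batch_size."""
    if not chunks:
        return []
    t0 = chunks[0][0]
    sends, pending = [], []
    for ts, payload in chunks:
        if ts - t0 < stream_period:
            sends.append([payload])  # streaming phase: send immediately
        else:
            pending.append(payload)  # batch phase: accumulate
            if len(pending) == batch_size:
                sends.append(pending)
                pending = []
    if pending:
        sends.append(pending)  # flush the remainder
    return sends
```

Streaming at the start gives the user quick feedback, while later batching reduces per-request overhead for long recordings.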

The transmission control unit 24 may also, for example, perform an evaluation (feature detection or attribute determination) on the metadata 13, sound data 14, analysis data 15, character data 16, translation data 17, and speaker recognition data 18, and switch the API, server, or function to be used according to the evaluation value (feature value or attribute information). More specifically, when the character data 16 contains terms of a particular field at or above a predetermined ratio, the transmission control unit 24 may, based on that field determination, send subsequent sound data to a transcription process specialized for that field.
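The field-specific routing can be sketched as a term-ratio check over the transcribed text. The vocabularies, the 5% threshold, the English-only word tokenization (Japanese text would need a morphological analyzer instead), and the transcriber names are all hypothetical.

```python
import re

# Hypothetical domain vocabularies; the source does not list any.
FIELD_TERMS = {
    "medical": {"diagnosis", "dosage", "patient", "symptom"},
    "legal": {"plaintiff", "contract", "liability", "clause"},
}


def detect_field(text, min_ratio=0.05):
    """Return the field whose terms make up at least min_ratio of the words
    (the highest such ratio wins), or None if no field qualifies."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return None
    best, best_ratio = None, 0.0
    for field_name, terms in FIELD_TERMS.items():
        ratio = sum(w in terms for w in words) / len(words)
        if ratio >= min_ratio and ratio > best_ratio:
            best, best_ratio = field_name, ratio
    return best


def choose_transcriber(text):
    """Route subsequent audio to a field-specific transcriber when possible."""
    field_name = detect_field(text)
    return f"transcribe-{field_name}" if field_name else "transcribe-general"
```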

The reception control unit 25 may, for example, receive the character data 16, translation data 17, and speaker recognition data 18 by streaming from the first server 19 or the second server 20 via the communication device 8. Alternatively, it may receive these data in batches at intervals rather than by streaming. As a further alternative, it may receive them by streaming for a predetermined period from the start of data transmission and, once that period has elapsed, in batches at intervals.

The reception control unit 25 stores the received character data 16, translation data 17, and speaker recognition data 18 in the storage device 7 and updates the metadata 13 stored in the storage device 7, for example by adding the location information of the character data 16, translation data 17, and speaker recognition data 18.

The display data generation unit 26 reads the character data 16, translation data 17, and speaker recognition data 18 from the storage device 7 and generates display data corresponding to the user's instruction.

The display control unit 27 causes the display device 3 to display the display data generated by the display data generation unit 26.

In the first embodiment, instead of displaying the character data 16 or the translation data 17 all at once, the display control unit 27 may display it one character at a time at short intervals. This lets the user recognize that the recorder 1 is continuously acquiring and storing the character data 16 or the translation data 17.
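The character-by-character display can be sketched as a generator that yields progressively longer prefixes of the text, pausing between characters. The 50 ms interval is an arbitrary assumption, and `sleep` is injectable only so the sketch can be exercised without waiting.

```python
import time


def reveal(text, interval=0.05, sleep=time.sleep):
    """Yield progressively longer prefixes of text, one character per interval."""
    for i in range(1, len(text) + 1):
        yield text[:i]
        sleep(interval)
```

A display loop would redraw the screen with each yielded prefix.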

FIG. 2 is a block diagram showing an example of the data configuration according to the first embodiment.

The metadata 13 includes various meta information about the sound data 14, analysis data 15, character data 16, translation data 17, and speaker recognition data 18. Specifically, the metadata 13 is attribute information attached to the sound data 14, analysis data 15, character data 16, translation data 17, and speaker recognition data 18, and includes, for example: the user identification information (user ID) of the user of the recorder 1; the device identification information (device ID) of the recorder 1; time information (time stamps); the location information of each of the sound data 14, analysis data 15, character data 16, translation data 17, and speaker recognition data 18; the size of each of these data; and the type information (for example, data format) of each of these data.
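The metadata 13 can be pictured as a record that holds the identifiers and time stamp plus one location/size/type entry per kind of data. The field names and the `register` method are hypothetical; `register` illustrates how the reception control unit 25 might add location information when results arrive, as described above.

```python
from dataclasses import dataclass, field


@dataclass
class DataRef:
    location: str   # where the data is stored (local path or server-side key)
    size: int       # size in bytes
    data_type: str  # type information, e.g. a data format


@dataclass
class RecordingMetadata:
    user_id: str
    device_id: str
    timestamp: str
    # One entry per kind of data: "sound", "analysis", "character", ...
    refs: dict = field(default_factory=dict)

    def register(self, kind, location, size, data_type):
        """Add or update the entry for one kind of data."""
        self.refs[kind] = DataRef(location, size, data_type)
```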

The sound data 14 is generated based on the digital signal received from the ADC 4. The sound data 14 includes a plurality of sound segments SS1 to SSm (m is an integer of 2 or more). The data body of the sound data 14 is divided into the data bodies SD1 to SDm of the sound segments SS1 to SSm based on elapsed time, increases and decreases in the gain value, the amount of data, and so on. The sound segments SS1 to SSm include metadata SM1 to SMm and data bodies SD1 to SDm, respectively. The metadata SM1 to SMm are various meta information about the data bodies SD1 to SDm and include, for example, time information and mode type information. The metadata SM1 to SMm in the sound data 14 may be omitted.

The analysis data 15 is generated based on the analysis information 9 received from the ADC 4. The analysis data 15 includes a plurality of analysis segments AS1 to ASm. The analysis segments AS1 to ASm include metadata AM1 to AMm and data bodies AD1 to ADm, respectively. The metadata AM1 to AMm are various meta information about the data bodies AD1 to ADm.

The character data 16 is data, for example in text format, generated by transcription processing of the sound data 14. The character data 16 includes a plurality of character segments CS1 to CSm. The character segments CS1 to CSm include metadata CM1 to CMm and data bodies CD1 to CDm, respectively. The metadata CM1 to CMm are various meta information about the data bodies CD1 to CDm.

The translation data 17 is data, for example in text format, generated by translation processing of the character data 16. The translation data 17 includes a plurality of translation segments TS1 to TSm. The translation segments TS1 to TSm include metadata TM1 to TMm and data bodies TD1 to TDm, respectively. The metadata TM1 to TMm are various meta information about the data bodies TD1 to TDm.

The speaker recognition data 18 is generated by speaker recognition processing based on the sound data 14 and the analysis data 15. The speaker recognition data 18 includes a plurality of speaker recognition segments RS1 to RSm. The speaker recognition segments RS1 to RSm include metadata RM1 to RMm and data bodies RD1 to RDm, respectively. The metadata RM1 to RMm are various kinds of meta information about the data bodies RD1 to RDm in the same speaker recognition segments.

The metadata SM1 to SMm, AM1 to AMm, CM1 to CMm, TM1 to TMm, and RM1 to RMm include position information for the sound segments SS1 to SSm, the analysis segments AS1 to ASm, the character segments CS1 to CSm, the translation segments TS1 to TSm, and the speaker recognition segments RS1 to RSm, respectively. In addition, the sound segment SS1, the analysis segment AS1, the character segment CS1, the translation segment TS1, and the speaker recognition segment RS1 are associated with one another by, for example, time information in their respective metadata SM1, AM1, CM1, TM1, and RM1. Likewise, for the other sound segments SS2 to SSm, analysis segments AS2 to ASm, character segments CS2 to CSm, translation segments TS2 to TSm, and speaker recognition segments RS2 to RSm, the associated segments can be identified from the metadata SM2 to SMm, AM2 to AMm, CM2 to CMm, TM2 to TMm, and RM2 to RMm.
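The time-based association between segments can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each segment's metadata carries a start and end time and that associated segments share exactly the same time span; the `Segment` class, its fields, and the `associated` function are hypothetical names, not part of the described system.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class Segment:
    """A segment: data body plus the metadata fields this sketch assumes."""
    kind: str         # "sound", "analysis", "character", "translation", or "speaker"
    index: int        # i in SSi / ASi / CSi / TSi / RSi
    start: float      # start time in seconds (assumed metadata field)
    end: float        # end time in seconds (assumed metadata field)
    body: Any = None  # data body: audio bytes, text, speaker label, ...


def associated(target: Segment, pool: List[Segment]) -> List[Segment]:
    """Return the segments of other kinds whose time span equals the target's."""
    return [
        s for s in pool
        if s.kind != target.kind and (s.start, s.end) == (target.start, target.end)
    ]


segments = [
    Segment("sound", 1, 0.0, 2.5, b"\x00\x01"),
    Segment("character", 1, 0.0, 2.5, "Good morning."),
    Segment("translation", 1, 0.0, 2.5, "Bonjour."),
    Segment("speaker", 1, 0.0, 2.5, "Speaker A"),
    Segment("character", 2, 2.5, 4.0, "Let's begin."),
]
linked = associated(segments[0], segments)
print([s.kind for s in linked])  # ['character', 'translation', 'speaker']
```

Exact matching keeps the sketch simple; if segment boundaries could differ slightly between data types, an overlap test would be the natural alternative.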

FIG. 3 is a block diagram showing an example of the configuration of the first server 19 according to the first embodiment.

The information processing system 28 includes the recorder 1 and the first server 19.

The first server 19 can communicate with the recorder 1, the second server 20, and the user terminal 29 wirelessly or by wire.

The first server 19 includes a communication device 30, a storage device 31, and a processor 32.

The communication device 30 exchanges, for example, data, information, signals, requests, commands, instructions, notifications, calls, and responses with the recorder 1, the second server 20, and the user terminal 29, wirelessly or by wire.

The storage device 31 stores an OS 33 and server software 34 controlled by the OS 33. The server software 34 can provide display data, including character segments and the like, to a browser 36 of the user terminal 29 via the communication device 30. The server software 34 may also be, for example, message exchange software, web conferencing software, or software that provides an SNS (Social Networking Service).

The storage device 31 further stores, for example, user information 68, the metadata 13, the sound data 14, the analysis data 15, the character data 16, the translation data 17, the speaker recognition data 18, files into which pickup segments 35 selected (for example, picked up) by the user have been incorporated, and blog data 50. Here, incorporating a pickup segment 35 into a file means, for example, adding the text data of the data body included in the pickup segment 35 to the file.

In the first embodiment, a pickup segment 35 designated by the user is incorporated into a file designated by the user.

The user information 68 includes various information about users registered on the website provided by the first server 19. Specifically, the user information 68 includes, for example, user identification information, identification information of the devices used by the user, and user attribute information. The user information 68 is associated with the metadata 13 by, for example, the user identification information or the device identification information. The server software 34 can therefore search for or read out the metadata 13, sound data 14, analysis data 15, character data 16, translation data 17, speaker recognition data 18, and pickup segments 35 related to given user information 68.
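As a rough illustration of linking user information to metadata by user or device identification information, the following sketch filters metadata records. The record layout (the `user_id`, `device_id`, and `device_ids` keys) and the `records_for_user` helper are hypothetical; the document does not specify how the association is stored.

```python
from typing import Dict, List


def records_for_user(
    user_info: Dict[str, dict], metadata_records: List[dict], user_id: str
) -> List[dict]:
    """Return metadata records linked to a user by user ID or by one of the user's device IDs."""
    devices = set(user_info[user_id].get("device_ids", []))
    return [
        rec for rec in metadata_records
        if rec.get("user_id") == user_id or rec.get("device_id") in devices
    ]


user_info = {"u1": {"device_ids": ["rec-001"], "attributes": {"plan": "basic"}}}
metadata_records = [
    {"id": "m1", "user_id": "u1"},
    {"id": "m2", "device_id": "rec-001"},  # uploaded by the user's recorder
    {"id": "m3", "user_id": "u2"},
]
print([r["id"] for r in records_for_user(user_info, metadata_records, "u1")])  # ['m1', 'm2']
```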

The processor 32 provides various functions based on the API 19a in response to requests or data received from the recorder 1 via the communication device 30. In other words, the first server 19 operates in cooperation with other devices by using the API 19a.

By executing the OS 33 and the server software 34 stored in the storage device 31, the processor 32 functions as, for example, a receiving unit 37, a transcription unit 38, a translation unit 39, a speaker recognition unit 40, a transmission unit 41, a display control unit 42, a pickup unit 43, an estimate generation unit 44, a request unit 45, and a blog editor 46.

The receiving unit 37 receives a transcription request, a translation request, a speaker recognition request, the metadata 13, the sound data 14, and the analysis data 15 from the recorder 1 via the communication device 30, and stores the user information 68, the metadata 13, the sound data 14, and the analysis data 15 in the storage device 31 in association with one another.

When the receiving unit 37 receives a transcription request, the transcription unit 38 executes transcription processing on the sound data 14, generates the character data 16, stores the character data 16 in the storage device 31, and updates the metadata 13. For example, the transcription unit 38 registers position information of the character data 16 in the metadata 13.

Alternatively, when the receiving unit 37 receives a transcription request, the transcription unit 38 may transmit the transcription request and the sound data 14 to the second server 20 via the communication device 30, cause the second server 20 to execute transcription processing 47 by using the API 20a, receive the character data 16 from the second server 20 via the communication device 30, store the character data 16 in the storage device 31, and update the metadata 13.

When the receiving unit 37 receives a translation request, the translation unit 39 executes translation processing on the character data 16, generates the translation data 17, stores the translation data 17 in the storage device 31, and updates the metadata 13. For example, the translation unit 39 registers position information of the translation data 17 in the metadata 13.

Alternatively, when the receiving unit 37 receives a translation request, the translation unit 39 may transmit the translation request and the character data 16 to the second server 20 via the communication device 30, cause the second server 20 to execute translation processing 48 by using the API 20a, receive the translation data 17 from the second server 20 via the communication device 30, store the translation data 17 in the storage device 31, and update the metadata 13.

When the receiving unit 37 receives a speaker recognition request, the speaker recognition unit 40 executes speaker recognition processing based on the sound data 14 and the analysis data 15, generates the speaker recognition data 18, stores the speaker recognition data 18 in the storage device 31, and updates the metadata 13. For example, the speaker recognition unit 40 registers position information of the speaker recognition data 18 in the metadata 13.

Alternatively, when the receiving unit 37 receives a speaker recognition request, the speaker recognition unit 40 may transmit the speaker recognition request, the sound data 14, and the analysis data 15 to the second server 20 via the communication device 30, cause the second server 20 to execute speaker recognition processing 49 by using the API 20a, receive the speaker recognition data 18 from the second server 20 via the communication device 30, store the speaker recognition data 18 in the storage device 31, and update the metadata 13.
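The transcription, translation, and speaker recognition units share one pattern: process locally or delegate to the second server 20, then store the result and register its position in the metadata 13. The sketch below captures that pattern under stated assumptions: `handle_request` and `fake_second_server` are hypothetical, the `remote` callable merely stands in for a call through the API 20a (whose interface the document does not specify), and the storage key is an invented form of position information.

```python
from typing import Any, Callable, Dict, List, Optional


def handle_request(
    kind: str,
    payload: Any,
    storage: Dict[str, Any],
    metadata: Dict[str, List[str]],
    local: Optional[Callable[[Any], Any]] = None,
    remote: Optional[Callable[[str, Any], Any]] = None,
) -> str:
    """Run one transcription / translation / speaker-recognition job.

    Uses the local processor if one is given; otherwise delegates to `remote`.
    Stores the result and registers its location in the metadata.
    """
    if local is not None:
        result = local(payload)
    elif remote is not None:
        result = remote(kind, payload)
    else:
        raise ValueError(f"no processor configured for {kind!r}")
    location = f"{kind}/{len(storage)}"  # hypothetical form of 'position information'
    storage[location] = result
    metadata.setdefault(kind, []).append(location)
    return location


def fake_second_server(kind: str, data: Any) -> str:
    """Placeholder for the external service; just echoes what it was asked to do."""
    return f"<{kind} of {data}>"


storage: Dict[str, Any] = {}
metadata: Dict[str, List[str]] = {}
text_loc = handle_request("transcription", "sound-14", storage, metadata, remote=fake_second_server)
trans_loc = handle_request("translation", storage[text_loc], storage, metadata, local=str.upper)
print(metadata)            # {'transcription': ['transcription/0'], 'translation': ['translation/1']}
print(storage[trans_loc])  # <TRANSCRIPTION OF SOUND-14>
```

Keeping the local and remote paths behind one function mirrors the text's "or alternatively" structure: the caller's bookkeeping (store, then update metadata) is identical whichever server does the work.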

In the first embodiment, the analysis data 15 includes the levels or gain values of the plurality of analog signals acquired by the respective microphones M1 to Mn, so it is possible to recognize accurately which speaker's voice a given signal in the sound data 14 represents.
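The document does not state how the per-microphone levels are used. One simple heuristic, shown here only as an assumption, is to attribute each segment to the speaker assigned to the microphone with the highest level; this presumes that each of the microphones M1 to Mn sits closest to one known speaker. The function names, speaker names, and level values are hypothetical.

```python
from typing import Dict, List


def speaker_for_segment(levels: Dict[str, float], mic_to_speaker: Dict[str, str]) -> str:
    """Attribute a segment to the speaker assigned to the microphone with the highest level."""
    loudest_mic = max(levels, key=lambda mic: levels[mic])
    return mic_to_speaker[loudest_mic]


def label_segments(level_rows: List[Dict[str, float]], mic_to_speaker: Dict[str, str]) -> List[str]:
    """Apply speaker_for_segment to the per-microphone levels of each segment in turn."""
    return [speaker_for_segment(row, mic_to_speaker) for row in level_rows]


mic_to_speaker = {"M1": "Sato", "M2": "Suzuki", "M3": "Tanaka"}
level_rows = [
    {"M1": 0.82, "M2": 0.10, "M3": 0.05},  # segment 1: loudest at M1
    {"M1": 0.12, "M2": 0.64, "M3": 0.20},  # segment 2: loudest at M2
]
print(label_segments(level_rows, mic_to_speaker))  # ['Sato', 'Suzuki']
```

Ties go to the first microphone in the dictionary, because `max` returns the first maximal key it encounters.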

The transmission unit 41 transmits to the recorder 1, via the communication device 30, the character data 16 as the response to a transcription request, the translation data 17 as the response to a translation request, and the speaker recognition data 18 as the response to a speaker recognition request.

In accordance with a display request received from the user terminal 29 via the communication device 30, the display control unit 42 generates display data based on the user information 68, metadata 13, sound data 14, character data 16, translation data 17, speaker recognition data 18, and pickup segments 35 stored in the storage device 31, and transmits the display data to the user terminal 29 via the communication device 30. The user terminal 29 displays the received display data with the browser 36 so that the user can view it. The screen for this display data is described later with reference to FIG. 4. The screen display may also be produced through cooperation between the display control unit 42 and software on the user terminal 29, such as the browser 36.

The pickup unit 43 executes processing for displaying a menu on the screen of the browser 36 of the user terminal 29. The menu is used to select a destination (for example, a file) to which a segment that is displayed on the user terminal 29 and designated by the user is to be copied. This menu is described later with reference to FIG. 4. The pickup unit 43 generates a menu prompting the user to designate the destination of the designated segment, and causes the user terminal 29 to display the menu via the communication device 30. The menu display may also be produced through cooperation between the pickup unit 43 and software on the user terminal 29, such as the browser 36.

Using the menu, the user can copy the segment corresponding to the menu and designate its destination (a file, folder, or directory).

The pickup unit 43 then stores the pickup segment 35 designated via the user terminal 29 at the destination in the storage device 31 designated via the user terminal 29. As described above, in the first embodiment the pickup segment 35 is incorporated into the file designated via the user terminal 29.

The pickup unit 43 may also incorporate a plurality of pickup segments 35 designated by the user into the same file collectively. The pickup unit 43 may first accept the designation of at least one pickup segment 35 and then accept the destination, or it may first accept the destination and then accept the designation of at least one pickup segment 35.
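Accepting segments and a destination in either order, then incorporating all selected segments at once, can be sketched as a small session object. This is a hypothetical illustration: `PickupSession` and its methods are invented, a file is modeled as a list of text lines, and incorporating a segment means appending its data body text, following the definition given for the storage device 31.

```python
from typing import Dict, List, Optional


class PickupSession:
    """Collects pickup segments and a destination file, in either order,
    then incorporates all selected segments into that file in one step."""

    def __init__(self, files: Dict[str, List[str]]) -> None:
        self.files = files  # file name -> lines of text (assumed file model)
        self.segments: List[dict] = []
        self.destination: Optional[str] = None

    def select(self, *segments: dict) -> None:
        self.segments.extend(segments)

    def choose_destination(self, file_name: str) -> None:
        self.destination = file_name

    def commit(self) -> None:
        """Append the data body text of every selected segment to the destination."""
        if self.destination is None or not self.segments:
            raise RuntimeError("select at least one segment and a destination first")
        self.files.setdefault(self.destination, []).extend(s["body"] for s in self.segments)
        self.segments.clear()


files = {"minutes.txt": ["Weekly meeting"]}
session = PickupSession(files)
session.choose_destination("minutes.txt")  # destination first...
session.select({"id": "CS2", "body": "The budget was approved."},
               {"id": "CS5", "body": "Next meeting is on Friday."})  # ...then segments
session.commit()
print(files["minutes.txt"])
# ['Weekly meeting', 'The budget was approved.', 'Next meeting is on Friday.']
```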

When the estimate generation unit 44 receives a request for human transcription from the user terminal 29 via the communication device 30, it executes estimate generation processing based on at least one of the sound data 14 and the character data 16 stored in the storage device 31, and transmits the resulting estimate data to the user terminal 29 via the communication device 30.

In the estimate generation processing, the estimated amount may be calculated, for example, by multiplying the duration of the sound data 14 by a fee per unit time, or by multiplying the number of characters in the character data 16 by a fee per character.
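Both estimate formulas are simple products. The sketch below assumes minutes as the unit time and uses illustrative fees; the document specifies neither the unit, any actual rates, nor a rounding policy, so no rounding is applied. The function names are hypothetical.

```python
def estimate_by_duration(duration_seconds: float, fee_per_minute: float) -> float:
    """Estimate = duration of the sound data x fee per unit time (here, per minute)."""
    return (duration_seconds / 60) * fee_per_minute


def estimate_by_characters(character_count: int, fee_per_character: float) -> float:
    """Estimate = number of characters in the character data x fee per character."""
    return character_count * fee_per_character


print(estimate_by_duration(5400, 100))     # 90-minute recording at 100 per minute -> 9000.0
print(estimate_by_characters(12000, 0.5))  # 12,000 characters at 0.5 each -> 6000.0
```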

The browser 36 of the user terminal 29 displays the estimate data. When the user terminal 29 receives, from the user who has viewed the estimate data, an instruction to order human transcription, it transmits an order request for human transcription to the first server 19.

When the request unit 45 receives the order request from the user terminal 29 via the communication device 30, it transmits, for example, purchase order data and the sound data 14 to the address of a predetermined transcription service provider via the communication device 30.

The blog editor 46 reads out, as appropriate, for example the user information 68, metadata 13, sound data 14, character data 16, translation data 17, speaker recognition data 18, and pickup segments 35 stored in the storage device 31, incorporates at least part of what it has read into the blog data 50, and makes it editable. The blog editor 46 stores the blog data 50, either while it is being edited or as the edited result, in the storage device 31.

The second server 20 operates in cooperation with other devices, such as the recorder 1 or the first server 19, by using the API 20a. In response to a transcription request, translation request, speaker recognition request, or data received from the recorder 1 or the first server 19, the second server 20 executes the transcription processing 47, the translation processing 48, or the speaker recognition processing 49, and returns the result to the sender of the request or data. The second server 20 is, for example, a server of an ASP (Application Service Provider).

The user terminal 29 can, for example, access the site provided by the first server 19, log in, upload data to the first server 19, and download data from the first server 19. The user terminal 29 can display data downloaded from the first server 19 by using the browser 36 or the like. The user terminal 29 accepts user operations through a user interface device such as a mouse, touch panel, or keyboard, and transmits data, information, signals, requests, commands, instructions, calls, or notifications to the first server 19. By executing data or programs downloaded from the first server 19, the user terminal 29 can operate in cooperation with the first server 19, for example to accept designations from the user or to display data.

Like the recorder 1, the user terminal 29 may transmit sound data that it has acquired (for example, sound data played back together with video data) to the first server 19, and receive and display the character data, translation data, and speaker recognition data corresponding to that sound data.

FIG. 4 shows an example of a screen 51 in which data downloaded from the first server 19 is displayed by the browser 36 of the user terminal 29.

The screen 51 includes, for example: time information T contained in the metadata 13; the time variation 52 of the sound data 14; the data bodies CD1 to CD6 of the character segments CS1 to CS6 contained in the character data 16; the time information T1 to T6 contained in the metadata CM1 to CM6 of the character segments CS1 to CS6; a user name N contained in the user information 68; log information 52L of the sound data 14 related to the user with the user name N; destinations (file names) 53 of the pickup segments 35 related to the user with the user name N; and a human transcription button 55.

The screen 51 further includes a menu 54 displayed for the data body CD2 of the character segment CS2 designated by the user (for example, by mouse-over). The menu 54 prompts the user to designate the destination 53 of the pickup segment 35. On the screen 51, the menu 54 is displayed at the upper right of the display area of the data body CD2 over which the mouse is hovering.

In the first embodiment, when the user operates the user terminal 29 to designate a destination for the character segment CS2, the user terminal 29, in cooperation with, for example, the pickup unit 43 of the first server 19, stores the designated character segment CS2 at the destination designated via the user terminal 29.

On the screen 51, the time variation 52 of the sound data 14 is displayed so that time advances from top to bottom. The data bodies CD1 to CD6 of the character segments CS1 to CS6 are displayed beside the time variation 52 of the sound data 14 and are linked to it according to the time information T1 to T6.
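One way to realize this top-to-bottom alignment, offered as an assumption rather than the described implementation, is to place each data body at a vertical offset proportional to its time information. `layout_rows`, the field names, and the pixel dimensions are hypothetical.

```python
from typing import List, Tuple


def layout_rows(segments: List[dict], total_seconds: float, height_px: int) -> List[Tuple[int, str]]:
    """Map each character segment to a vertical offset proportional to its time information."""
    rows = []
    for seg in sorted(segments, key=lambda s: s["time"]):
        y = round(seg["time"] / total_seconds * height_px)  # time runs top to bottom
        rows.append((y, seg["text"]))
    return rows


segments = [
    {"id": "CS2", "time": 30.0, "text": "Second remark"},
    {"id": "CS1", "time": 0.0, "text": "Opening remark"},
    {"id": "CS3", "time": 45.0, "text": "Third remark"},
]
print(layout_rows(segments, total_seconds=60.0, height_px=600))
# [(0, 'Opening remark'), (300, 'Second remark'), (450, 'Third remark')]
```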

The human transcription button 55 is pressed by a user who wants to use human transcription. When the human transcription button 55 is pressed, the browser 36 displays the estimate data.

In the first embodiment described above, the controller 5 of the recorder 1 can control the ADC 4 based on the analysis information 9 received from the ADC 4. The controller 5 can therefore generate high-quality sound data 14, and high-quality character data 16 or translation data 17 can be obtained from that sound data 14.

第１の実施形態において、レコーダ１は、複数のマイクロフォンＭ１～Ｍｎを接続するための複数のコネクタＣ１～Ｃｎを備えており、第１のサーバ１９または第２のサーバ２０は、複数のマイクロフォンＭ１～Ｍｎによって取得された複数のアナログ信号の解析情報９などに基づいて話者の認識を行う。このため、話者認識を高精度に行うことができる。 In the first embodiment, the recorder 1 includes a plurality of connectors C1 to Cn for connecting a plurality of microphones M1 to Mn, and the first server 19 or the second server 20 includes a plurality of microphones M1. -The speaker is recognized based on the analysis information 9 and the like of the plurality of analog signals acquired by Mn. Therefore, speaker recognition can be performed with high accuracy.

In the first embodiment, the character data 16 or translation data 17 corresponding to the sound data 14 is generated through cooperation between the recorder 1 and at least one of the first server 19 and the second server 20. The user can therefore use the special or specialized transcription, translation, and speaker recognition processing provided by the first server 19, as well as the latest transcription processing 47, translation processing 48, and speaker recognition processing 49 provided by the second server 20. As a result, the user can obtain high-quality character data 16, translation data 17, and speaker recognition data 18.

In the first embodiment, the user can easily switch between the first mode and the second mode by using the operation device 2 of the recorder 1, and the API, functions, processing, and server in use switch along with the mode. This improves convenience for the user.

In the first embodiment, the recorder 1 stores the sound data 14 while periodically displaying the character data 16 or translation data 17 one character at a time. The display content of the recorder 1 therefore changes continuously, which lets the user easily see that the recorder 1 is operating.
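The character-by-character display could be driven by a generator that yields successively longer frames. In this hypothetical sketch, the optional `width` parameter models a display that can show only a fixed number of characters and therefore scrolls; the function names and the default interval are assumptions, not values from the document.

```python
import time
from typing import Callable, Iterator, Optional


def reveal(text: str, width: Optional[int] = None) -> Iterator[str]:
    """Yield successive display frames, adding one character per frame.
    If width is given, only the last `width` characters are kept (scrolling)."""
    for i in range(1, len(text) + 1):
        start = 0 if width is None else max(0, i - width)
        yield text[start:i]


def show(text: str, interval: float = 0.2, width: Optional[int] = None,
         write: Callable[[str], None] = print) -> None:
    """Write each frame, pausing `interval` seconds between frames."""
    for frame in reveal(text, width):
        write(frame)
        time.sleep(interval)


print(list(reveal("abcd", width=2)))  # ['a', 'ab', 'bc', 'cd']
```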

In the first embodiment, the user can view the display data received from the first server 19 with the browser 36 of the user terminal 29 and refer to the metadata 13, sound data 14, character data 16, translation data 17, and speaker recognition data 18 in association with one another.

In the first embodiment, by designating a pickup segment 35 from among the character segments CS1 to CSm and the translation segments TS1 to TSm and designating its destination, the user can have the pickup segment 35 incorporated into and stored in the destination file. This allows the user to organize data efficiently.

In the first embodiment, the user can generate the blog data 50 by incorporating the sound segments SS1 to SSm, the character segments CS1 to CSm, and the translation segments TS1 to TSm. This allows the user to create and edit blogs efficiently.

(Second embodiment)
In the second embodiment, a modification of the recorder 1 described in the first embodiment will be described.

FIG. 5 is a block diagram showing an example of the recorder 1A according to the second embodiment.

The recorder 1A includes a plurality of connectors C1 to Cn, an output connector Co, a built-in microphone M, a speaker 56, the ADC 4, a digital/analog converter (hereinafter, DAC) 57, a power supply device 58, the operation device 2, the display device 3, a clock device 59, the storage device 7, the communication device 8, and a processor (or controller) 6. The ADC 4, DAC 57, clock device 59, communication device 8, and processor 6 may be combined as appropriate. The components of the recorder 1A can exchange, for example, data, information, signals, requests, commands, instructions, notifications, calls, and responses with one another via a bus 60.

The output connector Co can be connected to a sound output device such as an external speaker, headphones, or earphones. The output connector Co outputs, for example, an analog signal received from the DAC 57 to the sound output device connected to it.

The output connector Co can also be connected to another information processing device or the like, and outputs data to the information processing device connected to it.

The speaker 56 is built into the recorder 1A and outputs sound based on the analog signal received from the DAC 57.

The power supply device 58 either accepts batteries or includes a rechargeable battery, and supplies power to each component of the recorder 1A.

The operation device 2 is operated by the user. For example, the operation device 2 accepts an instruction from the user and notifies the processor 6 of the instruction. The operation device 2 includes a first operation unit 2a and a second operation unit 2b. At least one of the first operation unit 2a and the second operation unit 2b may be, for example, a button.

The first operation unit 2a accepts a mode designation from the user and transmits the designated mode to the processor 6. In the second embodiment, the recorder 1A can operate in at least a first mode and a second mode.

The second operation unit 2b accepts, with a single designation (a click or press) by the user, the start of sound data generation, transcription (text data generation), and translation, and transmits to the processor 6 a signal indicating that the user has instructed sound data generation, transcription, and translation.

Alternatively, the second operation unit 2b may accept sound data generation and transcription with a single designation from the user, and accept translation with a separate designation.
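The single-designation behavior of the second operation unit 2b, and the variant in which translation is designated separately, can be modeled as a small state machine. This is a hypothetical sketch: the class and method names are invented, and a press while processing is running is taken to end everything because the second operation unit 2b also accepts the instruction to end transcription.

```python
from typing import Set


class RecorderControls:
    """Hypothetical model of the second operation unit 2b."""

    def __init__(self, translate_with_same_press: bool = True) -> None:
        self.translate_with_same_press = translate_with_same_press
        self.active: Set[str] = set()

    def press_start_stop(self) -> None:
        """One press starts everything; a press while running stops everything."""
        if self.active:
            self.active.clear()
            return
        self.active = {"record", "transcribe"}
        if self.translate_with_same_press:
            self.active.add("translate")

    def press_translate(self) -> None:
        """Separate designation for translation (the variant); only while recording."""
        if "record" in self.active:
            self.active.add("translate")


controls = RecorderControls()
controls.press_start_stop()
print(sorted(controls.active))  # ['record', 'transcribe', 'translate']
```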

The ADC 4 transmits the analysis information 9 to the processor 6. The ADC 4 may instead transmit the analysis information 9 to the input port 6p of the processor 6 via the DAC 57.

The DAC 57 performs digital/analog conversion on the digital signal received from the ADC 4 and transmits the resulting analog signal to the analog-signal input port 6p of the processor 6.

The DAC 57 also converts a digital signal for sound output received from the processor 6 into an analog signal and outputs it to the speaker 56 or the output connector Co.

The clock device 59 transmits time information to, for example, the processor 6.

The processor 6 has an analog-to-digital conversion function 6a for analog signals input through the input port 6p.

When the analog-to-digital conversion function 6a receives an analog signal from the DAC 57 via the analog input port 6p of the processor 6, it converts the analog signal into a digital signal.
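The path in which the analysis information 9 leaves the ADC 4 in digital form, passes through the DAC 57 as an analog signal, and is re-digitized by the conversion function 6a can be illustrated with ideal converters. This is a sketch under stated assumptions: both converters are ideal and linear and share the same bit depth and reference voltage, and the 8-bit / 3.3 V values are chosen only for illustration. Under those assumptions, the digital value survives the round trip unchanged.

```python
def dac(code: int, bits: int, vref: float) -> float:
    """Ideal DAC (standing in for DAC 57): map a code in [0, 2**bits - 1] to [0, vref] volts."""
    full_scale = (1 << bits) - 1
    if not 0 <= code <= full_scale:
        raise ValueError(f"code {code} out of range for {bits} bits")
    return code * vref / full_scale


def adc(voltage: float, bits: int, vref: float) -> int:
    """Ideal ADC (standing in for function 6a): quantize a voltage to the nearest code."""
    full_scale = (1 << bits) - 1
    clamped = min(max(voltage, 0.0), vref)  # inputs outside the range saturate
    return round(clamped * full_scale / vref)


BITS, VREF = 8, 3.3  # illustrative values, not taken from the patent
level_code = 200     # e.g. one level value carried in the analysis information 9
voltage = dac(level_code, BITS, VREF)
print(f"{voltage:.3f} V ->", adc(voltage, BITS, VREF))
```

With real converters, offset, gain error and noise would make the round trip only approximate, which is why matching resolution and reference voltage matters in this arrangement.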

FIG. 6 is a front view showing the appearance of the recorder 1A according to the second embodiment.

A display device 3, a first operation unit 2a, a second operation unit 2b, a third operation unit 2c, a speaker 56, and a microphone M are arranged on the front of the recorder 1A.

The display device 3 shows part of the metadata 13 and part of the character data 16.

Although not shown in FIG. 6, a plurality of connectors C1 to Cn for external microphones M1 to Mn are arranged on, for example, the top or side surface of the recorder 1A.

The first operation unit 2a accepts a mode designation. The second operation unit 2b accepts instructions to start and end transcription. The third operation unit 2c accepts power on/off.

The recorder 1A according to the second embodiment described above provides the same effects as the recorder 1 described in the first embodiment.

With the second operation unit 2b of the recorder 1A, the user can start sound recording and transcription, or sound recording, transcription, and translation, with a single designation, which improves convenience.

(Third embodiment)

The third embodiment describes a modification of the first server 19 described in the first embodiment. As an example, it describes the case where the first server receives the metadata 13, the sound data 14, and a transcription request from the recorder 1 or the user terminal 29. The same applies when the first server receives a translation request or a speaker recognition request from the recorder 1 or the user terminal 29. As described earlier, sending and receiving requests may be omitted.

FIG. 7 is a block diagram showing an example configuration of the first server 19A according to the third embodiment.

The first server 19A can communicate with the recorder 1 owned by the user, or with the user terminal 29, via a gateway 61. The gateway 61 enables communication between devices that have different interfaces.

The first server 19A includes an API & static web page 62, a database 63, a storage device 64 for the sound data 14, a transcription task queue 65, a transcription process 66, and a storage device 67 for the character data 16 and the pickup segments 35.

The database 63, the storage device 64, and the storage device 67 correspond to the storage device 31 of the first server described in the first embodiment.

The API & static web page 62 first provides a static web page to the recorder 1 or the user terminal 29 via the gateway 61. The recorder 1 or the user terminal 29 operates on the basis of this static web page, which allows it to work together with the first server 19A through the API.

The API & static web page 62 corresponds to, for example, the communication device 30, the reception unit 37, the display control unit 42, the transmission unit 41, the pickup unit 43, the estimate generation unit 44, the request unit 45, and the blog editor 46 described in the first embodiment.

The API & static web page 62 provides an API service to the recorder 1 or the user terminal 29 and also functions as a website. When it receives a request or data from the recorder 1 or the user terminal 29 via the gateway 61, it executes processing according to that request or data and transmits the corresponding data stored in the database 63, the storage device 64, or the storage device 67 to the recorder 1 or the user terminal 29 via the gateway 61.

Specifically, the API & static web page 62 receives, for example, the metadata 13, the sound data 14, and a transcription request from the recorder 1 or the user terminal 29 via the gateway 61. It then stores the metadata 13 in the database 63 in association with the user information 68, and stores the sound data 14 in the storage device 64.

When the API & static web page 62 receives the transcription request or the sound data 14, it stores a transcription task in the transcription task queue 65.
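The upload handling described above (metadata 13 stored in the database 63 together with the user information 68, sound data 14 stored in the storage device 64, and a transcription task queued) can be sketched with in-memory stand-ins. The `FirstServer` class, its dictionaries, and the task format are hypothetical; a real deployment would use an actual database, object storage, and message queue, none of which the patent names.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class FirstServer:
    """Hypothetical in-memory stand-in for parts of the first server 19A."""
    database: dict = field(default_factory=dict)      # database 63: metadata 13
    sound_store: dict = field(default_factory=dict)   # storage device 64: sound data 14
    task_queue: deque = field(default_factory=deque)  # transcription task queue 65

    def handle_upload(self, user_id: str, recording_id: str,
                      metadata: dict, sound: bytes) -> None:
        # Store the metadata 13 in association with the user information 68.
        self.database[recording_id] = {"user_id": user_id, **metadata}
        # Store the sound data 14 separately from the metadata.
        self.sound_store[recording_id] = sound
        # Queue a transcription task; a worker picks it up later.
        self.task_queue.append({"recording_id": recording_id})


server = FirstServer()
server.handle_upload("user-1", "rec-001", {"title": "weekly meeting"}, b"\x00\x01")
print(server.database["rec-001"], len(server.task_queue))
```

Queuing the task instead of transcribing inside the request handler lets the upload return immediately while transcription runs asynchronously.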

Furthermore, as needed, the API & static web page 62 reads the metadata 13 from the database 63, the sound data 14 from the storage device 64, or the character data 16 or the pickup segments 35 from the storage device 67, and transmits the data it has read to the user terminal 29 via the gateway 61.

The transcription task queue 65 manages the execution order of transcription tasks on a first-in, first-out basis and supplies the next task to be executed to the transcription process 66.

The transcription process 66 corresponds to the transcription unit 38 described in the first embodiment. Following a transcription task obtained from the transcription task queue 65, the transcription process 66 reads the sound data 14 from the storage device 64, generates the corresponding character data 16, and stores the character data 16 in the storage device 67. The transcription process 66 also updates the metadata 13 managed in the database 63 by adding the location of the character data 16 to it.
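The queue discipline and the worker behavior above can be sketched together: tasks are taken oldest-first from a `collections.deque`, and each task produces character data in a separate store plus a location entry in the metadata. The function name, the `text/<id>` location format, and the injected `transcribe` callable (a placeholder for the actual speech-recognition step, which could, for example, call out to the second server) are assumptions for illustration.

```python
from collections import deque
from typing import Callable


def run_transcription_worker(
    task_queue: deque,                 # stand-in for the transcription task queue 65
    database: dict,                    # stand-in for database 63 (metadata 13)
    sound_store: dict,                 # stand-in for storage device 64 (sound data 14)
    text_store: dict,                  # stand-in for storage device 67 (character data 16)
    transcribe: Callable[[bytes], str],  # hypothetical speech-to-text step
) -> list:
    """Drain the queue in first-in, first-out order; return the processed recording IDs."""
    processed = []
    while task_queue:
        task = task_queue.popleft()            # oldest task first (FIFO)
        rid = task["recording_id"]
        sound = sound_store[rid]               # read the sound data 14
        text_key = f"text/{rid}"               # hypothetical location in storage device 67
        text_store[text_key] = transcribe(sound)
        database[rid]["text_location"] = text_key  # add the location to the metadata 13
        processed.append(rid)
    return processed


queue = deque([{"recording_id": "a"}, {"recording_id": "b"}])
db = {"a": {}, "b": {}}
sounds = {"a": b"x", "b": b"yy"}
texts = {}
print(run_transcription_worker(queue, db, sounds, texts, lambda s: f"{len(s)} bytes"))
```

`deque.popleft()` removes from the end opposite to `append()`, which is what makes the processing order first-in, first-out.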

The transcription process 66 may instead obtain the character data 16 from the transcription process 47 of the second server 20, for example through the API 20a.

The first server 19A according to the third embodiment described above provides the same effects as the first server 19 described in the first embodiment.

In the third embodiment, the database 63 storing the metadata 13, the storage device 64 storing the sound data 14, and the storage device 67 storing the character data 16 and the pickup segments 35 are kept separate. The metadata 13, the sound data 14, the character data 16, and the pickup segments 35 differ in format and type. Storing data of different formats and types in different storage devices lets each kind of data be managed in an environment suited to it, which can, for example, speed up searches and reduce the required storage capacity.

The invention of the present application is not limited to the above embodiments as they are; at the implementation stage, the components can be modified and embodied without departing from the gist of the invention. Various inventions can also be formed by appropriately combining the components disclosed in the above embodiments. For example, some components may be omitted from all the components shown in an embodiment, and components from different embodiments may be combined as appropriate.

1, 1A … recorder; M1 to Mn, M … microphone; C1 to Cn … connector; 2 … operating device; 3 … display device; 4 … ADC; 5 … controller; 6 … processor; 7, 64, 67 … storage device; 8 … communication device; 9 … analysis information; 10 … control command; 13 … metadata; 14 … sound data; 15 … analysis data; 16 … character data; 17 … translation data; 18 … speaker recognition data; 19, 19A … first server; 20 … second server; 22 … data generation unit; 23 … determination unit; 24 … transmission control unit; 25 … reception control unit; 26 … display data generation unit; 27 … display control unit; 68 … user information; 38 … transcription unit; 39 … translation unit; 40 … speaker recognition unit; 44 … estimate generation unit; 43 … pickup unit; 54 … menu; 62 … API & static web page; 63 … database; 65 … transcription task queue; 66 … transcription process.

Claims

An information processing device comprising:
a storage device that stores a plurality of sound segments; and
a processor that can communicate with a user terminal, reads data from the storage device, and stores data in the storage device,
wherein the processor:
stores, in the storage device, a plurality of character segments obtained by transcription processing of the plurality of sound segments stored in the storage device;
causes the user terminal to display the plurality of character segments and file information indicating a file that can be a destination into which the plurality of character segments are incorporated;
receives, from the user terminal, a plurality of segment designations respectively indicating the character segments selected by the user from among the plurality of character segments displayed on the user terminal, and specific file information selected by the user from among the file information displayed on the user terminal;
collectively incorporates the plurality of character segments indicated by the plurality of segment designations, or the data bodies of those character segments, into the file that is stored in the storage device and indicated by the specific file information; and
causes the user terminal to display the file, and edits the file stored in the storage device based on an instruction from the user terminal,
wherein incorporating the plurality of character segments indicated by the plurality of segment designations, or their data bodies, into the file means adding those character segments, or their data bodies, to the file.
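The core operation of claim 1, adding several user-selected character segments to one chosen file in a single step, can be sketched as follows. This is a hypothetical illustration: files are plain strings keyed by an ID, segment designations are segment IDs, and the selected segments are appended to the end of the file in the order they were selected; the claim itself only requires that they be added to the file.

```python
def incorporate_segments(files: dict, file_id: str,
                         segments: dict, selected_ids: list) -> str:
    """Append the selected character segments to the file in one operation.

    files: hypothetical mapping of file ID -> file body (plain text)
    segments: hypothetical mapping of segment ID -> character segment text
    selected_ids: the segment designations received from the user terminal
    """
    if file_id not in files:
        raise KeyError(f"unknown file: {file_id}")
    # Resolve every designation first, so an unknown ID leaves the file unchanged.
    bodies = [segments[sid] for sid in selected_ids]
    files[file_id] += "".join(body + "\n" for body in bodies)
    return files[file_id]


files = {"blog-1": "Meeting notes\n"}
segments = {"s1": "We agreed on the schedule.", "s2": "Budget review is next week."}
print(incorporate_segments(files, "blog-1", segments, ["s1", "s2"]))
```

Resolving all designations before writing keeps the incorporation all-or-nothing, matching the "collectively incorporates" wording.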

The information processing device according to claim 1, wherein
the storage device stores the plurality of character segments in association with a plurality of pieces of time information, and
the processor causes the user terminal to display the plurality of character segments in chronological order based on the plurality of pieces of time information.
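The chronological display of claim 2 can be illustrated by sorting segments by their associated time information before rendering them. The `CharacterSegment` type, the use of a start offset in seconds, and the `[mm:ss]` label format are assumptions for this sketch; the claim does not specify how the time information is represented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CharacterSegment:
    start_seconds: float  # hypothetical: offset of the segment from the start of the recording
    text: str


def timeline(segments: list) -> list:
    """Return display lines for the segments in chronological order."""
    lines = []
    for seg in sorted(segments, key=lambda s: s.start_seconds):
        minutes, seconds = divmod(int(seg.start_seconds), 60)
        lines.append(f"[{minutes:02d}:{seconds:02d}] {seg.text}")
    return lines


for line in timeline([CharacterSegment(75.0, "Next item."),
                      CharacterSegment(3.0, "Let's start.")]):
    print(line)
```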

The information processing device according to claim 1 or 2, wherein the processor first receives the plurality of segment designations from the user terminal and then receives the specific file information from the user terminal.

The information processing device according to claim 3, wherein the processor:
receives, from the user terminal, the plurality of segment designations respectively indicating the character segments selected by the user from among the plurality of character segments displayed on the user terminal;
causes the user terminal to display the file information indicating the file that can be the destination into which the character segments indicated by the plurality of segment designations are incorporated; and
receives, from the user terminal, the specific file information selected by the user from among the file information displayed on the user terminal.

The information processing device according to claim 1 or 2, wherein the processor first receives the specific file information from the user terminal and then receives the plurality of segment designations from the user terminal.

An information processing system comprising:
the information processing device according to any one of claims 1 to 5; and
a recorder capable of communicating with the information processing device,
wherein the recorder comprises:
a plurality of connectors connectable to a plurality of microphones;
an analog-to-digital converter that converts a plurality of analog signals received from the respective connectors into digital signals; and
a controller that generates sound data based on the digital signals and transmits the sound data to the information processing device,
wherein the controller receives, from the analog-to-digital converter, analysis information including the levels of the plurality of analog signals, and transmits, to the analog-to-digital converter, a control command for adjusting the levels of the plurality of analog signals based on the analysis information,
the processor stores the sound data received from the controller in the storage device, and
the sound data includes the plurality of sound segments.
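Claim 6 leaves the level-adjustment rule open. One simple possibility, shown here purely as an assumption, is for the controller to compare each channel's reported level with a target window and issue a gain correction only for channels outside it. The target of -18 dBFS, the 3 dB tolerance, and the dictionary-shaped command are illustrative choices, not values from the patent.

```python
def gain_commands(levels_dbfs: dict, target_dbfs: float = -18.0,
                  tolerance_db: float = 3.0) -> dict:
    """Return per-connector gain changes (dB) for channels outside target +/- tolerance.

    levels_dbfs: hypothetical analysis information, connector name -> measured level in dBFS
    """
    commands = {}
    for connector, level in levels_dbfs.items():
        error = target_dbfs - level  # positive: too quiet, negative: too loud
        if abs(error) > tolerance_db:
            commands[connector] = error
    return commands


print(gain_commands({"C1": -30.0, "C2": -17.0, "C3": -6.0}))
```

The tolerance window avoids sending a command for every small fluctuation; a real controller would likely also smooth the measured levels over time.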

A program causing a computer capable of communicating with a user terminal to realize:
a function of storing a plurality of sound segments in a storage device;
a function of storing, in the storage device, a plurality of character segments obtained by transcription processing of the plurality of sound segments stored in the storage device;
a function of causing the user terminal to display the plurality of character segments and file information indicating a file that can be a destination into which the plurality of character segments are incorporated;
a function of receiving, from the user terminal, a plurality of segment designations respectively indicating the character segments selected by the user from among the plurality of character segments displayed on the user terminal, and specific file information selected by the user from among the file information displayed on the user terminal;
a function of collectively incorporating the plurality of character segments indicated by the plurality of segment designations, or the data bodies of those character segments, into the file that is stored in the storage device and indicated by the specific file information; and
a function of causing the user terminal to display the file and editing the file stored in the storage device based on an instruction from the user terminal,
wherein incorporating the plurality of character segments indicated by the plurality of segment designations, or their data bodies, into the file means adding those character segments, or their data bodies, to the file.