JP5206553B2

JP5206553B2 - Browsing system, method, and program

Info

Publication number: JP5206553B2
Application number: JP2009086474A
Authority: JP
Inventors: 秀彰深澤; 孝文越仲
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2009-03-31
Filing date: 2009-03-31
Publication date: 2013-06-12
Anticipated expiration: 2029-03-31
Also published as: JP2010238050A

Description

The present invention relates to a browsing system, method, and program.

In recent years, the use of speech recognition technology has been studied for settings with multiple speakers, such as meetings, for purposes such as making it easier to prepare minutes. After minutes have been created using speech recognition, users may listen to the recorded audio while the recognized text is displayed, and hold a discussion while replaying and checking what was said.

Patent Document 1 describes an example of a conference system that automatically creates and stores the minutes of a meeting held by multiple people. The conference system of Patent Document 1 includes a plurality of terminal devices, each provided with a microphone for inputting a speaker's voice, and a server that is connected to the terminal devices via a network and stores information related to the speakers. The server comprises: speaker identification means for identifying a speaker and a speech time based on terminal identification information transmitted from at least one terminal device; audio text storage means for converting the identified speaker's voice into text and storing the audio text together with the speech time; image storage means for storing image files used as materials for the network conference together with the display times of the image files; and minutes creation means for creating a minutes file associated with the audio text, the speech times, the image files, and the display times. Patent Document 1 further describes minutes display means that displays the audio text according to the speech time and displays the image files according to the display time.

This is said to make it possible to replay a meeting in the same way as it actually proceeded and to create minutes that let a third party realistically relive the content of the meeting.

Patent Document 2 describes an example of a speech recognition system that recognizes speech recorded in special situations such as trials and meetings and creates a document. The speech recognition system of Patent Document 2 comprises: a microphone for acquiring speech data; a speech recognition processing unit that identifies utterance sections in the speech data acquired by the microphone, performs speech recognition on each utterance section, and associates the character string of the recognition data obtained for each utterance section with the speech data by means of utterance time information; a text creation unit that creates text from the recognition data for each utterance section obtained by the speech recognition processing unit; and an output control unit that displays the text created by the text creation unit.

As a result, even when the text transcribed by speech recognition is corrected, the corrected text can still be appropriately associated with the original speech. The system is said to be particularly suitable for transcribing speech acquired in special situations such as trials and meetings.

Patent Document 3 describes an information processing apparatus that associates information describing the attributes of a conference (conference name, participants, date and time, place, and so on) with the grouped audio and images of the conference. It describes that, in this apparatus, a history display matrix MS displays a material link display LS, which represents a material group, and a conference name NM, which is information about the conference, on a history display screen in chronological order based on the dates on which the conferences were held.

Patent Document 4 describes a minutes management system that manages documents created or revised as a development project progresses in association with the minutes of the review meetings for those documents. This is said to allow each version of a document created or revised as the project progresses to be associated with the meeting at which that version was reviewed, and to allow them to be arranged and displayed in chronological order. It also describes that a document creation history list screen can be displayed, showing that several review meetings were held, that a document was created or revised each time, and that the document is linked to the corresponding minutes by a document information table.

Patent Document 1: JP 2007-180828 A
Patent Document 2: JP 2005-165066 A
Patent Document 3: JP 2008-158812 A
Patent Document 4: JP 2001-5874 A

When the content of a meeting is discussed again while looking at the minutes, the parts that matter for the discussion are often limited, and the rest of the minutes is not needed. Therefore, when the same point in the minutes is discussed many times, unless that passage can be easily identified or searched for within the minutes as a whole, it takes time to find it in the minutes of a long meeting, and the same passage has to be looked up again and again, which wastes time.

Patent Documents 1 and 2 describe creating meeting minutes by converting speech into text. However, they do not describe repeatedly and intensively re-discussing specific passages while replaying selected parts of the minutes, nor do they describe recording and displaying a browsing history of references to the minutes.

Patent Documents 3 and 4 describe arranging associated conference materials and conference information on a schedule in chronological order based on the dates of the conferences, and Patent Document 4 describes displaying review minutes in a document creation history list, together with dates, in the order in which they were discussed. However, these systems cannot record a browsing history for each item in the series of contents of the minutes, so finding the points of discussion within the minutes takes effort.

An object of the present invention is to provide a browsing system, method, and program that solve the problem described above, namely the effort and time wasted searching for a specific passage every time minutes or the like are reviewed repeatedly in order to discuss their content.

The browsing system of the present invention comprises:
speech list display means for referring to a voice storage device and a speech recognition result storage device in which, respectively, audio data of a series of utterances and text data obtained by performing speech recognition on that audio and converting the recognition results into text are stored in association with each other through time information for each utterance, and for displaying the speech recognition results of the series of utterances in chronological order on a speech list screen;
selection accepting means for accepting selection of an utterance from the series of utterances displayed on the speech list screen;
selection history recording means for recording the text data of the utterance selected via the selection accepting means as a selection history, in association with selection identification information that identifies the order in which selections were instructed; and
selection history display means for arranging the recorded text data of the selection history based on the selection identification information and displaying it on a selection history list screen.

The browsing method of the present invention comprises:
referring to a voice storage device and a speech recognition result storage device in which, respectively, audio data of a series of utterances and text data obtained by performing speech recognition on that audio and converting the recognition results into text are stored in association with each other through time information for each utterance, and displaying the speech recognition results of the series of utterances in chronological order on a speech list screen;
accepting selection of an utterance from the series of utterances displayed on the speech list screen;
recording the text data of the selected utterance in a selection history recording device as a selection history, in association with selection identification information that identifies the order in which selections were instructed; and
arranging the recorded text data of the selection history based on the selection identification information and displaying it on a selection history list screen.

The computer program of the present invention causes a computer to execute:
a procedure of referring to a voice storage device and a speech recognition result storage device in which, respectively, audio data of a series of utterances and text data obtained by performing speech recognition on that audio and converting the recognition results into text are stored in association with each other through time information for each utterance, and displaying the speech recognition results of the series of utterances in chronological order on a speech list screen;
a procedure of accepting selection of an utterance from the series of utterances displayed on the speech list screen;
a procedure of recording the text data of the selected utterance as a selection history, in association with selection identification information that identifies the order in which selections were instructed; and
a procedure of arranging the recorded text data of the selection history based on the selection identification information and displaying it on a selection history list screen.

Any combination of the above components, and any conversion of the expression of the present invention between a method, an apparatus, a system, a recording medium, a computer program, and the like, is also valid as an aspect of the present invention.

The various components of the present invention need not be individually independent. A plurality of components may be formed as a single member, a single component may be formed of a plurality of members, one component may be part of another component, part of one component may overlap part of another, and so on.

Although the method and the computer program of the present invention describe a plurality of procedures in sequence, the order of description does not limit the order in which the procedures are executed. When the method and the computer program are carried out, the order of the procedures may therefore be changed to the extent that the change does not interfere with their content.

Furthermore, the procedures of the method and the computer program of the present invention are not limited to being executed at mutually different times. One procedure may therefore start while another is being executed, and the execution timing of one procedure may partly or wholly overlap that of another.

According to the present invention, a browsing system, method, and program are provided with which a desired passage can be searched for or identified easily and quickly from a playback history.

The drawings are described below in the order in which they are listed:
A functional block diagram showing the configuration of the audio reproduction system according to an embodiment of the present invention.
A diagram showing an example of the structure of the storage units of the audio reproduction system according to an embodiment of the present invention.
A diagram showing an example of a screen of the audio reproduction system according to an embodiment of the present invention.
A flowchart showing an example of the operation of the audio reproduction system according to an embodiment of the present invention.
A functional block diagram showing the configuration of the audio reproduction system according to an embodiment of the present invention.
A diagram showing an example of a screen of the audio reproduction system according to an embodiment of the present invention.
A diagram showing an example of a screen of the audio reproduction system according to an embodiment of the present invention.
A flowchart showing an example of the operation of the audio reproduction system according to an embodiment of the present invention.
A diagram showing an example of the playback history list screen of the audio reproduction system according to an embodiment of the present invention.
A functional block diagram showing the configuration of the audio reproduction system according to an embodiment of the present invention.
A diagram showing an example of the search screen of the audio reproduction system according to an embodiment of the present invention.
A flowchart showing an example of the operation of the audio reproduction system according to an embodiment of the present invention.
A functional block diagram showing the configuration of the audio reproduction system according to an embodiment of the present invention.
A diagram showing an example of the structure of the storage units of the audio reproduction system according to an embodiment of the present invention.
A diagram showing an example of a playback list of the audio reproduction system according to an embodiment of the present invention.
A flowchart showing an example of the operation of the audio reproduction system according to an embodiment of the present invention.

Embodiments of the present invention are described below with reference to the drawings. In all the drawings, like components are given like reference numerals, and their description is omitted as appropriate.

(First embodiment)
FIG. 1 is a functional block diagram showing the configuration of an audio reproduction system 100 according to an embodiment of the present invention.
The browsing system (audio reproduction system 100) of this embodiment includes: a speech list display unit 120 that refers to a voice storage unit 104 and a speech recognition result storage unit 114 in which, respectively, audio data of a series of utterances and text data obtained by performing speech recognition on that audio and converting the recognition results into text are stored in association with each other through time information for each utterance, and that displays the speech recognition results of the series of utterances in chronological order on a speech list screen; a selection receiving unit (first playback instruction receiving unit 122) that accepts selection of an utterance from the series of utterances displayed on the speech list screen; a selection history recording unit (playback history recording unit 130) that records the text data of the utterance selected via the first playback instruction receiving unit 122 as a selection history (playback history), in association with selection identification information (playback identification information) that identifies the selection instruction order (playback instruction order); and a selection history display unit (playback history display unit 140) that arranges the recorded text data of the selection history (playback history) based on the selection identification information (playback identification information) and displays it on a selection history list screen (playback history list screen).

The browsing system (audio reproduction system 100) of this embodiment further includes a first playback unit 124 that refers to the voice storage unit 104 and the speech recognition result storage unit 114 and, starting from the utterance selected via the selection instruction receiving unit (first playback instruction receiving unit 122), associates the audio data with the text data based on the time information and plays them back in synchronization.
In this embodiment, the audio reproduction system 100 is described as an example of the browsing system according to the embodiment of the present invention. However, an utterance selected from the series of utterances displayed on the speech list screen does not necessarily have to be played back; it is sufficient that it can at least be viewed on the selection history list screen.

The audio reproduction system 100 of this embodiment further includes a second playback instruction receiving unit 142 that accepts selection of an utterance to be played back from the playback history displayed on the playback history list screen, and a second playback unit 144 that refers to the speech recognition result storage unit 114 and the playback history recording unit 130 and, starting from the selected utterance, associates the audio data with the text data based on the playback identification information and the time information and plays them back in synchronization.

In the drawings described below, the configuration of parts not related to the essence of the present invention is omitted and is not shown.

Each component of the audio reproduction system is realized by any combination of hardware and software, centered on the CPU (Central Processing Unit) of an arbitrary computer, a memory, a program loaded into the memory that realizes the components shown in the figure, a storage unit such as a hard disk that stores the program, and a network connection interface. Those skilled in the art will understand that the method and apparatus of realization admit various modifications. Each figure described below shows blocks in functional units, not configurations in hardware units.

The audio reproduction system 100 is a computer to which input devices such as a keyboard and a mouse and output devices such as a display and a printer can be connected. The functions of the components are realized, for example, by a computer that executes processing according to a computer program stored in a recording medium.

The computer program of this embodiment is written so as to cause a computer to execute: a procedure of referring to the voice storage unit 104 and the speech recognition result storage unit 114 in which, respectively, audio data of a series of utterances and text data obtained by performing speech recognition on that audio and converting the recognition results into text are stored in association with each other through time information for each utterance, and displaying the speech recognition results of the series of utterances in chronological order on a speech list screen; a procedure of accepting selection of an utterance from the series of utterances displayed on the speech list screen; a procedure of recording the text data of the selected utterance as a playback history, in association with playback identification information that identifies the playback instruction order; and a procedure of arranging the recorded text data of the playback history based on the playback identification information and displaying it on a playback history list screen.

The computer program of this embodiment is further written so as to cause the computer to execute a playback procedure of referring to the voice storage unit 104 and the speech recognition result storage unit 114 and, starting from the utterance selected in the selection instruction receiving procedure, associating the audio data with the text data based on the time information and playing them back in synchronization.

Specifically, the audio reproduction system 100 includes the speech list display unit 120, the first playback instruction receiving unit 122, the first playback unit 124, the playback history recording unit 130, the playback history display unit 140, the second playback instruction receiving unit 142, and the second playback unit 144.

The audio reproduction system 100 can also be connected to a speech recognition system 110 that has a voice acquisition unit 102, the voice storage unit 104, a speech recognition unit 112, and the speech recognition result storage unit 114. In the speech recognition system 110, the voice acquisition unit 102 acquires the audio of a series of utterances by speakers, input from a voice input unit (not shown) such as a microphone. The speech recognition unit 112 performs speech recognition on each utterance associated with time information and converts the recognition result into text data.

As shown in FIG. 2, the voice storage unit 104 stores the audio acquired by the voice acquisition unit 102 as audio data, for each utterance associated with time information. The speech recognition result storage unit 114 stores the text data converted by the speech recognition unit 112 as a text section, for each utterance associated with time information. The audio data and the text data are thus stored in the voice storage unit 104 and the speech recognition result storage unit 114, respectively, in association with each other through the time information for each utterance.

In the example of FIG. 2, the speech recognition result storage unit 114 stores, for each item of time information, a text section that serves as identification information for the text data, identification information of the speaker, and the text data, in association with one another.
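The storage layout described for FIG. 2 can be sketched as two stores keyed by the same time information. This is a minimal illustration only: the class names, field names, and sample values below are assumptions and do not appear in the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioRecord:
    time: str     # time information of the utterance
    audio: bytes  # audio data for that utterance


@dataclass(frozen=True)
class TextRecord:
    time: str     # same time information as the matching AudioRecord
    section: str  # text section: identifier of the text data
    speaker: str  # speaker identification information
    text: str     # recognized text of the utterance


# Hypothetical sample data; every value here is invented for illustration.
voice_storage = {
    "10:00:05": AudioRecord("10:00:05", b"\x01\x02"),
    "10:00:12": AudioRecord("10:00:12", b"\x03\x04"),
}
recognition_storage = {
    "10:00:05": TextRecord("10:00:05", "T1", "A", "Let us begin."),
    "10:00:12": TextRecord("10:00:12", "T2", "B", "First item, please."),
}


def lookup(time):
    """Return the audio and text of one utterance, linked by time information."""
    return voice_storage[time], recognition_storage[time]
```

Keying both stores by the time information is one simple way to express "associated with each other through time information"; a real system might instead use time ranges or a database join.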

Returning to FIG. 1, the speech list display unit 120 refers to the voice storage unit 104 and the speech recognition result storage unit 114 and displays the speech recognition results of the series of utterances in chronological order on the speech list screen. As an example of this screen, the speech list screen 150 in FIG. 3(a) includes a speech list 152. The speech list 152 displays a speaker name 154 that distinguishes speakers, such as a name or designation, and text data showing the speech content 156 obtained by speech recognition. These items are displayed in the speech list 152 in chronological order for each utterance.
In the speech list 152, it is possible, for example, to reverse the sort order of utterance times, to limit the display to a given period, or to sort by speaker.
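The display options mentioned for the speech list (reversed time order, a limited period, sorting by speaker) could be sketched as follows. The function name, its parameters, and the use of fixed-width "HH:MM:SS" strings as time information are assumptions made for illustration.

```python
from typing import NamedTuple


class Utterance(NamedTuple):
    time: str     # "HH:MM:SS"; fixed-width strings sort chronologically
    speaker: str  # speaker name shown in the list
    text: str     # recognized speech content


def build_speech_list(utterances, newest_first=False, start=None, end=None,
                      by_speaker=False):
    """Return the rows of the speech list in the requested display order."""
    # Limit the display to the requested period, if any.
    rows = [u for u in utterances
            if (start is None or u.time >= start)
            and (end is None or u.time <= end)]
    # Chronological order by default, reversed on request.
    rows.sort(key=lambda u: u.time, reverse=newest_first)
    if by_speaker:
        # list.sort is stable, so time order is kept within each speaker.
        rows.sort(key=lambda u: u.speaker)
    return rows
```

Calling `build_speech_list(utterances)` with no options gives the default chronological list; the other arguments switch on one option each.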

Returning to FIG. 1, the first playback instruction receiving unit 122 accepts, through a user operation, selection of the utterance from which playback is to start, from among the series of utterances displayed on the speech list screen 150 (FIG. 3(a)). For example, as on the speech list screen 160 in FIG. 3(b), when the user clicks row 162 of the utterance to be played back in the speech list 152, the row is shown in a different color or in reverse video so that the selection is visible. Row 164 indicates the row selected after row 162.

Returning to FIG. 1, the first playback unit 124 refers to the voice storage unit 104 and the speech recognition result storage unit 114 and, starting from the selected utterance, associates the corresponding audio data and text data based on the time information and plays them back in synchronization. The audio data is output from a speaker (not shown) by an audio output control unit (not shown). The text data is displayed by the speech list display unit 120: within the speech list 152 of the speech list screen 150 described above, the text of the passage currently being output as audio is shown in a different color or in reverse video so that the passage is visible. In FIG. 1, the connections between the first playback unit 124 and the voice storage unit 104, the speech recognition result storage unit 114, and the playback history recording unit 130 are omitted. Playback is not particularly limited, but can proceed continuously from the start position.
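The synchronized playback described above can be sketched as a generator that walks forward from the selected utterance and pairs audio with text through the shared time information. The function name and the dictionary-based stores are assumptions; actual audio output and on-screen highlighting are left to the caller.

```python
def synchronized_playback(start_time, voice_storage, recognition_storage):
    """Yield (time, audio, text) for each utterance from start_time onward.

    Both stores are assumed to be dictionaries keyed by the same time
    information, here fixed-width "HH:MM:SS" strings.
    """
    for time in sorted(voice_storage):
        if time >= start_time and time in recognition_storage:
            # The caller would send the audio to the speaker and highlight
            # the matching text row at the same moment.
            yield time, voice_storage[time], recognition_storage[time]
```

Because playback is continuous, the generator keeps yielding later utterances until the caller stops consuming it.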

As shown in FIG. 2, the playback history recording unit 130 records, as a playback history, the text section, which is identification information of the text data of the audio played back by the first playback unit 124, in association with a playback ID that identifies the order of the playback instructions. The playback ID is, for example, a serial number assigned in the order in which playback was instructed, or playback time information. As described above, playback continues into the following utterances, but only the utterances for which playback was instructed through the first playback instruction receiving unit 122 are recorded in the playback history recording unit 130.
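
The recording rule can be illustrated with a short Python sketch. It assumes that the playback ID is a serial number and that a text section is identified by a string. The class `PlaybackHistory`, its method `record`, and the section labels are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class PlaybackHistory:
    """Hypothetical store pairing a serial playback ID with a text section."""
    entries: list = field(default_factory=list)
    _next_id: count = field(default_factory=lambda: count(1))

    def record(self, text_section: str) -> int:
        # Called only when the user explicitly instructs playback of a section,
        # not for utterances reached by continuous playback.
        playback_id = next(self._next_id)
        self.entries.append((playback_id, text_section))
        return playback_id


history = PlaybackHistory()
for section in ["row162", "row164", "row162"]:
    history.record(section)
```

Because each instruction receives a new ID, a section that is played twice appears twice in `entries`, which preserves the order of the playback instructions.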

The playback history display unit 140 displays the recorded playback history on a playback history list screen. FIG. 3(c) shows an example of the playback history list screen 170. For example, if on the utterance list screen 160 of FIG. 3(b) the user first selects and plays row 162, then selects and plays row 164, and then selects and plays row 162 again, the playback history display unit 140 displays the playback history list screen 170 with the utterances sorted in the order in which they were played, as shown.

The second playback instruction receiving unit 142 accepts, through a user operation, the selection of an utterance to be played back from the playback history displayed on the playback history list screen 170. That is, in the audio playback system 100 of this embodiment, an utterance can be selected and played back not only from the utterance list 152 of the whole series of utterances but also, once it has been played, from the list on the playback history list screen 170. The list on the playback history list screen 170 is usually far shorter than the utterance list 152, so an utterance is easy to find there. Issuing the playback instruction from the playback history list screen 170 also greatly improves operability when the same utterance is played repeatedly.

Returning to FIG. 1, the second playback unit 144 refers to the speech recognition result storage unit 114 and the playback history recording unit 130 and, starting from the selected utterance, associates the audio data and text data on the basis of the playback identification information and the time information and plays them back in synchronization. The playback method and other details are the same as for the first playback unit 124.
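
One way to picture playback from the history is as a two-step lookup: the playback ID selects a text section, and the time information of that section gives the position at which audio playback starts. The sketch below assumes in-memory dictionaries. The names `section_times`, `history`, and `resolve_playback` and the sample values are hypothetical.

```python
# Hypothetical time information: text section -> (start, end) in seconds.
section_times = {
    "row162": (12.0, 18.5),
    "row164": (30.0, 41.0),
}

# Hypothetical playback history: playback ID -> text section.
history = {1: "row162", 2: "row164", 3: "row162"}


def resolve_playback(playback_id: int) -> tuple:
    """Return (text section, start, end) for an entry chosen from the history."""
    section = history[playback_id]
    start_sec, end_sec = section_times[section]
    return section, start_sec, end_sec
```

Entries 1 and 3 resolve to the same section and start time, so choosing either one from the history list replays the same utterance.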

The operation of the audio playback system 100 of this embodiment, configured as described above, is explained below. FIG. 4 is a flowchart showing an example of the operation of the audio playback system 100 according to the embodiment of the present invention. The following description refers to FIGS. 1 to 4.

In the browsing method according to the embodiment of the present invention, the audio of a series of utterances is subjected to speech recognition processing and the recognition result is converted into text data (step S105); the audio storage unit 104 and the speech recognition result storage unit 114, in which the resulting audio data and text data are stored in association with each other through per-utterance time information, are referred to, and the speech recognition results of the series of utterances are displayed in chronological order on the utterance list screen 150 (step S109); the selection of an utterance from among the series displayed on the utterance list screen 150 is accepted (step S111); the text data of the audio for the selected utterance is recorded in the playback history recording unit 130 as a playback history, in association with playback identification information that identifies the order of the playback instructions (step S115); and the text data of the recorded playback history is arranged according to the playback identification information and displayed on the playback history list screen 170 (step S117).

The browsing method according to the embodiment of the present invention further refers to the audio storage unit 104 and the speech recognition result storage unit 114 and, starting from the selected utterance, associates the audio data and text data on the basis of the time information and plays them back in synchronization (step S113).

In FIG. 4, steps S101 to S107 are performed by the speech recognition system 110. This makes it possible to refer to, synchronize, and play back the audio data and text data for each meeting or trial that is the subject of discussion or analysis.
Specifically, in the speech recognition system 110, the audio acquisition unit 102 first acquires audio A of a series of utterances (step S101), and the speech recognition unit 112 performs speech recognition processing and converts the recognition result into text data (step S103). The resulting audio data and text data are then stored in the audio storage unit 104 and the speech recognition result storage unit 114, respectively, in association with each other through the per-utterance time information (steps S105 and S107).

The audio data and speech recognition results obtained in this way can be recorded on a recording medium or in a storage unit such as a hard disk corresponding to the audio storage unit 104 and the speech recognition result storage unit 114 of FIG. 1, and are read and processed by the computer of the audio playback system 100. They are stored in a file or folder for each meeting, court hearing, or the like, and the audio playback system 100 selects and processes the file or the like corresponding to the meeting or trial whose proceedings are to be browsed.

First, the utterance list display unit 120 refers to the audio storage unit 104 and the speech recognition result storage unit 114 and displays the speech recognition results of the series of utterances in chronological order on the utterance list screen 150 (step S109). The first playback instruction receiving unit 122 then accepts, through a user operation, the selection of the utterance at which playback is to start from among the series displayed on the utterance list screen 150 (step S111). In response to the user's playback start instruction, the first playback unit 124 refers to the audio storage unit 104 and the speech recognition result storage unit 114 and, starting from the selected utterance, associates the audio data and text data on the basis of the time information and plays them back in synchronization (step S113). As described above, the audio is output while the corresponding portion of the utterance list 152 is recolored or otherwise highlighted, so that the user can see which utterance is being played.

The utterance list display unit 120 or the first playback unit 124 records the text data of the played-back audio in the playback history recording unit 130 as a playback history, in association with playback identification information that identifies the order of the playback instructions (step S115). The playback history display unit 140 arranges the text data of the playback history recorded in the playback history recording unit 130 according to the playback ID and displays it on the playback history list screen 170 (step S117). In FIG. 1, the connections between the first playback unit 124 and the speech recognition system 110 and the playback history recording unit 130 are omitted.

Then, with the playback history list screen 170 displayed, the second playback instruction receiving unit 142 accepts, through a user operation, the selection of an utterance to be played back from the playback history displayed on the playback history list screen 170 (step S119). In response to the user's playback start instruction, the second playback unit 144 refers to the speech recognition result storage unit 114 and the playback history recording unit 130 and, starting from the utterance selected by that instruction, associates the audio data and text data on the basis of the playback identification information and the time information and plays them back in synchronization (step S121). The playback method and other details are the same as for the first playback unit 124. In FIG. 1, the connections between the second playback unit 144 and the speech recognition system 110 and the playback history recording unit 130 are omitted.

As described above, the audio playback system 100 of the embodiment of the present invention makes it possible to search for or identify a desired portion easily and quickly from the playback history and play it back. That is, when the minutes or the like are reviewed repeatedly in order to discuss their contents, the effort and time otherwise spent locating a particular passage each time can be eliminated.

For example, the speech recognition results created by the audio playback system 100 can be used as the minutes of a meeting, trial, or the like. By providing the playback history display unit 140, which displays only the needed text sections of the minutes as a history in the order in which they were previously selected, the sections are lined up in the chronological order of the renewed discussion based on the minutes, which makes the needed portions easy to find. In particular, when a discussion extends over several days, the portions viewed on the previous day can be found immediately.

The audio playback system 100 of this embodiment can play back, in synchronization, audio data and text data recording the exchanges in, for example, a meeting or trial. Any position can be specified as the playback start position: a list of the series of utterances arranged in chronological order is displayed, and, in order to discuss the content of the exchanges, users can play back and listen to utterances from any position in the list while they talk. Portions of the overall recording can be played back repeatedly in a variety of situations: for example, to listen again when a speaker's voice is hard to make out or to confirm a contradiction in a speaker's argument, or to check tone, pauses, and breathing again and again in order to infer a speaker's psychological state.

(Second Embodiment)
FIG. 5 is a functional block diagram showing the configuration of the audio playback system 200 according to the embodiment of the present invention.
The audio playback system 200 of this embodiment differs from the above embodiment in that video capturing the speakers can be played back in synchronization with the audio of the series of utterances. This embodiment also shows specific examples of the playback history list screen displayed by the playback history display unit 140.

In FIG. 5, the speech recognition system 210 includes a video acquisition unit 202 and a video storage unit 204 in addition to the audio acquisition unit 102, audio storage unit 104, speech recognition unit 112, and speech recognition result storage unit 114 of the speech recognition system 110 of FIG. 1. The audio playback system 200 of this embodiment has the same configuration as the audio playback system 100 of the embodiment of FIG. 1 except that the utterance list display unit 120, first playback unit 124, and second playback unit 144 are replaced by an utterance list display unit 220, a first playback unit 224, and a second playback unit 244.

The video acquisition unit 202 acquires video V of the speakers captured by a video camera (not shown) or the like. The video storage unit 204 stores the video acquired by the video acquisition unit 202 as video data in association with time information. In this embodiment, the audio data stored in the audio storage unit 104, the text data stored in the speech recognition result storage unit 114, and the video data stored in the video storage unit 204 are associated with one another through the per-utterance time information.

The utterance list display unit 220 refers to the audio storage unit 104, the speech recognition result storage unit 114, and the video storage unit 204, and displays the speech recognition results of the series of utterances in chronological order on the utterance list screen. The first playback unit 224 refers to the audio storage unit 104 and the speech recognition result storage unit 114 and, starting from the selected utterance, associates the corresponding audio data and text data on the basis of the time information and plays them back in synchronization, while displaying the video on the screen in synchronization with the audio output of the utterance specified by the user operation.

FIG. 6 shows an example of the utterance list screen 250 displayed by the utterance list display unit 220 of this embodiment. On the utterance list screen 250, the utterance list display unit 220 can open a video display window 280 in addition to the utterance list 252, and the video stored in the video storage unit 204 is played back in the video display window 280 in synchronization with the audio.

Returning to FIG. 5, the first playback unit 224 refers to the audio storage unit 104, the speech recognition result storage unit 114, and the video storage unit 204 and, starting from the selected utterance, associates the corresponding audio data, text data, and video data on the basis of the time information and plays them back in synchronization. The audio data and text data are played back in the same way as by the first playback unit 124 of FIG. 1. The video data is displayed in the video display window 280 of the utterance list screen 250 described above. In FIG. 5, the connections between the first playback unit 224 and the speech recognition system 210 and the playback history recording unit 130 are omitted. Playback is not particularly limited, but it can proceed continuously from the start position.

As shown in FIG. 7, when the playback history button 274 is pressed by a user operation, the playback history display unit 140 can open a playback history list window 270, in which a playback list 272 is displayed. In the example of FIG. 7, the playback list 272 shows the utterances arranged chronologically in the order in which they were played. The playback list 272 includes the utterance time, the speaker, and the content of the utterance.

Returning to FIG. 5, the second playback unit 244 refers to the audio storage unit 104, the speech recognition result storage unit 114, the video storage unit 204, and the playback history recording unit 130 and, starting from the selected utterance, associates the audio data, text data, and video data on the basis of the playback ID and the time information and plays them back in synchronization. The playback method and other details are the same as for the first playback unit 224. In FIG. 5, the connections between the second playback unit 244 and the speech recognition system 210 and the playback history recording unit 130 are omitted.

The operation of the audio playback system 200 of this embodiment, configured as described above, is explained below. FIG. 8 is a flowchart showing an example of the operation of the audio playback system 200 according to the embodiment of the present invention. The following description refers to FIGS. 5 to 9.

The flowchart of FIG. 8 has the same steps as the flowchart of FIG. 4 of the above embodiment and additionally includes steps S201, S203, S211, and S213.

First, while steps S101 to S107 are executed by the audio acquisition unit 102 and the speech recognition unit 112, the video acquisition unit 202 simultaneously acquires the video V (step S201), and the video is stored as video data in association, through the per-utterance time information, with the corresponding audio data and with the text data that is the recognition result of the speech recognition unit 112 (step S203).

When, in step S111, the first playback instruction receiving unit 122 accepts through a user operation the selection of the utterance at which playback is to start from among the series displayed on the utterance list screen 250 (FIG. 7), the first playback unit 224 refers to the audio storage unit 104, the speech recognition result storage unit 114, and the video storage unit 204, causes the utterance list display unit 220 to display the speech recognition results of the series of utterances in chronological order on the utterance list screen 250, and causes the video to be displayed in the video display window 280 (FIG. 7) in synchronization with the audio output of the utterance specified by the user operation (steps S211 and S113). Note that in step S109 the utterance list display unit 220 displays the utterance list screen 250 of FIG. 6.

Then, when the selection of an utterance to be played back from the playback history displayed in the playback history list window 270 is accepted in step S119, the utterance list display unit 220 refers to the audio storage unit 104, the speech recognition result storage unit 114, and the video storage unit 204 and displays the speech recognition results of the series of utterances in chronological order on the utterance list screen 250 (FIG. 6). The second playback unit 244 refers to the audio storage unit 104 and the speech recognition result storage unit 114 and, starting from the selected utterance, associates the audio data and text data on the basis of the time information and plays them back in synchronization, while displaying the video in the video display window 280 in synchronization with the audio output of the utterance specified by the user operation (steps S213 and S121).

The method of displaying the playback history in this embodiment is described below. FIG. 9 shows examples of the playback history list screen of the audio playback system according to the embodiment of the present invention. FIG. 9(a) is an example of a screen that sorts and displays the playback history in order of playback count, and FIG. 9(b) is an example of a screen that displays the playback history according to playback count. The example of FIG. 7 is a screen that displays the playback history chronologically, in order of playback time.

In the audio playback system 200 of this embodiment, the playback history recording unit 130 records the text data of the audio played back by the first playback unit 224 as a playback history in association with its playback time information, and the playback history display unit 140 displays the recorded playback history chronologically in the playback history list window 270 on the basis of the playback identification information (playback ID). In addition, the playback history recording unit 130 has a playback count unit (not shown) that counts the number of times the text data of the audio played back by the second playback unit 244 has been played, and the playback history display unit 140 displays the recorded playback history on the playback history list screen on the basis of the playback count.

For example, in FIG. 9(a), the playback history list screen 275 is sorted in descending order of playback count and shows only the entries played a predetermined number of times or more. Compared with the playback history list screen 170 of the above embodiment, which shows the playback history in order of playback time, this makes the utterances that were played many times obvious at a glance. Although not illustrated, the playback count itself may also be displayed.
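
The count-based view can be sketched in a few lines of Python. The fragment assumes that the history is a list of (playback ID, text section) pairs and that the predetermined number of times is passed as a parameter. The function name `frequent_sections`, the parameter `min_count`, and the sample data are hypothetical.

```python
from collections import Counter


def frequent_sections(history, min_count=2):
    """Return (text section, playback count) pairs, most-played first.

    Sections played fewer than min_count times are left out, mirroring the
    "predetermined number of times or more" filter in the text.
    """
    counts = Counter(section for _, section in history)
    # sorted() is stable, so sections with equal counts keep first-played order.
    ranked = sorted(counts.items(), key=lambda item: -item[1])
    return [(section, n) for section, n in ranked if n >= min_count]


# Hypothetical history of (playback ID, text section) pairs.
sample_history = [
    (1, "row162"), (2, "row164"), (3, "row162"),
    (4, "row166"), (5, "row164"), (6, "row162"),
]
```

Returning the count alongside each section also supports the variation in which the playback count is shown on the screen.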

In the example of FIG. 9(b), the playback history list screen 285 displays the utterances in different colors according to playback count or frequency. For example, if the utterance in row 284 has been played more times than the utterance in row 282, row 284 can be given the more conspicuous color.
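
As one possible reading of "a more conspicuous color", the sketch below maps a playback count to a red tint that becomes more saturated as the count rises. The color scale, the function name `highlight_color`, and the hexadecimal format are assumptions made for illustration; the embodiment does not specify a particular color scheme.

```python
def highlight_color(play_count: int, max_count: int) -> str:
    """Map a playback count to a hex color from white (never played) to red."""
    if max_count <= 0 or play_count <= 0:
        return "#ffffff"
    ratio = min(play_count / max_count, 1.0)
    # Lower green and blue channels as the ratio grows, leaving red at full.
    level = int(255 * (1.0 - ratio))
    return f"#ff{level:02x}{level:02x}"
```

With this scale the most-played row is pure red and rows with lower counts fade toward white, so the relative order of rows 282 and 284 is visible from color alone.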

That is, entries in the playback history that are viewed frequently (points discussed again and again) are automatically labeled according to their playback count, so that only the points of discussion can be extracted on the basis of those labels, which further improves search efficiency.

As described above, the audio playback system 200 according to the embodiment of the present invention provides the same effects as the above embodiment and can also play back video, in addition to audio, in synchronization with the text data. This makes it possible to check the speaker's facial expressions and gestures at the time of speaking.
Furthermore, with this configuration the time at which a discussion took place can be registered as the playback time in association with the text section in the playback history. A text section, or the video associated with it, can therefore be searched for using "when the discussion took place" as the key, so that the desired utterance can be found even more quickly.

In addition, because the playback history can be displayed so that frequently played entries stand out, the points that have been discussed repeatedly are obvious at a glance.

For example, when a trial turns on whether the defendant had the intent to kill, the relevant minutes and video may be viewed many times in order to probe the defendant's true intent. The key points of the discussion are expected to be viewed repeatedly and may well be viewed again later. In such cases, a marking process such as changing the color of the corresponding text section in the playback history allows the discussion points to be extracted and searched easily. The marked sections may be made displayable on a separate screen, or made searchable by their marking.

Also, by changing the color according to viewing frequency, for example, a user looking at the playback history can see at a glance which points have been discussed repeatedly, and can easily grasp visually whether any points remain undiscussed or whether the discussion has been lopsided. Moreover, because a high frequency means that a section is a key point of discussion (an "issue" in a trial), the history can also be arranged separately in order of importance, or only the important issues can be extracted and searched. In this way the playback history becomes far easier and more convenient to use.

(Third Embodiment)
FIG. 10 is a functional block diagram showing the configuration of the audio playback system 300 according to the embodiment of the present invention.
The audio playback system 300 of this embodiment differs from the above embodiments in that the utterances displayed on the utterance list screen 150 can be searched and narrowed down according to conditions, and a playback instruction can be accepted from among the results.

The audio playback system 300 of this embodiment further includes a condition receiving unit 302 that accepts search conditions for narrowing down the utterances displayed on the utterance list screen, a search unit 304 that refers to the speech recognition result storage unit 114 and searches for utterances according to the search conditions, and a search result display unit 306 that displays the retrieved results as a search result list screen. The first playback instruction receiving unit 322 accepts the selection of the utterance at which playback is to start from among the utterances displayed on the search result list screen.

Specifically, the audio playback system 300 of this embodiment includes a condition receiving unit 302, a search unit 304, and a search result display unit 306 in addition to the configuration of the audio playback system 200 of the above embodiment. It also has a first playback instruction receiving unit 322 in place of the first playback instruction receiving unit 122 of the audio playback system 200. These units may instead be combined with the configuration of the audio playback system 100 of the embodiment of FIG. 1.

The condition receiving unit 302 receives a search condition for narrowing down the utterances displayed on the utterance list screen 250. For example, as shown in FIG. 11, the utterance list screen 250 of this embodiment has a search button 340, and when the search button 340 is pressed, the search result display unit 306 displays a search screen 350. The search screen 350 has a search condition input field 352 and a search result display field 354.

The condition receiving unit 302 receives the various search conditions entered or selected in the search condition input field 352. For example, a character string in the utterance content, the speaker type, the utterance time, or a time period can be specified as a search condition.

The search unit 304 refers to the speech recognition result storage unit 114 and searches for utterances according to the search condition specified by the user in the search condition input field 352.

The search result display unit 306 displays the search results in the search result display field 354 of the search screen 350.

In this embodiment, the first reproduction instruction receiving unit 322 accepts, from among the utterances displayed in the search result display field 354 of the search screen 350, the selection of an utterance from which reproduction is to start.

The operation of the audio reproduction system 300 of this embodiment configured as described above is explained below. FIG. 12 is a flowchart showing an example of the operation of the audio reproduction system according to the embodiment of the present invention. The explanation refers to FIGS. 10 to 12.

This processing starts after the utterance list display unit 120 or 220 has displayed the utterance list screen 150 or 250 in step S109 of the flowchart of FIG. 4 or FIG. 8 for the audio reproduction system of the above embodiments, for example when the user presses the search button 340 on the utterance list screen 250 of FIG. 11. The search result display unit 306 displays the search screen 350, and the condition receiving unit 302 receives the search condition entered in the search condition input field 352 (step S301).

The search unit 304 then refers to the speech recognition result storage unit 114 and searches for utterances that match the search condition received in step S301 (step S303). The search result display unit 306 displays the search results extracted in step S303 in the search result display field 354 (step S305). The processing then returns to FIG. 4 or FIG. 8, and in step S111 the first reproduction instruction receiving unit 322 accepts the selection of an utterance from which reproduction is to start. The subsequent processing is the same as in the above embodiments.
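The search flow of steps S301 to S305 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation described in this publication: the `Utterance` record, the `SearchCondition` fields, and the `search_utterances` function are hypothetical names, and the list `store` merely stands in for the speech recognition result storage unit 114.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Utterance:
    """One recognized utterance (hypothetical record layout)."""
    start_sec: float   # utterance start time within the recording
    speaker: str       # speaker name or type
    text: str          # speech recognition result text


@dataclass
class SearchCondition:
    """Conditions entered in the input field (step S301); None means 'not specified'."""
    keyword: Optional[str] = None
    speaker: Optional[str] = None
    from_sec: Optional[float] = None
    to_sec: Optional[float] = None


def search_utterances(store: List[Utterance], cond: SearchCondition) -> List[Utterance]:
    """Return the utterances matching every specified condition, in time order (step S303)."""
    hits = []
    for u in store:
        if cond.keyword is not None and cond.keyword not in u.text:
            continue
        if cond.speaker is not None and u.speaker != cond.speaker:
            continue
        if cond.from_sec is not None and u.start_sec < cond.from_sec:
            continue
        if cond.to_sec is not None and u.start_sec > cond.to_sec:
            continue
        hits.append(u)
    return sorted(hits, key=lambda u: u.start_sec)


# Stand-in for the speech recognition result storage unit 114 (assumed sample data).
store = [
    Utterance(0.0, "Chair", "Let us review the budget."),
    Utterance(12.5, "Sato", "The budget is over by ten percent."),
    Utterance(30.0, "Sato", "I will send the schedule tomorrow."),
]

# Step S305 would list these hits in the search result display field.
results = search_utterances(store, SearchCondition(keyword="budget", speaker="Sato"))
```

Treating an unspecified condition as "match everything" keeps the function usable when the user fills in only some of the input fields.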

As described above, the audio reproduction system 300 of this embodiment provides the same effects as the audio reproduction systems of the above embodiments and, because a reproduction instruction can be given after the utterances have been narrowed down by search conditions, makes it possible to find a desired utterance efficiently.

(Fourth Embodiment)
FIG. 13 is a functional block diagram showing the configuration of the audio reproduction system 400 according to an embodiment of the present invention.
The audio reproduction system 400 of this embodiment differs from the above embodiments in that it accepts an arbitrary operation on an utterance displayed, for example, in the reproduction list 272 of the reproduction history list window 270 of FIG. 7, and executes that operation.

The audio reproduction system 400 of this embodiment further includes an operation receiving unit 402 that accepts an arbitrary operation on an utterance displayed on the reproduction history list screen 170, and an execution unit 404 that executes the accepted operation.
For example, a marker can be attached to an utterance, or a memo or comment can be entered for an utterance.

As shown in FIG. 14, this embodiment has a reproduction history recording unit 430 in place of the reproduction history recording unit 130 of the above embodiment. In addition to the information recorded by the reproduction history recording unit 130, the reproduction history recording unit 430 can record, for each reproduction ID, a level distinguished by a marker or a level classification based on the reproduction count of the above embodiment, as well as a memo attached to the utterance.

In this embodiment, the reproduction history list screen 480, which displays the information held in the reproduction history recording unit 430, can be provided with a memo field 482, as shown in FIG. 15.

The audio reproduction system 400 of this embodiment can be combined with any of the configurations of the audio reproduction systems 100, 200, and 300 of the above embodiments shown in FIGS. 1, 5, and 10.

For example, the operation receiving unit 402 can accept an instruction to delete an unnecessary utterance from among the utterances displayed on the reproduction history list screen, and the execution unit 404 can delete the utterance for which the deletion instruction was accepted from the reproduction history.

Alternatively, the system may further include an extraction unit (not shown) that extracts utterances satisfying a predetermined condition from the recorded reproduction history and attaches predetermined identification information to the extracted utterances, and the reproduction history display unit can display the extracted utterances in an identifiable manner.

For example, utterances reproduced at least a predetermined number of times, utterances with a predetermined marker, utterances whose memo or comment contains a predetermined keyword, or utterances with a predetermined date and time can be extracted. The extracted utterances are preferably displayed with enhanced visibility, for example by changing their color, blinking, inverting, attaching a mark, or showing them in a separate (pop-up) window.
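The extraction described above might look like the following Python sketch. It assumes a hypothetical `HistoryEntry` record and an `extract` helper that are not part of this publication; the `highlighted` flag stands in for the "predetermined identification information" that a display routine could use to change color, blink, or attach a mark.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class HistoryEntry:
    """One reproduction history record (hypothetical layout)."""
    reproduction_id: int
    text: str
    play_count: int = 1
    marker: str = ""
    memo: str = ""
    highlighted: bool = False   # identification information set by the extraction step


def extract(history: List[HistoryEntry],
            condition: Callable[[HistoryEntry], bool]) -> List[HistoryEntry]:
    """Flag and return the entries that satisfy the given condition."""
    hits = [entry for entry in history if condition(entry)]
    for entry in hits:
        entry.highlighted = True
    return hits


# Assumed sample data for illustration only.
history = [
    HistoryEntry(1, "The budget is over by ten percent.", play_count=3),
    HistoryEntry(2, "I will send the schedule tomorrow.", memo="follow up"),
    HistoryEntry(3, "Any other business?"),
]

# Entries reproduced at least twice, and entries whose memo contains a keyword.
frequent = extract(history, lambda e: e.play_count >= 2)
with_keyword = extract(history, lambda e: "follow" in e.memo)
```

Passing the condition as a function lets the same helper cover the reproduction count, marker, keyword, and date conditions listed in the text.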

The operation of the audio reproduction system 400 of this embodiment configured as described above is explained below. FIG. 16 is a flowchart showing an example of the operation of the audio reproduction system according to the embodiment of the present invention. The explanation refers to FIGS. 13 to 16.

After the reproduction history list screen has been displayed in step S117 of the flowcharts of the above embodiments, the operation receiving unit 402 accepts various operations (step S401). When deletion of an utterance is accepted through a user operation ("delete" in step S401), the execution unit 404 deletes the specified utterance from the reproduction history in the reproduction history recording unit 430 (step S403).

When memo input for an utterance is accepted through a user operation ("memo" in step S401), the execution unit 404 appends the memo to the specified utterance. Here, the memo is stored in the information for the corresponding reproduction ID in the reproduction history of the reproduction history recording unit 430 (step S405).
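The branch in step S401 can be illustrated with a short Python sketch. The `HistoryEntry` record, the `execute_operation` function, and the operation names `"delete"` and `"memo"` are assumptions made for this example; the dictionary keyed by reproduction ID stands in for the reproduction history recording unit 430.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class HistoryEntry:
    """One reproduction history record (hypothetical layout)."""
    reproduction_id: int
    text: str
    memo: str = ""


def execute_operation(history: Dict[int, HistoryEntry], operation: str,
                      reproduction_id: int, memo: str = "") -> None:
    """Apply one accepted operation (step S401) to the reproduction history."""
    if reproduction_id not in history:
        raise KeyError(f"unknown reproduction ID: {reproduction_id}")
    if operation == "delete":
        # Step S403: remove the specified utterance from the history.
        del history[reproduction_id]
    elif operation == "memo":
        # Step S405: append the memo to the record for this reproduction ID.
        entry = history[reproduction_id]
        entry.memo = f"{entry.memo}\n{memo}" if entry.memo else memo
    else:
        raise ValueError(f"unsupported operation: {operation}")


# Assumed sample data for illustration only.
history = {
    1: HistoryEntry(1, "The budget is over by ten percent."),
    2: HistoryEntry(2, "I will send the schedule tomorrow."),
}
execute_operation(history, "memo", 1, "Check the figures")
execute_operation(history, "delete", 2)
```

Appending rather than overwriting the memo is a design choice of this sketch; the text only says that the memo is added to the specified utterance.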

As described above, the audio reproduction system 400 according to the embodiment of the present invention provides the same effects as the above embodiments and also allows operations to be performed on each utterance in the reproduction list on the reproduction history list screen. For example, an utterance that was reproduced but turned out to be unnecessary can be deleted from the reproduction history, and a memo can be appended when a participant wants to comment on an utterance during a discussion, so the reproduction history can be used more efficiently.

Embodiments of the present invention have been described above with reference to the drawings, but these are examples of the present invention, and various configurations other than the above may also be adopted.

For example, in another embodiment, the audio reproduction system may further include an output unit (not shown) that outputs, to the outside, the reproduction history recorded in the reproduction history recording unit of the audio reproduction system of the above embodiments.
With this configuration, the reproduction history can be used as the minutes of a meeting or the like in which discussion took place while the participants viewed the series of utterances.
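One possible form of such an output unit is sketched below in Python. The publication does not specify an output format, so CSV is an assumption of this example, as are the `export_history` function and the three-column row layout.

```python
import csv
import io
from typing import List, Tuple

# (reproduction ID, speaker, utterance text): a hypothetical row layout.
HistoryRow = Tuple[int, str, str]


def export_history(history: List[HistoryRow]) -> str:
    """Serialize the reproduction history as CSV text, ordered by reproduction ID."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["reproduction_id", "speaker", "text"])
    for row in sorted(history, key=lambda r: r[0]):
        writer.writerow(row)
    return buffer.getvalue()


# Assumed sample data for illustration only.
minutes = export_history([
    (2, "Sato", "I will send the schedule tomorrow."),
    (1, "Chair", "First the budget, then the schedule."),
])
```

The returned string could be written to a file or sent to another system to serve as minutes.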

In another embodiment, the system may further include a time designation receiving unit (not shown) that accepts designation of the reproduction time of the reproduction history to be displayed on the reproduction history list screen. The reproduction history display unit can display the recorded reproduction history on the reproduction history list screen based on the designated reproduction time.
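A time designation of this kind could be handled as in the following Python sketch. Interpreting the designated reproduction time as a start and end of a period is an assumption of this example, and the `history_in_period` function and the row layout are hypothetical.

```python
from datetime import datetime
from typing import List, Tuple

# (reproduction time, utterance text): a hypothetical row layout.
HistoryRow = Tuple[datetime, str]


def history_in_period(history: List[HistoryRow],
                      start: datetime, end: datetime) -> List[HistoryRow]:
    """Return the history rows reproduced within [start, end], oldest first."""
    rows = [row for row in history if start <= row[0] <= end]
    return sorted(rows, key=lambda row: row[0])


# Assumed sample data for illustration only.
history = [
    (datetime(2009, 3, 31, 10, 40), "I will send the schedule tomorrow."),
    (datetime(2009, 3, 31, 10, 5), "The budget is over by ten percent."),
    (datetime(2009, 3, 31, 11, 30), "Any other business?"),
]

shown = history_in_period(history,
                          datetime(2009, 3, 31, 10, 0),
                          datetime(2009, 3, 31, 11, 0))
```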

The present invention has been described above with reference to embodiments and examples, but the present invention is not limited to those embodiments and examples. Various changes that can be understood by those skilled in the art may be made to the configuration and details of the present invention within its scope.
<< Appendix >>
<Appendix 1>
A browsing system comprising:
utterance list display means for referring to a speech storage device and a speech recognition result storage device, which respectively store speech data of a series of utterances and text data obtained by performing speech recognition processing on the speech and converting the speech recognition results into text, the speech data and the text data being associated with each other by time information for each utterance, and for displaying the speech recognition results of the series of utterances in time series on an utterance list screen;
selection accepting means for accepting selection of an utterance from among the series of utterances displayed on the utterance list screen;
selection history recording means for recording, as a selection history, the text data of the speech for the utterance selected by the selection accepting means in association with selection identification information that identifies the order in which selections were instructed; and
selection history display means for arranging the recorded text data of the selection history based on the selection identification information and displaying it on a selection history list screen.
<Appendix 2>
The browsing system according to Appendix 1,
wherein the selection accepting means further accepts selection of an utterance from the selection history displayed on the selection history list screen.
<Appendix 3>
The browsing system according to Appendix 1 or 2,
wherein the selection history recording means records, as the selection history, the text data of the speech for the utterance selected by the selection accepting means in association with selection time information, and
the selection history display means displays the recorded selection history on the selection history list screen based on the selection time information.
<Appendix 4>
The browsing system according to any one of Appendices 1 to 3,
wherein the selection history recording means further comprises selection count counting means for counting the number of times the text data of the speech for the utterance selected by the selection accepting means has been selected, and
the selection history display means displays the recorded selection history on the selection history list screen based on the selection count.
<Appendix 5>
The browsing system according to Appendix 4,
further comprising time designation accepting means for accepting designation of the selection time information of the selection history to be displayed on the selection history list screen,
wherein the selection history display means displays the recorded selection history on the selection history list screen based on the designated selection time information.
<Appendix 6>
The browsing system according to any one of Appendices 1 to 5, further comprising:
operation accepting means for accepting an arbitrary operation on an utterance displayed on the selection history list screen; and
execution means for executing the accepted operation.
<Appendix 7>
The browsing system according to Appendix 6,
wherein the operation accepting means accepts an instruction to delete an unnecessary utterance from among the utterances displayed on the selection history list screen, and
the execution means deletes the utterance for which the deletion instruction was accepted from the selection history.
<Appendix 8>
The browsing system according to any one of Appendices 1 to 7,
further comprising extraction means for extracting utterances that satisfy a predetermined condition from the recorded selection history and attaching predetermined identification information to the extracted utterances,
wherein the selection history display means displays the extracted utterances in an identifiable manner.
<Appendix 9>
The browsing system according to any one of Appendices 1 to 8,
further comprising output means for outputting, to the outside, the selection history recorded by the selection history recording means.
<Appendix 10>
The browsing system according to any one of Appendices 1 to 9, further comprising:
condition accepting means for accepting a search condition for narrowing down the utterances displayed on the utterance list screen;
search means for referring to the speech recognition result storage device and searching for utterances according to the search condition; and
search result display means for displaying the search results as a search result list screen,
wherein the selection accepting means accepts selection of an utterance from among the utterances displayed on the search result list screen.
<Appendix 11>
The browsing system according to any one of Appendices 1 to 10,
further comprising reproduction means for referring to the speech storage device and the speech recognition result storage device and, starting from the utterance selected by the selection accepting means, associating the speech data and the text data with each other based on the time information and reproducing them in synchronization.
<Appendix 12>
A browsing method comprising:
referring to a speech storage device and a speech recognition result storage device, which respectively store speech data of a series of utterances and text data obtained by performing speech recognition processing on the speech and converting the speech recognition results into text, the speech data and the text data being associated with each other by time information for each utterance, and displaying the speech recognition results of the series of utterances in time series on an utterance list screen;
accepting selection of an utterance from among the series of utterances displayed on the utterance list screen;
recording, in a selection history recording device as a selection history, the text data of the speech for the selected utterance in association with selection identification information that identifies the order in which selections were instructed; and
arranging the recorded text data of the selection history based on the selection identification information and displaying it on a selection history list screen.
<Appendix 13>
The browsing method according to Appendix 12,
further comprising referring to the speech storage device and the speech recognition result storage device and, starting from the selected utterance, associating the speech data and the text data with each other based on the time information and reproducing them in synchronization.
<Appendix 14>
A computer program for causing a computer to execute:
a procedure of referring to a speech storage device and a speech recognition result storage device, which respectively store speech data of a series of utterances and text data obtained by performing speech recognition processing on the speech and converting the speech recognition results into text, the speech data and the text data being associated with each other by time information for each utterance, and displaying the speech recognition results of the series of utterances in time series on an utterance list screen;
a procedure of accepting selection of an utterance from among the series of utterances displayed on the utterance list screen;
a procedure of recording, as a selection history, the text data of the speech for the selected utterance in association with selection identification information that identifies the order in which selections were instructed; and
a procedure of arranging the recorded text data of the selection history based on the selection identification information and displaying it on a selection history list screen.
<Appendix 15>
The computer program according to Appendix 14,
for causing the computer to further execute a reproduction procedure of referring to the speech storage device and the speech recognition result storage device and, starting from the utterance selected in the procedure of accepting the selection, associating the speech data and the text data with each other based on the time information and reproducing them in synchronization.
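As an illustration of the selection history in Appendices 1 and 4, the following Python sketch records each selection with a sequential selection ID and counts how often each utterance text was selected. The `SelectionHistory` class and its method names are hypothetical, and the descending order used in `by_count` is an assumption of this example rather than something the appendices specify.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SelectionHistory:
    """Selection history keyed by a sequential selection ID (hypothetical design)."""
    next_id: int = 1
    entries: List[Tuple[int, str]] = field(default_factory=list)  # (selection ID, text)
    counts: Dict[str, int] = field(default_factory=dict)          # text -> times selected

    def record(self, text: str) -> int:
        """Record one selection and return the selection ID assigned to it."""
        selection_id = self.next_id
        self.next_id += 1
        self.entries.append((selection_id, text))
        self.counts[text] = self.counts.get(text, 0) + 1
        return selection_id

    def by_selection_order(self) -> List[str]:
        """Texts arranged by selection ID, as on a selection history list screen."""
        return [text for _, text in sorted(self.entries)]

    def by_count(self) -> List[Tuple[str, int]]:
        """Texts with their selection counts, most frequently selected first."""
        return sorted(self.counts.items(), key=lambda item: -item[1])


# Assumed sample data for illustration only.
selections = SelectionHistory()
selections.record("The budget is over by ten percent.")
selections.record("I will send the schedule tomorrow.")
selections.record("The budget is over by ten percent.")
```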

100 Audio reproduction system
102 Audio acquisition unit
104 Audio storage unit
110 Speech recognition system
112 Speech recognition unit
114 Speech recognition result storage unit
120 Utterance list display unit
122 First reproduction instruction receiving unit
124 First reproduction unit
130 Reproduction history recording unit
140 Reproduction history display unit
142 Second reproduction instruction receiving unit
144 Second reproduction unit
150 Utterance list screen
152 Utterance list
154 Speaker name
156 Utterance content
160 Utterance list screen
170 Reproduction history list screen
200 Audio reproduction system
202 Video acquisition unit
204 Video storage unit
210 Speech recognition system
220 Utterance list display unit
224 First reproduction unit
244 Second reproduction unit
250 Utterance list screen
252 Utterance list
270 Reproduction history list window
272 Reproduction list
274 Reproduction history button
275 Reproduction history list screen
280 Video display window
285 Reproduction history list screen
300 Audio reproduction system
302 Condition receiving unit
304 Search unit
306 Search result display unit
322 Reproduction instruction receiving unit
340 Search button
350 Search screen
352 Search condition input field
354 Search result display field
400 Audio reproduction system
402 Operation receiving unit
404 Execution unit
430 Reproduction history recording unit
480 Reproduction history list screen
482 Memo field

Claims

A browsing system comprising:
utterance list display means for displaying an utterance list screen that lists text data of a plurality of utterances obtained by performing speech recognition processing on speech data including the plurality of utterances;
selection accepting means for accepting selection of one of the plurality of utterances displayed on the utterance list screen;
reproduction means for reproducing the speech data from the position of the utterance whose selection was accepted by the selection accepting means; and
selection history display means for displaying a history of the selected utterances on a selection history list screen.

The browsing system according to claim 1,
wherein the selection accepting means further accepts selection of one of the utterances displayed on the selection history list screen.

The browsing system according to claim 1 or 2,
wherein the selection history display means displays the selected utterances on the selection history list screen in order of increasing selection frequency.

The browsing system according to any one of claims 1 to 3,
wherein the selection history display means displays, on the selection history list screen, the utterances that have been selected a predetermined number of times or more.

The browsing system according to any one of claims 1 to 4,
wherein the utterance list display means refers to a speech storage device and a speech recognition result storage device, which respectively store speech data of a series of utterances and text data obtained by performing speech recognition processing on the speech and converting the speech recognition results into text, the speech data and the text data being associated with each other by time information for each utterance, and displays the speech recognition results of the series of utterances in time series on the utterance list screen,
the browsing system further comprises selection history recording means for recording, as a selection history, the text data of the speech for the utterance selected by the selection accepting means in association with selection identification information that identifies the order in which selections were instructed, and
the selection history display means arranges the text data of the selection history recorded by the selection history recording means based on the selection identification information and displays it on the selection history list screen.

The browsing system according to claim 5,
wherein the selection history recording means records, as the selection history, the text data of the speech for the utterance selected by the selection accepting means in association with selection time information, and
the selection history display means displays the recorded selection history on the selection history list screen based on the selection time information.

The browsing system according to claim 5 or 6,
wherein the selection history recording means further comprises selection count counting means for counting the number of times the text data of the speech for the utterance selected by the selection accepting means has been selected, and
the selection history display means displays the recorded selection history on the selection history list screen based on the selection count.

The browsing system according to claim 7,
further comprising time designation accepting means for accepting designation of the selection time information of the selection history to be displayed on the selection history list screen,
wherein the selection history display means displays the recorded selection history on the selection history list screen based on the designated selection time information.

The browsing system according to any one of claims 5 to 8, further comprising:
operation accepting means for accepting an arbitrary operation on an utterance displayed on the selection history list screen; and
execution means for executing the accepted operation.

The browsing system according to claim 9,
wherein the operation accepting means accepts an instruction to delete an unnecessary utterance from among the utterances displayed on the selection history list screen, and
the execution means deletes the utterance for which the deletion instruction was accepted from the selection history.

The browsing system according to any one of claims 5 to 10,
further comprising extraction means for extracting utterances that satisfy a predetermined condition from the recorded selection history and attaching predetermined identification information to the extracted utterances,
wherein the selection history display means displays the extracted utterances in an identifiable manner.

The browsing system according to any one of claims 5 to 11,
further comprising output means for outputting, to the outside, the selection history recorded by the selection history recording means.

The browsing system according to any one of claims 5 to 12, further comprising:
condition accepting means for accepting a search condition for narrowing down the utterances displayed on the utterance list screen;
search means for referring to the speech recognition result storage device and searching for utterances according to the search condition; and
search result display means for displaying the search results as a search result list screen,
wherein the selection accepting means accepts selection of an utterance from among the utterances displayed on the search result list screen.

A browsing method in which a computer executes:
an utterance list display step of displaying an utterance list screen that lists text data of a plurality of utterances obtained by performing speech recognition processing on speech data including the plurality of utterances;
a selection accepting step of accepting selection of one of the plurality of utterances displayed on the utterance list screen;
a reproduction step of reproducing the speech data from the position of the selected utterance; and
a selection history display step of displaying a history of the selected utterances on a selection history list screen.

A program for causing a computer to function as:
utterance list display means for displaying an utterance list screen that lists text data of a plurality of utterances obtained by performing speech recognition processing on speech data including the plurality of utterances;
selection accepting means for accepting selection of one of the plurality of utterances displayed on the utterance list screen;
reproduction means for reproducing the speech data from the position of the selected utterance; and
selection history display means for displaying a history of the selected utterances on a selection history list screen.