JP7282118B2 - Program and information processing method

Info

Publication number: JP7282118B2
Application number: JP2021042812A
Authority: JP
Inventors: 重里糸井
Original assignee: HOBONICHI CO., LTD.
Current assignee: HOBONICHI CO., LTD.
Priority date: 2021-03-16
Filing date: 2021-03-16
Publication date: 2023-05-26
Anticipated expiration: 2041-03-16
Also published as: JP2022142589A

Description

The present invention relates to a program and an information processing method.

There are technologies relating to audio text in video content. For example, Patent Literature 1 discloses an audio confirmation system that determines whether audio text obtained from the audio included in video content corresponds to the caption text included in that video content.

Patent Literature 1: Japanese Unexamined Patent Application Publication No. 2020-166262 (JP 2020-166262 A)

However, the invention according to Patent Literature 1 cannot register the audio text (utterance text) in video content.

One aspect aims to provide a program or the like capable of registering utterance text in video content.

In one aspect, a program causes a computer to execute processing of: acquiring video content and utterance text describing what a speaker says in the video content; displaying the utterance text in stages in step with playback of the acquired video content; transmitting the utterance text to an information processing device when a registration operation on the displayed utterance text is received; displaying a first tab that shows a list of the utterance texts registered by the registration operation for each video content viewed by the user, and a second tab that shows a list of the utterance texts registered by the registration operation in order of registration; displaying the registered utterance texts as a list by video content when a selection operation on the first tab is received; and displaying the utterance texts as a list in order of registration when a selection operation on the second tab is received.

In one aspect, utterance text in video content can be registered.

FIG. 1 is an explanatory diagram showing an overview of the video distribution system.
FIG. 2 is a block diagram showing a configuration example of the server.
FIG. 3 is an explanatory diagram showing an example of the record layouts of the video content DB and the utterance text DB.
FIG. 4 is an explanatory diagram showing an example of the record layouts of the user DB and the history DB.
FIG. 5 is an explanatory diagram showing an example of the record layouts of the note DB and the category DB.
FIG. 6 is a block diagram showing a configuration example of the terminal.
FIG. 7 is an explanatory diagram showing an example of the video content list screen.
FIG. 8 is an explanatory diagram showing an example of the video content detail screen.
FIG. 9 is an explanatory diagram showing an example of the category list screen.
FIG. 10 is an explanatory diagram showing an example of the category detail screen.
FIG. 11 is a flowchart showing the processing procedure for displaying utterance text in video content.
FIG. 12 is a flowchart showing the processing procedure for registering or unregistering utterance text.
FIG. 13 is an explanatory diagram showing an example of the list screen of utterance texts by video content.
FIG. 14 is an explanatory diagram showing an example of the display screen of all utterance texts in a video content.
FIG. 15 is an explanatory diagram showing an example of the list screen of utterance texts in order of registration.
FIG. 16 is a flowchart showing the processing procedure for displaying utterance texts as a list.
FIG. 17 is an explanatory diagram showing an example of the switching screen.

Hereinafter, the present invention will be described in detail with reference to the drawings showing its embodiments.

(Embodiment 1)
Embodiment 1 relates to a mode in which utterance text describing what a speaker says in video content is displayed in step with playback of the video content. The video content includes, for example, video content for classes, video content for seminars, interview video content, or video content relating to cooking, makeup, culture, games, or the like.

FIG. 1 is an explanatory diagram showing an overview of the video distribution system. The system of this embodiment includes an information processing device 1 and an information processing terminal 2, and the devices exchange information via a network N such as the Internet.

The information processing device 1 is an information processing device that processes, stores, and transmits and receives various kinds of information. The information processing device 1 is, for example, a server device, a personal computer, or a general-purpose tablet PC. In this embodiment, the information processing device 1 is assumed to be a server device and is hereinafter referred to as the server 1 for brevity.

The information processing terminal 2 is a terminal device that receives and plays back video content and acquires and displays the utterance text in the video content. The information processing terminal 2 is, for example, an information processing device such as a smartphone, a mobile phone, a tablet, or a personal computer terminal. Hereinafter, the information processing terminal 2 is referred to as the terminal 2 for brevity.

The server 1 according to this embodiment acquires video content and utterance text describing what the speaker says in the video content. The server 1 displays the utterance text in the video content on the screen in stages, in step with playback of the acquired video content. When the server 1 receives a registration operation on an utterance text, it registers (stores) that utterance text.

FIG. 2 is a block diagram showing a configuration example of the server 1. The server 1 includes a control unit 11, a storage unit 12, a communication unit 13, an input unit 14, a display unit 15, a reading unit 16, and a large-capacity storage unit 17. These components are connected by a bus B.

The control unit 11 includes arithmetic processing units such as a CPU (Central Processing Unit), an MPU (Micro-Processing Unit), and a GPU (Graphics Processing Unit), and performs various information processing, control processing, and the like for the server 1 by reading and executing a control program 1P stored in the storage unit 12. The control program 1P can be deployed to run on a single computer, or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network. Although FIG. 2 shows the control unit 11 as a single processor, it may be a multiprocessor.

The storage unit 12 includes memory elements such as RAM (Random Access Memory) and ROM (Read Only Memory), and stores the control program 1P, data, and the like that the control unit 11 needs to execute processing. The storage unit 12 also temporarily stores data and the like that the control unit 11 needs to perform arithmetic processing. The communication unit 13 is a communication module for performing communication-related processing, and transmits and receives information to and from the terminal 2 via the network N.

The input unit 14 is an input device such as a mouse, a keyboard, a touch panel, or buttons, and outputs received operation information to the control unit 11. The display unit 15 is a liquid crystal display, an organic EL (electroluminescence) display, or the like, and displays various information according to instructions from the control unit 11.

The reading unit 16 reads a portable storage medium 1a such as a CD (Compact Disc)-ROM or a DVD (Digital Versatile Disc)-ROM. The control unit 11 may read the control program 1P from the portable storage medium 1a via the reading unit 16 and store it in the large-capacity storage unit 17. Alternatively, the control unit 11 may download the control program 1P from another computer via the network N or the like and store it in the large-capacity storage unit 17. Furthermore, the control unit 11 may read the control program 1P from a semiconductor memory 1b.

The large-capacity storage unit 17 includes a recording medium such as an HDD (Hard Disk Drive) or an SSD (Solid State Drive). The large-capacity storage unit 17 includes a video content DB (database) 171, an utterance text DB 172, a user DB 173, a history DB 174, a note DB 175, and a category DB 176.

The video content DB 171 stores information about video content. The utterance text DB 172 stores utterance texts describing what speakers say in the video content. The user DB 173 stores information about users. The history DB 174 stores the history of users viewing video content. The note DB 175 stores the utterance texts registered by users' registration operations. The category DB 176 stores the categories of video content.

In this embodiment, the storage unit 12 and the large-capacity storage unit 17 may be configured as a single integrated storage device. The large-capacity storage unit 17 may also be composed of a plurality of storage devices. Furthermore, the large-capacity storage unit 17 may be an external storage device connected to the server 1.

The server 1 may run on a single computer, may run in a distributed manner on a plurality of computers, may be realized by a plurality of virtual machines provided within one server, or may be realized using a cloud server.

FIG. 3 is an explanatory diagram showing an example of the record layouts of the video content DB 171 and the utterance text DB 172.
The video content DB 171 includes a video content ID column, a category ID column, a video column, a title column, a thumbnail image column, and a release date column. The video content ID column stores a unique video content ID that identifies each video content.

The category ID column stores the category ID that specifies the category of the video content. The video column stores the data of the video content. The title column stores the title of the video content. The thumbnail image column stores the thumbnail image of the video content. The release date column stores the date on which the video content was released.

The utterance text DB 172 includes a video content ID column, an utterance ID column, an utterance text column, a start time column, a display time column, and an utterance text type column. The video content ID column stores the video content ID that specifies the video content corresponding to the utterance text. The utterance ID column stores a unique utterance text ID that identifies each utterance text.

The utterance text column stores what the speaker says in the video content. The utterance text is text that an editor has edited, based on a transcription of the speaker's utterance, so as to express the intention (purpose) or gist of the utterance. Alternatively, the entire content of the utterance may be transcribed and provided as the utterance text. The start time column stores the point in time at which display of the utterance text starts. The display time column stores how long the utterance text is displayed. The utterance text type column stores the type of the utterance text. The types of utterance text include "representative", which denotes a representative utterance text identified from among a plurality of utterance texts, and "general", which denotes any utterance text other than a representative one.
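The two record layouts of FIG. 3 can be pictured as plain record types. The following is only a minimal sketch: the Python field names, the use of seconds for the start and display times, and the choice to store the video and thumbnail as references rather than raw data are all assumptions made for illustration, not details given in the description.

```python
from dataclasses import dataclass
from typing import Literal


@dataclass(frozen=True)
class VideoContent:
    """One row of the video content DB 171 (field names are hypothetical)."""
    video_content_id: str
    category_id: str
    video_ref: str       # assumption: a reference to the video data
    title: str
    thumbnail_ref: str   # assumption: a reference to the thumbnail image
    release_date: str    # assumption: ISO date string


@dataclass(frozen=True)
class UtteranceText:
    """One row of the utterance text DB 172 (field names are hypothetical)."""
    video_content_id: str
    utterance_id: str
    text: str
    start_seconds: float     # assumption: start time kept in seconds
    display_seconds: float   # assumption: display time kept in seconds
    kind: Literal["representative", "general"] = "general"


def utterances_for(video: VideoContent, rows: list) -> list:
    """Return the utterance texts of one video, ordered by display start time."""
    own = [r for r in rows if r.video_content_id == video.video_content_id]
    return sorted(own, key=lambda r: r.start_seconds)
```

Keeping the video content ID on every utterance row mirrors the description, where that column links each utterance text back to its video content.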

FIG. 4 is an explanatory diagram showing an example of the record layouts of the user DB 173 and the history DB 174.
The user DB 173 includes a user ID column, a name column, a member type column, and a viewing start date column. The user ID column stores a unique user ID that identifies each user. The name column stores the user's name. The member type column stores the user's member type. Member types include, for example, "corporate" and "general". Member types may be defined according to actual needs. The viewing start date column stores the date on which the user started viewing video content.

The history DB 174 includes a history ID column, a video content ID column, a user ID column, and a viewing date/time column. The history ID column stores a unique history data ID that identifies each piece of history data. The video content ID column stores the video content ID that specifies the video content. The user ID column stores the user ID that specifies the user. The viewing date/time column stores the date and time at which the user viewed the video content.

FIG. 5 is an explanatory diagram showing an example of the record layouts of the note DB 175 and the category DB 176.
The note DB 175 includes a user ID column, a video content ID column, an utterance ID column, and a registration date/time column. The user ID column stores the user ID that specifies the user. The video content ID column stores the video content ID that specifies the video content. The utterance ID column stores the utterance ID that specifies the utterance text. The registration date/time column stores the date and time at which the utterance text was registered.

The category DB 176 includes a category ID column, a category name column, a background color column, and an icon column. The category ID column stores a unique category ID that identifies each category. The category name column stores the name of the category. The background color column stores the background color of the display area that is color-coded for each category. The icon column stores the icon representing each category.

A category is an arbitrary type set according to the video content. A category may be a type set up to cover a central thing or topic, together with peripheral or related matters, such as "An Interesting Life", "A Fascinating Life", or "Ask a Doctor".

FIG. 6 is a block diagram showing a configuration example of the terminal 2. The terminal 2 includes a control unit 21, a storage unit 22, a communication unit 23, an input unit 24, a display unit 25, and a speaker 26. These components are connected by a bus B.

The control unit 21 includes an arithmetic processing unit such as a CPU or an MPU, and performs various information processing, control processing, and the like for the terminal 2 by reading and executing a control program 2P stored in the storage unit 22. Although FIG. 6 shows the control unit 21 as a single processor, it may be a multiprocessor. The storage unit 22 includes memory elements such as RAM and ROM, and stores the control program 2P, data, and the like that the control unit 21 needs to execute processing. The storage unit 22 also temporarily stores data and the like that the control unit 21 needs to perform arithmetic processing.

The communication unit 23 is a communication module for performing communication-related processing, and transmits and receives information to and from the server 1 via the network N. The input unit 24 may be a keyboard, a mouse, or a touch panel integrated with the display unit 25. The display unit 25 is a liquid crystal display, an organic EL display, or the like, and displays various information according to instructions from the control unit 21. The speaker 26 is a device that converts electrical signals into sound.

Next, the processing for displaying utterance text in step with playback of video content will be described. In this embodiment, the categories of video content are stored in advance in the category DB 176 of the server 1. The video content and the utterance texts in the video content are likewise stored in advance in the video content DB 171 and the utterance text DB 172 of the server 1.

The utterance text may be obtained through manual input by an editor, or may be obtained by conversion from the audio data of the video content. When audio data conversion is used, the server 1 extracts the audio data from the video content and performs speech recognition processing to convert the extracted audio data into utterance text. The speech recognition processing may use, for example, STT (Speech To Text) technology. The audio data may also be converted into utterance text by a learning model implemented with deep learning or another neural network. The server 1 stores the acquired utterance text in the utterance text DB 172 in association with the video content ID.

The terminal 2 then acquires the video content and the utterance text in the video content from the server 1. Specifically, the terminal 2 transmits an acquisition request for video content and utterance text to the server 1. In response to the acquisition request transmitted from the terminal 2, the server 1 acquires information about a plurality of video contents from the video content DB 171. The information about video content includes the video content ID, the category of the video content, the title, the thumbnail image, the release date, and the like.

Based on each acquired video content ID, the server 1 acquires information about the utterance texts in each video content from the utterance text DB 172. The information about an utterance text includes the utterance ID, the utterance text, and the display start time and display time of the utterance text. The server 1 transmits the acquired information about the video content and information about the utterance texts to the terminal 2. The terminal 2 receives the information about the video content and the information about the utterance texts transmitted from the server 1.

The terminal 2 identifies a representative utterance text from among the plurality of utterance texts in each video content. The representative utterance text is set in advance, for example by an editor. For example, when the editor selects an utterance text that can express the intention or gist of the utterance from its content, the terminal 2 accepts input of the utterance text selected by the editor. The terminal 2 stores the utterance text type "representative" in the utterance text DB 172 in association with the utterance ID of the accepted representative utterance text. When representative utterance texts have been stored in the utterance text DB 172 in advance in this way, the terminal 2 extracts from the utterance text DB 172, out of the plurality of utterance texts in the video content, the utterance text whose type is "representative". The terminal 2 identifies the extracted utterance text as the representative utterance text.
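When the type has already been stored, identifying the representative utterance text reduces to a filter on the utterance text type column. A minimal sketch, assuming each record is a dictionary with the hypothetical keys `video_content_id`, `text`, and `type`:

```python
def representative_texts(utterances):
    """Map each video content ID to its utterance texts marked "representative".

    `utterances` is an iterable of dicts; the key names are assumptions
    chosen for this sketch, not names taken from the description.
    """
    result = {}
    for row in utterances:
        if row["type"] == "representative":
            result.setdefault(row["video_content_id"], []).append(row["text"])
    return result
```

A video content with no row marked "representative" simply does not appear in the result, so a caller would need its own fallback for that case.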

The way of identifying the representative utterance text is not limited to the processing described above. For example, the server 1 clusters the utterance texts based on the title of the video content, or words related to the title, contained in the utterance texts. The server 1 may then randomly acquire a predetermined number (for example, 2) of utterance texts from each cluster that contains at least a predetermined number (for example, 10) of utterance texts. This makes it possible to select utterance texts that are diverse and likely to attract attention.
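The alternative selection can be sketched as follows. The description does not specify a clustering algorithm, so this sketch substitutes a simple stand-in: each utterance text joins the cluster of the first title-related keyword it contains. The function name, the keyword list, and the injectable random generator are assumptions for illustration; only the thresholds (10 and 2) come from the examples in the text.

```python
import random


def pick_from_clusters(texts, keywords, min_cluster_size=10,
                       picks_per_cluster=2, rng=None):
    """Randomly pick utterance texts from sufficiently large keyword clusters."""
    if rng is None:
        rng = random.Random()
    clusters = {keyword: [] for keyword in keywords}
    for text in texts:
        for keyword in keywords:
            if keyword in text:
                # Stand-in clustering: first matching keyword wins.
                clusters[keyword].append(text)
                break
    selected = []
    for members in clusters.values():
        if len(members) >= min_cluster_size:
            count = min(picks_per_cluster, len(members))
            selected.extend(rng.sample(members, count))
    return selected
```

Passing a seeded `random.Random` makes the selection reproducible, which is convenient when checking the behaviour.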

For each category of video content, the terminal 2 displays the representative utterance text on the screen in association with the title and thumbnail image of the video content. When the terminal 2 receives a playback operation for a video content, it acquires the data of that video content from the video content DB 171 of the server 1 based on the video content ID. The terminal 2 plays back the acquired video content data. In step with playback of the video content, the terminal 2 displays the utterance texts in the video content on the screen in stages. Specifically, during playback of the video content, the terminal 2 displays each utterance text on the screen based on the display start time of that utterance text.
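The timing rule can be sketched as a lookup against the current playback position. This is a sketch under stated assumptions: times are in seconds, and an utterance text stays on screen from its start time until its display time has elapsed. The description only says that display is based on the start time, so the end-of-window rule and all names here are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimedUtterance:
    """Timing fields of an utterance text (names are hypothetical)."""
    utterance_id: str
    text: str
    start_seconds: float
    display_seconds: float


def visible_at(utterances, position_seconds):
    """Return the utterances whose display window covers the playback position."""
    return [
        u for u in utterances
        if u.start_seconds <= position_seconds < u.start_seconds + u.display_seconds
    ]
```

A player could call `visible_at` on each playback time update and redraw the utterance text area only when the returned list changes.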

When the terminal 2 receives a registration operation on an utterance text, it transmits the utterance ID of that utterance text to the server 1 in association with the user ID and the video content ID. The server 1 receives the user ID, video content ID, and utterance ID transmitted from the terminal 2. The server 1 registers (stores) the utterance ID and the registration date and time in the note DB 175 as one record, in association with the received user ID and video content ID.
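The server-side registration step can be sketched with Python's built-in `sqlite3` module. The table name, column names, and the composite primary key that makes a repeated registration a no-op are assumptions for illustration; the description only states that the four values are stored as one record.

```python
import sqlite3
from datetime import datetime, timezone


def create_note_table(conn):
    """Create a table mirroring the note DB 175 layout (names are hypothetical)."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS note (
            user_id          TEXT NOT NULL,
            video_content_id TEXT NOT NULL,
            utterance_id     TEXT NOT NULL,
            registered_at    TEXT NOT NULL,
            PRIMARY KEY (user_id, video_content_id, utterance_id)
        )
        """
    )


def register_utterance(conn, user_id, video_content_id, utterance_id, now=None):
    """Store one registration record; a duplicate registration is ignored."""
    registered_at = (now or datetime.now(timezone.utc)).isoformat()
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR IGNORE INTO note VALUES (?, ?, ?, ?)",
            (user_id, video_content_id, utterance_id, registered_at),
        )
```

For example, `conn = sqlite3.connect(":memory:")` followed by `create_note_table(conn)` and `register_utterance(conn, "user-1", "video-1", "utt-1")` leaves one row in the `note` table.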

FIG. 7 is an explanatory diagram showing an example of the video content list screen. The screen includes a today's release display field 11a, a category display field 11b, and category individual display fields 11c.

The today's release display field 11a is a display field for the video content released today. In the today's release display field 11a, a plurality of video contents released today, a plurality of recommended video contents, or the like are displayed side by side in a second direction (for example, the horizontal direction). Video content released today and recommended video content may be displayed side by side at the same time. When the plurality of video contents arranged in the second direction do not fit in the today's release display field 11a, they can be displayed so as to be scrollable in the second direction.

The today's release display field 11a includes a video display field 11a1, a play button 11a2, a title display field 11a3, a release date display field 11a4, and a details button 11a5. The video display field 11a1 is a display field for the thumbnail image of the video content. The play button 11a2 is a button for playing back the video content. The title display field 11a3 is a display field for the title of the video content. The release date display field 11a4 is a display field for the release date of the video content. The details button 11a5 is a button for transitioning to the video content detail screen (FIG. 8), which is described later.

The category display field 11b is a display field for the categories (islands) of video content. When the plurality of categories, arranged in rows adjacent to one another in the horizontal and vertical directions, do not fit in the category display field 11b, they can be displayed so as to be scrollable horizontally and vertically. The category display field 11b includes a category list button 11b1 and category name display fields 11b2. The category list button 11b1 is a button for transitioning to the category list screen (FIG. 9), which is described later. The category name display field 11b2 is a display field for the name of a category.

The terminal 2 acquires information about the categories, including the category ID, the category name, and the background color, from the category DB 176 of the server 1. The terminal 2 displays the acquired category information in the category display field 11b. Specifically, the terminal 2 displays the acquired name of each category in the corresponding category name display field 11b2. The terminal 2 sets the acquired background color of each category as the background color of the corresponding category name display field 11b2. The display of categories is not limited to the color coding described above; icons representing the categories may be used instead. When the terminal 2 receives a touch operation on the category list button 11b1, it passes the category IDs and category names to the category list screen (FIG. 9) and transitions to that screen.

カテゴリ個別表示欄１１ｃは、カテゴリに属する複数の動画コンテンツを表示する表示欄である。カテゴリ個別表示欄１１ｃは、カテゴリの数に応じて設けられ、第１方向（例えば、縦方向）に並んで表示される。なお、第１方向に並べられた複数のカテゴリ個別表示欄１１ｃが画面に収まりきらない場合、複数のカテゴリ個別表示欄１１ｃを第１方向にスクロール可能に表示することができる。 The category individual display field 11c is a display field for displaying a plurality of moving image contents belonging to the category. The individual category display fields 11c are provided according to the number of categories, and displayed side by side in the first direction (for example, the vertical direction). If the plurality of individual category display fields 11c arranged in the first direction cannot be displayed on the screen, the plurality of individual category display fields 11c can be scrollably displayed in the first direction.

また、各カテゴリ個別表示欄１１ｃには、該カテゴリに属する複数の動画コンテンツが第２方向（例えば、横方向）に並べて表示される。なお、第２方向に並べられた複数の動画コンテンツが該当するカテゴリ個別表示欄１１ｃに収まりきらない場合、複数の動画コンテンツを第２方向にスクロール可能に表示することができる。 Also, in each category individual display field 11c, a plurality of moving image contents belonging to the category are displayed side by side in the second direction (for example, horizontal direction). If the plurality of moving image contents arranged in the second direction cannot fit in the corresponding category individual display field 11c, the plurality of moving image contents can be scrollably displayed in the second direction.

カテゴリ個別表示欄１１ｃは、動画表示欄１１ａ１、タイトル表示欄１１ａ３、カテゴリ名称表示欄１１ｃ１、発話テキスト表示欄１１ｃ２、カテゴリ詳細ボタン１１ｃ３を含む。動画表示欄１１ａ１及びタイトル表示欄１１ａ３については、上述した内容と同様であるため、説明を省略する。カテゴリ名称表示欄１１ｃ１は、カテゴリの名称を表示する表示欄である。発話テキスト表示欄１１ｃ２は、動画コンテンツにおける発話テキストを表示する表示欄である。カテゴリ詳細ボタン１１ｃ３は、後述するカテゴリの詳細画面（図１０）に遷移するボタンである。 The individual category display field 11c includes a moving image display field 11a1, a title display field 11a3, a category name display field 11c1, a spoken text display field 11c2, and a category details button 11c3. The moving image display field 11a1 and the title display field 11a3 are the same as those described above, so description thereof will be omitted. The category name display field 11c1 is a display field for displaying the name of the category. The spoken text display field 11c2 is a display field for displaying the spoken text in the video content. The category details button 11c3 is a button for transitioning to a category details screen (FIG. 10), which will be described later.

端末２は、動画コンテンツに関する情報（動画コンテンツＩＤ、動画コンテンツのカテゴリ、タイトル、サムネイル画像及び公開日等）、及び該動画コンテンツにおける発話テキストに関する情報（発話ＩＤ及び発話テキスト等）をサーバ１の動画コンテンツＤＢ１７１及び発話テキストＤＢ１７２から取得する。 The terminal 2 acquires information about video content (video content ID, video content category, title, thumbnail image, release date, etc.) and information about the spoken text in that video content (utterance ID, spoken text, etc.) from the video content DB 171 and the spoken text DB 172 of the server 1.

端末２は、取得した動画コンテンツの公開日に基づいて、本日公開された動画コンテンツを抽出する。端末２は、抽出した本日公開された動画コンテンツを本日公開表示欄１１ａに表示する。具体的には、端末２は、抽出した各動画コンテンツのサムネイル画像を該当動画表示欄１１ａ１に表示し、各動画コンテンツのタイトルを該当するタイトル表示欄１１ａ３に表示し、各動画コンテンツの公開日を該当する公開日表示欄１１ａ４に表示する。 The terminal 2 extracts the video content released today based on the release dates of the acquired video content. The terminal 2 displays the extracted video content in the today's release display field 11a. Specifically, the terminal 2 displays the thumbnail image of each extracted video content in the corresponding moving image display field 11a1, the title of each video content in the corresponding title display field 11a3, and the release date of each video content in the corresponding release date display field 11a4.
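The extraction of today's releases described above can be sketched as follows. This is an illustration only: the `VideoContent` record and the `released_on` helper are hypothetical names that do not appear in the specification, and the release date is assumed to be held as a calendar date.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class VideoContent:
    # Hypothetical record mirroring a few of the fields listed for the video content DB 171.
    content_id: str
    title: str
    release_date: date


def released_on(videos, day):
    """Return the videos whose release date equals `day`, preserving input order."""
    return [video for video in videos if video.release_date == day]
```

A caller would pass `date.today()` as `day` to fill the today's release display field 11a.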

端末２は、再生ボタン１１ａ２のタッチ（クリック）操作を受け付けた場合、動画コンテンツＩＤに基づいて、該動画コンテンツのデータをサーバ１の動画コンテンツＤＢ１７１から取得する。端末２は、取得した動画コンテンツのデータを再生する。端末２は、詳細ボタン１１ａ５のタッチ操作を受け付けた場合、動画コンテンツＩＤを動画コンテンツの詳細画面（図８）に受け渡し、動画コンテンツの詳細画面に遷移する。 When the terminal 2 receives the touch (click) operation of the playback button 11a2, the terminal 2 acquires the data of the moving image content from the moving image content DB 171 of the server 1 based on the moving image content ID. The terminal 2 reproduces the acquired video content data. When the terminal 2 receives the touch operation of the detail button 11a5, the terminal 2 passes the moving image content ID to the detailed screen of the moving image content (FIG. 8), and transitions to the detailed screen of the moving image content.

端末２は、サーバ１から取得された動画コンテンツにおける発話テキストから、代表的な発話テキストを特定する。端末２は、カテゴリ毎に、動画コンテンツに対応付けて代表的な発話テキストをカテゴリ個別表示欄１１ｃに表示する。具体的には、端末２は、動画コンテンツのサムネイル画像を動画表示欄１１ａ１に表示し、動画コンテンツのタイトルをタイトル表示欄１１ａ３に表示し、動画コンテンツにおける代表的な発話テキストを発話テキスト表示欄１１ｃ２に表示する。 The terminal 2 identifies a representative spoken text from the spoken texts in the video content acquired from the server 1. For each category, the terminal 2 displays the representative spoken text in the individual category display field 11c in association with the video content. Specifically, the terminal 2 displays the thumbnail image of the video content in the moving image display field 11a1, the title of the video content in the title display field 11a3, and the representative spoken text of the video content in the spoken text display field 11c2.

端末２は、カテゴリ詳細ボタン１１ｃ３のタッチ操作を受け付けた場合、カテゴリＩＤ、及び該カテゴリに属する各動画コンテンツＩＤをカテゴリの詳細画面（図１０）に受け渡し、カテゴリの詳細画面に遷移する。 When the terminal 2 accepts the touch operation of the category details button 11c3, it passes the category ID and each video content ID belonging to the category to the category details screen (FIG. 10), and transitions to the category details screen.

図８は、動画コンテンツの詳細画面の一例を示す説明図である。なお、図８Ａ、図８Ｂ及び図８Ｃは、ユーザの操作に応じて変化させた詳細画面の一例を示す説明図である。図８Ａは、発話テキストの展開の際の詳細画面である。図８Ｂは、発話テキストの折りたたみの際の詳細画面である。図８Ｃは、動画コンテンツの全画面表示の際の詳細画面である。 FIG. 8 is an explanatory diagram showing an example of a detailed screen of moving image content. Note that FIGS. 8A, 8B, and 8C are explanatory diagrams showing an example of the detailed screen changed according to the user's operation. FIG. 8A is a detailed screen when expanding the spoken text. FIG. 8B is a detailed screen when collapsing the spoken text. FIG. 8C is a detailed screen when the moving image content is displayed on the full screen.

該画面は、動画表示欄１２ａ、再生ボタン１２ｂ、早戻しボタン１２ｃ、早送りボタン１２ｄ、発話テキスト表示欄１２ｅ、折りたたみボタン１２ｆ、フルスクリーン表示ボタン１２ｇ、発話再生ボタン１２ｈ、登録ボタン１２ｉ、シェアボタン１２ｊ及び発話開始時点表示欄１２ｋを含む。 The screen includes a moving image display field 12a, a play button 12b, a fast rewind button 12c, a fast forward button 12d, a spoken text display field 12e, a folding button 12f, a full screen display button 12g, a speech play button 12h, a registration button 12i, and a share button 12j. and an utterance start point display field 12k.

動画表示欄１２ａは、動画コンテンツを表示する表示欄である。再生ボタン１２ｂは、動画コンテンツの再生／一時停止のボタンである。早戻しボタン１２ｃは、動画コンテンツに対して早戻しを実行するボタンである。早送りボタン１２ｄは、動画コンテンツに対して早送りを実行するボタンである。発話テキスト表示欄１２ｅは、動画コンテンツにおける発話テキストを表示する表示欄である。 The moving image display column 12a is a display column for displaying moving image content. The play button 12b is a button for playing/pausing moving image content. The rewind button 12c is a button for rewinding moving image content. The fast-forward button 12d is a button for fast-forwarding moving image content. The spoken text display field 12e is a display field for displaying the spoken text in the video content.

折りたたみボタン１２ｆは、発話テキストの展開または折りたたみを行うボタンである。フルスクリーン表示ボタン１２ｇは、動画コンテンツを全画面で表示するボタンである。発話再生ボタン１２ｈは、発話テキストの表示の開始時点（タイミング）を動画コンテンツの再生開始位置として、該動画コンテンツを再生するボタンである。登録ボタン１２ｉは、発話テキストを登録するボタンである。シェアボタン１２ｊは、動画コンテンツに対応付けて発話テキストをシェアするボタンである。発話開始時点表示欄１２ｋは、発話テキストの表示の開始時点を表示する表示欄である。 The folding button 12f is a button for expanding or folding the spoken text. The full screen display button 12g is a button for displaying video content in full screen. The speech playback button 12h is a button for playing back video content with the display start time (timing) of the speech text as the playback start position of the video content. The registration button 12i is a button for registering an utterance text. The share button 12j is a button for sharing the spoken text associated with the video content. The speech start time display field 12k is a display field for displaying the start time of display of the speech text.

端末２は、動画コンテンツの一覧画面（図７）から受け渡された動画コンテンツＩＤを受け取る。端末２は、受け取った動画コンテンツＩＤに基づいて、該動画コンテンツに関する情報（動画コンテンツのデータ及びサムネイル画像等）、及び該動画コンテンツにおける発話テキストをサーバ１の動画コンテンツＤＢ１７１及び発話テキストＤＢ１７２から取得する。 The terminal 2 receives the video content ID passed from the video content list screen (FIG. 7). Based on the received video content ID, the terminal 2 acquires information about the video content (video content data, thumbnail images, etc.) and spoken text in the video content from the video content DB 171 and the spoken text DB 172 of the server 1. .

端末２は、取得した複数の動画コンテンツのサムネイル画像を動画表示欄１２ａに表示する。端末２は、取得した発話テキストから代表的な発話テキストを特定し、特定した代表的な発話テキストを発話テキスト表示欄１２ｅに表示する。なお、代表的な発話テキストの特定処理に関しては、上述した処理と同様であるため、説明を省略する。なお、代表的な発話テキストに限定せず、例えば端末２は、発話テキストの表示の開始時点の新しい順に一番目の発話テキストを表示しても良い。 The terminal 2 displays thumbnail images of the plurality of acquired video contents in the moving image display field 12a. The terminal 2 identifies a representative spoken text from the acquired spoken texts and displays it in the spoken text display field 12e. The process of identifying a representative spoken text is the same as the process described above, so its description is omitted. The display is not limited to a representative spoken text; for example, the terminal 2 may display the first spoken text when the spoken texts are ordered by display start time, newest first.

端末２は、再生ボタン１２ｂのタッチ操作を受け付けた場合、動画コンテンツのデータを再生する。端末２は動画コンテンツの再生中に、該動画コンテンツにおける各発話テキストの表示の開始時点に基づいて、各発話テキストを発話テキスト表示欄１２ｅに切り替えて表示する。なお、動画コンテンツの再生中に、端末２は再生ボタン１２ｂのタッチ操作を再度受け付けた場合、該動画コンテンツのデータの再生を一時停止する。端末２は、早戻しボタン１２ｃのタッチ操作を受け付けた場合、所定の秒数（例えば、１０秒）で動画コンテンツに対して早戻しを実行する。端末２は、早送りボタン１２ｄのタッチ操作を受け付けた場合、所定の秒数（例えば、１０秒）で動画コンテンツに対して早送りを実行する。 When the terminal 2 receives a touch operation on the play button 12b, it plays the data of the video content. During playback, the terminal 2 switches the spoken text shown in the spoken text display field 12e based on the display start time of each spoken text in the video content. If the terminal 2 receives a touch operation on the play button 12b again during playback, it pauses playback of the video content data. When the terminal 2 receives a touch operation on the rewind button 12c, it rewinds the video content by a predetermined number of seconds (for example, 10 seconds). When the terminal 2 receives a touch operation on the fast-forward button 12d, it fast-forwards the video content by a predetermined number of seconds (for example, 10 seconds).
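One possible way to select the spoken text for the current playback position, and to apply the fixed-length rewind and fast-forward, is sketched below. The function names are hypothetical, the start times are assumed to be sorted in ascending order and expressed in seconds, and clamping the position to the range from 0 to the content duration is an assumption that the specification does not state.

```python
from bisect import bisect_right


def active_utterance_index(start_times, position):
    """Return the index of the latest utterance whose start time is <= position.

    `start_times` must be sorted in ascending order (seconds).
    Returns None when playback has not yet reached the first utterance.
    """
    index = bisect_right(start_times, position) - 1
    return index if index >= 0 else None


def skip(position, delta, duration):
    """Move the playback position by `delta` seconds, clamped to [0, duration]."""
    return max(0.0, min(duration, position + delta))
```

With start times `[2.0, 5.0, 12.5]`, a playback position of 6.0 seconds selects index 1. Calling `skip(position, -10, duration)` corresponds to the rewind button 12c and `skip(position, 10, duration)` to the fast-forward button 12d.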

端末２は、折りたたみボタン１２ｆのタッチ操作を受け付けた場合、発話テキストの展開を行い、すべての発話テキストを発話テキスト表示欄１２ｅに表示する（図８Ｂ）。なお、縦方向に並べられた複数の発話テキストが発話テキスト表示欄１２ｅに収まりきらない場合、複数の発話テキストを縦方向にスクロール可能に表示することができる。また、端末２は動画コンテンツの再生中に、該動画コンテンツの再生タイミングに合わせて表示している発話テキストをハイライト（例えば、太線、斜線またはカラー）で表示する。端末２は、「展開中」となった発話テキストに対し、折りたたみボタン１２ｆのタッチ操作を再度受け付けた場合、発話テキストの折りたたみを行う（図８Ａ）。 When the terminal 2 receives a touch operation on the fold button 12f, it expands the spoken texts and displays all of them in the spoken text display field 12e (FIG. 8B). If the spoken texts, arranged vertically, do not fit in the spoken text display field 12e, they can be displayed so as to be scrollable in the vertical direction. In addition, during playback of the video content, the terminal 2 highlights the spoken text that corresponds to the current playback timing (for example, with a bold line, a diagonal line, or color). When the terminal 2 receives a touch operation on the fold button 12f again while the spoken texts are in the "expanded" state, it collapses the spoken texts (FIG. 8A).

「折りたたみ」状態で発話テキストが表示された場合、端末２は、各発話テキストに対して登録ボタン１２ｉ及びシェアボタン１２ｊを生成して画面に表示する。「展開中」状態で発話テキストが表示された場合、端末２は、各発話テキストに対して発話再生ボタン１２ｈ、登録ボタン１２ｉ、シェアボタン１２ｊ及び発話開始時点表示欄１２ｋを生成して画面に表示する。端末２は、各発話テキストの表示の開始時点を該当する発話開始時点表示欄１２ｋに表示する。 When the spoken texts are displayed in the "collapsed" state, the terminal 2 generates a registration button 12i and a share button 12j for each spoken text and displays them on the screen. When the spoken texts are displayed in the "expanded" state, the terminal 2 generates a speech playback button 12h, a registration button 12i, a share button 12j, and a speech start time display field 12k for each spoken text and displays them on the screen. The terminal 2 displays the display start time of each spoken text in the corresponding speech start time display field 12k.

端末２は、発話再生ボタン１２ｈのタッチ操作を受け付けた場合、発話テキストの表示タイミング（開始時点）に対応する動画コンテンツの再生タイミングにて再生する。例えば発話テキストの表示の開始時点が「０２：３４」である場合、端末２は、「０２：３４」となった動画コンテンツの再生タイミングにて該動画コンテンツを再生する。 When the terminal 2 receives the touch operation of the speech reproduction button 12h, the terminal 2 reproduces the moving image content at the reproduction timing corresponding to the display timing (start point) of the speech text. For example, if the speech text display start time is "02:34", the terminal 2 reproduces the video content at the reproduction timing of "02:34".
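To seek to a display start time such as "02:34", the label has to be converted into a playback offset. The sketch below assumes that the start time is written as `MM:SS` or `HH:MM:SS`; the specification shows only the `MM:SS` example, and the function name is hypothetical.

```python
def timestamp_to_seconds(text):
    """Convert 'MM:SS' or 'HH:MM:SS' into a number of seconds."""
    seconds = 0
    for part in text.split(":"):
        # Each further field is one unit finer, so scale the running total by 60.
        seconds = seconds * 60 + int(part)
    return seconds
```

For the example in the text, `timestamp_to_seconds("02:34")` yields 154, which a player would use as the playback start position.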

端末２は、動画コンテンツの再生中に、登録ボタン１２ｉのタッチ操作を受け付けた場合、ユーザＩＤ及び動画コンテンツＩＤに対応付けて発話テキストの発話ＩＤをサーバ１に送信する。サーバ１は、端末２から送信されたユーザＩＤ、動画コンテンツＩＤ及び発話ＩＤを受信する。サーバ１は、受信したユーザＩＤ及び動画コンテンツＩＤに対応付けて、発話ＩＤ及び登録日時を一つのレコードとしてノートＤＢ１７５に登録する。端末２は、登録された発話テキストに対し、該当する登録ボタン１２ｉを登録済み状態に変更する。例えば端末２は、登録ボタン１２ｉの背景色を赤に変更する。 When the terminal 2 receives the touch operation of the registration button 12i during playback of the moving image content, the terminal 2 transmits the utterance ID of the utterance text to the server 1 in association with the user ID and the moving image content ID. The server 1 receives the user ID, video content ID, and speech ID transmitted from the terminal 2 . The server 1 registers the utterance ID and the registration date and time as one record in the note DB 175 in association with the received user ID and video content ID. The terminal 2 changes the registration button 12i corresponding to the registered utterance text to the registered state. For example, the terminal 2 changes the background color of the registration button 12i to red.

なお、端末２は、登録済みの発話テキストに対し、登録ボタン１２ｉのタッチ操作を再度受け付けた場合、発話テキストの登録解除処理を行う。具体的には、端末２は、ユーザＩＤ、動画コンテンツＩＤ及び該発話テキストの発話ＩＤをサーバ１に送信する。サーバ１は、端末２から送信されたユーザＩＤ、動画コンテンツＩＤ及び該発話テキストの発話ＩＤに基づいて、該当する発話テキストのレコードをノートＤＢ１７５から削除する。端末２は、該当する登録ボタン１２ｉを登録解除済み状態に変更する。例えば端末２は、登録ボタン１２ｉの背景色を灰色に変更する。 It should be noted that, when the terminal 2 receives again the touch operation of the registration button 12i for the registered utterance text, the terminal 2 performs registration cancellation processing of the utterance text. Specifically, the terminal 2 transmits the user ID, the video content ID, and the speech ID of the speech text to the server 1 . The server 1 deletes the corresponding utterance text record from the note DB 175 based on the user ID, the video content ID, and the utterance ID of the utterance text transmitted from the terminal 2 . The terminal 2 changes the corresponding registration button 12i to the unregistered state. For example, the terminal 2 changes the background color of the registration button 12i to gray.
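The registration and cancellation behaviour can be illustrated with an in-memory stand-in for the note DB 175. This is a sketch under stated assumptions: the `NoteStore` class and `button_color` helper are hypothetical, a dictionary replaces the database, and the register-or-delete decision is folded into a single `toggle` call, whereas in the specification the terminal 2 sends a registration or cancellation request and the server 1 updates the DB.

```python
from datetime import datetime, timezone


class NoteStore:
    """In-memory stand-in for the note DB 175 (illustration only)."""

    def __init__(self):
        # (user_id, content_id, utterance_id) -> registration date and time
        self._records = {}

    def toggle(self, user_id, content_id, utterance_id, now=None):
        """Register the utterance if absent, otherwise delete it.

        Returns True when the utterance is registered after the call.
        """
        key = (user_id, content_id, utterance_id)
        if key in self._records:
            del self._records[key]
            return False
        self._records[key] = now or datetime.now(timezone.utc)
        return True

    def is_registered(self, user_id, content_id, utterance_id):
        return (user_id, content_id, utterance_id) in self._records


def button_color(registered):
    """Background color of the registration button 12i, following the examples in the text."""
    return "red" if registered else "gray"
```

Keying each record on the user ID, video content ID, and utterance ID together mirrors the three values that the terminal 2 transmits for both registration and cancellation.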

端末２は、シェアボタン１２ｊのタッチ操作を受け付けた場合、該当する発話テキストを動画コンテンツに対応付けて共有する。共有先は、例えば、ＳＮＳ（Social Networking Service）のサイトまたは電子掲示板等であっても良い。ＳＮＳのサイトは、例えばＴｗｉｔｔｅｒ（登録商標）、Ｆａｃｅｂｏｏｋ（登録商標）、またはＬＩＮＥ（登録商標）等のサイトである。電子掲示板は、ネットワークを使用した環境において、記事を書き込んだり、閲覧したり、コメントを付けることが可能なサイトである。 When receiving the touch operation of the share button 12j, the terminal 2 associates the corresponding utterance text with the moving image content and shares it. The sharing destination may be, for example, an SNS (Social Networking Service) site, an electronic bulletin board, or the like. The SNS site is, for example, a site such as Twitter (registered trademark), Facebook (registered trademark), or LINE (registered trademark). An electronic bulletin board is a site where articles can be written, viewed, and comments can be added in an environment using a network.

例えば、共有対象となる動画コンテンツのサムネイル画像及び発話テキストそのものを指定された共有先に共有しても良く、または該動画コンテンツのサムネイル画像及び発話テキストを記述したＵＲＬ（Uniform Resource Locator）を共有先に共有しても良い。 For example, the thumbnail image of the video content to be shared and the spoken text itself may be shared with the specified sharing destination, or a URL (Uniform Resource Locator) describing the thumbnail image and spoken text of the video content may be shared with the sharing destination.

端末２は、フルスクリーン表示ボタン１２ｇのタッチ操作を受け付けた場合、該当する動画コンテンツを全画面で表示し（図８Ｃ）、該動画コンテンツにおける発話テキストを非表示に切り替える。端末２は、全画面で表示されている動画コンテンツに対し、フルスクリーン表示ボタン１２ｇのタッチ操作を再度受け付けた場合、該当する動画コンテンツの全画面表示を解除し、該動画コンテンツにおける発話テキストを表示に切り替える。 When the terminal 2 receives a touch operation on the full screen display button 12g, it displays the corresponding video content in full screen (FIG. 8C) and hides the spoken text of the video content. When the terminal 2 receives a touch operation on the full screen display button 12g again while the video content is displayed in full screen, it cancels the full-screen display and shows the spoken text of the video content again.

図９は、カテゴリの一覧画面の一例を示す説明図である。該画面は、カテゴリリスト１３ａを含む。カテゴリリスト１３ａは、カテゴリの一覧を表示するリストである。カテゴリリスト１３ａは、カテゴリアイコン１３ａ１、カテゴリ名称表示欄１３ａ２及びカテゴリ詳細ボタン１３ａ３を含む。カテゴリアイコン１３ａ１は、カテゴリを示すアイコンである。カテゴリ名称表示欄１３ａ２は、カテゴリの名称を表示する表示欄である。カテゴリ詳細ボタン１３ａ３は、後述するカテゴリの詳細画面（図１０）に遷移するボタンである。 FIG. 9 is an explanatory diagram showing an example of a category list screen. The screen includes a category list 13a. The category list 13a is a list displaying a list of categories. The category list 13a includes category icons 13a1, category name display fields 13a2, and category detail buttons 13a3. The category icon 13a1 is an icon indicating a category. The category name display field 13a2 is a display field for displaying the name of the category. The category details button 13a3 is a button for transitioning to a category details screen (FIG. 10), which will be described later.

端末２は、カテゴリに関する情報をサーバ１のカテゴリＤＢ１７６から取得する。カテゴリに関する情報は、カテゴリＩＤ、カテゴリの名称及びカテゴリを示すアイコンを含む。端末２は、取得したカテゴリに関する情報をカテゴリリスト１３ａに表示する。具体的には、端末２は、各カテゴリを示すアイコンを該当するカテゴリアイコン１３ａ１に表示し、各カテゴリの名称をカテゴリ名称表示欄１３ａ２に表示する。 The terminal 2 acquires information about categories from the category DB 176 of the server 1 . The information about the category includes category ID, category name, and icon indicating the category. The terminal 2 displays the acquired information about the category on the category list 13a. Specifically, the terminal 2 displays an icon indicating each category in the corresponding category icon 13a1, and displays the name of each category in the category name display field 13a2.

端末２は、カテゴリ詳細ボタン１３ａ３のタッチ操作を受け付けた場合、該当するカテゴリのカテゴリＩＤをカテゴリの詳細画面（図１０）に受け渡し、カテゴリの詳細画面に遷移する。 When the terminal 2 receives the touch operation of the category detail button 13a3, the terminal 2 passes the category ID of the corresponding category to the category detail screen (FIG. 10), and transitions to the category detail screen.

図１０は、カテゴリの詳細画面の一例を示す説明図である。該画面は、カテゴリ名称表示欄１４ａ及び動画表示欄１４ｂを含む。カテゴリ名称表示欄１４ａは、カテゴリの名称を表示する表示欄である。動画表示欄１４ｂは、動画コンテンツに関する情報を表示する表示欄である。動画表示欄１４ｂは、タイトル表示欄１４ｂ１、サムネイル画像表示欄１４ｂ２、発話テキスト表示欄１４ｂ３及び詳細ボタン１４ｂ４を含む。 FIG. 10 is an explanatory diagram showing an example of a category detail screen. The screen includes a category name display field 14a and a moving image display field 14b. The category name display field 14a is a display field for displaying the name of the category. The moving image display column 14b is a display column for displaying information regarding moving image content. The moving image display field 14b includes a title display field 14b1, a thumbnail image display field 14b2, a spoken text display field 14b3, and a detail button 14b4.

タイトル表示欄１４ｂ１は、動画コンテンツのタイトルを表示する表示欄である。サムネイル画像表示欄１４ｂ２は、動画コンテンツのサムネイル画像を表示する表示欄である。発話テキスト表示欄１４ｂ３は、動画コンテンツにおける発話テキストを表示する表示欄である。詳細ボタン１４ｂ４は、動画コンテンツの詳細画面（図８）に遷移するボタンである。 The title display column 14b1 is a display column for displaying the title of moving image content. The thumbnail image display column 14b2 is a display column for displaying thumbnail images of moving image content. The spoken text display field 14b3 is a display field for displaying the spoken text in the video content. The detail button 14b4 is a button for transitioning to a detailed screen (FIG. 8) of moving image content.

端末２は、遷移元の画面（例えば、カテゴリの一覧画面）から受け渡されたカテゴリＩＤに基づいて、カテゴリの名称及び背景色をサーバ１のカテゴリＤＢ１７６から取得する。端末２はカテゴリＩＤに基づいて、該カテゴリに属するすべての動画コンテンツに関する情報をサーバ１の動画コンテンツＤＢ１７１から取得する。動画コンテンツに関する情報は、動画コンテンツＩＤ、動画コンテンツのデータ、タイトル及びサムネイル画像等を含む。端末２は、動画コンテンツＩＤに基づいて、該動画コンテンツにおける複数の発話テキストをサーバ１の発話テキストＤＢ１７２から取得する。 The terminal 2 acquires the category name and background color from the category DB 176 of the server 1 based on the category ID passed from the transition source screen (for example, the category list screen). Based on the category ID, terminal 2 acquires information on all video content belonging to the category from video content DB 171 of server 1 . The information about moving image content includes a moving image content ID, moving image content data, a title, a thumbnail image, and the like. The terminal 2 acquires a plurality of spoken texts in the moving image content from the spoken text DB 172 of the server 1 based on the moving image content ID.

端末２は、取得したカテゴリの名称をカテゴリ名称表示欄１４ａに表示し、カテゴリの背景色をカテゴリ名称表示欄１４ａの背景色として設定する。端末２は、取得した複数の動画コンテンツを、縦方向にそれぞれの動画表示欄１４ｂに並べて表示する。端末２は、各動画表示欄１４ｂに表示されている動画コンテンツに対し、該動画コンテンツのタイトルをタイトル表示欄１４ｂ１に表示し、該動画コンテンツのサムネイル画像をサムネイル画像表示欄１４ｂ２に表示する。 The terminal 2 displays the name of the acquired category in the category name display field 14a, and sets the background color of the category as the background color of the category name display field 14a. The terminal 2 arranges and displays the plurality of acquired moving image contents vertically in each of the moving image display fields 14b. The terminal 2 displays the title of the moving image content displayed in each moving image display column 14b in the title display column 14b1, and displays the thumbnail image of the moving image content in the thumbnail image display column 14b2.

端末２は、各動画表示欄１４ｂに表示されている動画コンテンツに対応付けて、該動画コンテンツにおける複数の発話テキストを、例えば発話ＩＤの昇順で横方向に発話テキスト表示欄１４ｂ３に並べて表示する。なお、横方向に並べられた複数の発話テキストが動画表示欄１４ｂに収まりきらない場合、複数の発話テキストを横スクロール可能に表示することができる。 In association with the video content displayed in each moving image display field 14b, the terminal 2 displays the plurality of spoken texts of that video content side by side in the horizontal direction in the spoken text display field 14b3, for example in ascending order of utterance ID. If the spoken texts arranged in the horizontal direction do not fit in the moving image display field 14b, they can be displayed so as to be scrollable horizontally.

端末２は、発話テキスト表示欄１４ｂ３のスワイプ操作を受け付けた場合、複数の発話テキストを切り替えて表示する。図示のように、発話テキスト１及び発話テキスト２等を含む複数の発話テキストが動画コンテンツＡに対応付けられ、発話テキスト１は画面に表示されている。端末２は、発話テキスト表示欄１４ｂ３の左方向へのスワイプ操作を受け付けた場合、発話テキスト１を次の発話テキスト２に切り替えて表示する。端末２は発話テキスト２の表示中に、発話テキスト表示欄１４ｂ３の左方向へのスワイプ操作を再度受け付けた場合、発話テキスト２を次の発話テキスト３（図示なし）に切り替えて表示する。また、端末２は発話テキスト２の表示中に、発話テキスト表示欄１４ｂ３の右方向へのスワイプ操作を受け付けた場合、発話テキスト２を前の発話テキスト１に切り替えて表示する。このように、ユーザのスワイプ操作により、発話テキストを逐次的に切り替えて表示することができる。なお、端末２は、複数の発話テキストを所定の時間間隔（例えば、１秒）で自動的に切り替えて表示しても良い。端末２は、詳細ボタン１４ｂ４のタッチ操作を受け付けた場合、動画コンテンツＩＤを動画コンテンツの詳細画面（図８）に受け渡し、動画コンテンツの詳細画面に遷移する。 When receiving a swipe operation on the spoken text display field 14b3, the terminal 2 switches and displays a plurality of spoken texts. As illustrated, a plurality of spoken texts including spoken text 1 and spoken text 2 are associated with the video content A, and spoken text 1 is displayed on the screen. When the terminal 2 accepts a leftward swipe operation on the spoken text display field 14b3, the terminal 2 switches the spoken text 1 to the next spoken text 2 and displays it. When the terminal 2 accepts again the swipe operation to the left of the spoken text display field 14b3 while displaying the spoken text 2, the terminal 2 switches and displays the spoken text 2 to the next spoken text 3 (not shown). Further, when the terminal 2 accepts a swipe operation to the right of the spoken text display field 14b3 while displaying the spoken text 2, the terminal 2 switches the spoken text 2 to the previous spoken text 1 and displays it. In this manner, the user's swipe operation can sequentially switch and display the spoken text. Note that the terminal 2 may automatically switch and display a plurality of spoken texts at predetermined time intervals (for example, 1 second). When the terminal 2 receives the touch operation of the detail button 14b4, the terminal 2 passes the moving image content ID to the detail screen of the moving image content (FIG. 8), and transitions to the detail screen of the moving image content.
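The swipe behaviour can be reduced to an index update, as in the following sketch. The function name and the direction strings are hypothetical. Staying on the first or last spoken text at either end of the list is an assumption, since the specification does not say what happens there.

```python
def next_index(current, count, direction):
    """Return the index of the spoken text to show after a swipe.

    A leftward swipe advances to the next text and a rightward swipe
    goes back to the previous one; the index is clamped to [0, count - 1].
    """
    if direction == "left":
        step = 1
    elif direction == "right":
        step = -1
    else:
        raise ValueError(f"unknown swipe direction: {direction!r}")
    return max(0, min(count - 1, current + step))
```

Starting from spoken text 1 (index 0), a leftward swipe gives index 1 (spoken text 2), and a rightward swipe from there returns to index 0.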

図１１は、動画コンテンツにおける発話テキストを表示する際の処理手順を示すフローチャートである。サーバ１の制御部１１は、動画コンテンツに関する情報及び発話テキストに関する情報を取得する（ステップＳ１０１）。具体的には、制御部１１は、動画コンテンツＩＤ、動画コンテンツのカテゴリ、タイトル、サムネイル画像及び公開日等を含む動画コンテンツに関する情報を大容量記憶部１７の動画コンテンツＤＢ１７１から取得する。制御部１１は、取得した各動画コンテンツＩＤに基づいて、各動画コンテンツにおける発話テキストに関する情報を大容量記憶部１７の発話テキストＤＢ１７２から取得する。発話テキストに関する情報は、発話ＩＤ、発話テキスト、発話テキストの表示の開始時点及び表示時間等を含む。 FIG. 11 is a flow chart showing a processing procedure for displaying spoken text in video content. The control unit 11 of the server 1 acquires information on video content and information on spoken text (step S101). Specifically, the control unit 11 acquires information about the moving image content including the moving image content ID, moving image content category, title, thumbnail image, release date, and the like from the moving image content DB 171 of the large-capacity storage unit 17 . The control unit 11 acquires information about the spoken text in each moving image content from the spoken text DB 172 of the large-capacity storage unit 17 based on the acquired moving image content ID. The information about the utterance text includes the utterance ID, the utterance text, the display start time and display time of the utterance text, and the like.

制御部１１は、取得した動画コンテンツに関する情報及び発話テキストに関する情報を通信部１３により端末２に送信する（ステップＳ１０２）。端末２の制御部２１は、サーバ１から送信された動画コンテンツに関する情報及び発話テキストに関する情報を通信部２３により受信する（ステップＳ２０１）。制御部２１は、各動画コンテンツにおける複数の発話テキストから、代表的な発話テキストを特定する（ステップＳ２０２）。 The control unit 11 transmits the acquired information about the moving image content and information about the spoken text to the terminal 2 through the communication unit 13 (step S102). The control unit 21 of the terminal 2 receives, through the communication unit 23, the information about the moving image content and the information about the spoken text transmitted from the server 1 (step S201). The control unit 21 identifies a representative spoken text from a plurality of spoken texts in each video content (step S202).

制御部２１は、動画コンテンツのカテゴリ毎に、動画コンテンツのタイトル及びサムネイル画像に対応付けて代表的な発話テキストを表示部２５により表示する（ステップＳ２０３）。制御部２１は、動画コンテンツの再生操作を入力部２４により受け付けた場合（ステップＳ２０４）、動画コンテンツＩＤを通信部２３によりサーバ１に送信する（ステップＳ２０５）。 The control unit 21 causes the display unit 25 to display a representative spoken text in association with the title and thumbnail image of the moving image content for each moving image content category (step S203). When the input unit 24 receives a video content reproduction operation (step S204), the control unit 21 transmits the video content ID to the server 1 through the communication unit 23 (step S205).

サーバ１の制御部１１は、端末２から送信された動画コンテンツＩＤを通信部１３により受信する（ステップＳ１０３）。制御部１１は、受信した動画コンテンツＩＤに基づいて、該動画コンテンツのデータを大容量記憶部１７の動画コンテンツＤＢ１７１から取得する（ステップＳ１０４）。制御部１１は、取得した動画コンテンツのデータを通信部１３により端末２に送信する（ステップＳ１０５）。 The control unit 11 of the server 1 receives the video content ID transmitted from the terminal 2 through the communication unit 13 (step S103). The control unit 11 acquires data of the moving image content from the moving image content DB 171 of the large-capacity storage unit 17 based on the received moving image content ID (step S104). The control unit 11 transmits the acquired video content data to the terminal 2 through the communication unit 13 (step S105).

端末２の制御部２１は、サーバ１から送信された動画コンテンツのデータを通信部２３により受信する（ステップＳ２０６）。制御部２１は、スピーカ２６を介して、受信した動画コンテンツのデータを再生する（ステップＳ２０７）。制御部２１は、動画コンテンツのデータの再生に合わせて、段階的に該動画コンテンツにおける発話テキストを表示部２５により表示し（ステップＳ２０８）、処理を終了する。 The control unit 21 of the terminal 2 receives the moving image content data transmitted from the server 1 through the communication unit 23 (step S206). The control unit 21 reproduces the received video content data via the speaker 26 (step S207). The control unit 21 causes the display unit 25 to display the spoken text in the moving image content step by step in accordance with the reproduction of the data of the moving image content (step S208), and ends the process.

図１２は、発話テキストの登録または登録解除を行う際の処理手順を示すフローチャートである。端末２の制御部２１は、未登録の発話テキストに対する登録操作を入力部２４により受け付けた場合（ステップＳ２１１）、ユーザＩＤ及び動画コンテンツＩＤに対応付けて発話テキストの発話ＩＤを通信部２３によりサーバ１に送信する（ステップＳ２１２）。 FIG. 12 is a flowchart showing the processing procedure for registering a spoken text or canceling its registration. When the control unit 21 of the terminal 2 receives, through the input unit 24, a registration operation for an unregistered spoken text (step S211), it transmits the utterance ID of the spoken text to the server 1 through the communication unit 23 in association with the user ID and the video content ID (step S212).

サーバ１の制御部１１は、端末２から送信されたユーザＩＤ、動画コンテンツＩＤ及び発話ＩＤを通信部１３により受信する（ステップＳ１１１）。制御部１１は、受信したユーザＩＤ及び動画コンテンツＩＤに対応付けて、発話ＩＤ及び登録日時を一つのレコードとして大容量記憶部１７のノートＤＢ１７５に登録する（ステップＳ１１２）。 The control unit 11 of the server 1 receives the user ID, the video content ID, and the speech ID transmitted from the terminal 2 through the communication unit 13 (step S111). The control unit 11 registers the utterance ID and the registration date and time as one record in the note DB 175 of the large-capacity storage unit 17 in association with the received user ID and video content ID (step S112).

端末２の制御部２１は、登録済みの発話テキストに対する登録解除操作を入力部２４により受け付けた場合（ステップＳ２１３）、ユーザＩＤ及び動画コンテンツＩＤに対応付けて発話テキストの発話ＩＤを通信部２３によりサーバ１に送信する（ステップＳ２１４）。 When the control unit 21 of the terminal 2 receives, through the input unit 24, a registration cancellation operation for a registered spoken text (step S213), it transmits the utterance ID of the spoken text to the server 1 through the communication unit 23 in association with the user ID and the video content ID (step S214).

サーバ１の制御部１１は、端末２から送信されたユーザＩＤ、動画コンテンツＩＤ及び該発話テキストの発話ＩＤを通信部１３により受信する（ステップＳ１１３）。制御部１１は、受信したユーザＩＤ、動画コンテンツＩＤ及び発話ＩＤに基づいて、該当する発話テキストのレコードをノートＤＢ１７５から削除する（ステップＳ１１４）。 The control unit 11 of the server 1 receives the user ID, the video content ID, and the utterance ID of the utterance text transmitted from the terminal 2 through the communication unit 13 (step S113). Based on the received user ID, video content ID, and speech ID, the control unit 11 deletes the corresponding speech text record from the note DB 175 (step S114).

本実施形態によると、動画コンテンツの再生に合わせて該動画コンテンツにおける発話テキストを表示することが可能となる。 According to this embodiment, it is possible to display the spoken text in the moving image content in synchronization with the reproduction of the moving image content.

本実施形態によると、動画コンテンツに基づいて設定されたカテゴリ毎に、動画コンテンツ及び該動画コンテンツにおける発話テキストを表示することにより、ユーザの興味またはニーズに合わせる動画コンテンツを探しやすくなるため、ユーザが動画コンテンツを視聴するモチベーションを喚起することが可能となる。 According to this embodiment, displaying the video content and its spoken text for each category set based on the video content makes it easier for the user to find video content that matches the user's interests or needs, which in turn can motivate the user to watch the video content.

本実施形態によると、動画コンテンツにおける発話テキストを表示することにより、ユーザが該動画コンテンツを視聴しなくても、動画コンテンツの内容を大まかに把握することが可能となる。 According to this embodiment, by displaying the spoken text in the video content, it is possible for the user to roughly understand the content of the video content without viewing the video content.

本実施形態によると、動画コンテンツにおける発話テキストに対する登録または登録解除を行うことが可能となる。 According to this embodiment, it is possible to register or cancel the registration of the spoken text in the video content.

（実施形態２） (Embodiment 2)
実施形態２は、登録済みの発話テキストを一覧で表示する形態に関する。なお、実施形態１と重複する内容については説明を省略する。 Embodiment 2 relates to a mode in which registered utterance texts are displayed in a list. Descriptions of content that overlaps with Embodiment 1 are omitted.

動画コンテンツにおける発話テキストが登録された場合、登録済みの発話テキストを一覧で表示することができる。具体的には、端末２はユーザＩＤに基づいて、視聴された動画コンテンツの動画コンテンツＩＤ、該動画コンテンツにおける登録済みの発話テキストの発話ＩＤ及び登録日時をサーバ１のノートＤＢ１７５から取得する。端末２は、取得した登録済みの発話テキストの発話ＩＤに基づいて、発話テキストの表示の開始時点をサーバ１の発話テキストＤＢ１７２から取得する。端末２は、ユーザが視聴した動画コンテンツ別、または、発話テキストの登録順に、取得した登録済みの発話テキスト及び該発話テキストの表示の開始時点を一覧で表示する。 When utterance texts in video content have been registered, the registered utterance texts can be displayed in a list. Specifically, based on the user ID, the terminal 2 acquires from the note DB 175 of the server 1 the video content ID of each viewed video content, the utterance IDs of the registered utterance texts in that video content, and their registration dates and times. Based on the acquired utterance IDs, the terminal 2 acquires the display start time of each utterance text from the utterance text DB 172 of the server 1. The terminal 2 then displays the acquired registered utterance texts and their display start times in a list, either by video content viewed by the user or in the order in which the utterance texts were registered.

動画コンテンツ別に登録済みの発話テキストが表示された場合、端末２は、視聴された各動画コンテンツの動画コンテンツＩＤに基づいて、各動画コンテンツに関する情報（サムネイル画像及びタイトル等）をサーバ１の動画コンテンツＤＢ１７１から取得する。端末２は、動画コンテンツ別に動画コンテンツのサムネイル画像及びタイトルに対応付けて、ノートＤＢ１７５から取得された動画コンテンツにおける登録済みの発話テキストを画面に表示する。 When the registered utterance texts are displayed by video content, the terminal 2 acquires information about each video content (thumbnail image, title, and the like) from the video content DB 171 of the server 1 based on the video content ID of each viewed video content. For each video content, the terminal 2 displays on the screen the registered utterance texts acquired from the note DB 175, in association with the thumbnail image and title of that video content.

登録順に登録済みの発話テキストが表示された場合、端末２は、ノートＤＢ１７５から取得された発話テキストの登録日時に基づいて、昇順または降順で発話テキストをソートする。端末２は、ソートした発話テキストを画面に表示する。 When the registered speech texts are displayed in the order of registration, the terminal 2 sorts the speech texts in ascending or descending order based on the registration dates and times of the speech texts acquired from the note DB 175 . Terminal 2 displays the sorted speech texts on the screen.
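The two list views described above can be sketched as follows. This is an illustrative sketch only: the `Note` tuple, its field names, and the ordering of texts by utterance ID within each video content are assumptions, not details taken from the embodiment.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, List, NamedTuple


class Note(NamedTuple):
    """A registered utterance text (field names are hypothetical)."""
    video_content_id: str
    utterance_id: int
    text: str
    registered_at: datetime


def group_by_content(notes: List[Note]) -> Dict[str, List[Note]]:
    """View by video content: one list of notes per video content ID."""
    grouped: Dict[str, List[Note]] = defaultdict(list)
    # Assumption: within one video content, notes are ordered by utterance ID.
    for note in sorted(notes, key=lambda n: n.utterance_id):
        grouped[note.video_content_id].append(note)
    return dict(grouped)


def sort_by_registration(notes: List[Note], newest_first: bool = False) -> List[Note]:
    """View by registration order: ascending or descending registration date and time."""
    return sorted(notes, key=lambda n: n.registered_at, reverse=newest_first)


# Hypothetical sample data.
notes = [
    Note("v1", 2, "B", datetime(2021, 3, 16, 10, 0)),
    Note("v2", 1, "C", datetime(2021, 3, 16, 9, 0)),
    Note("v1", 1, "A", datetime(2021, 3, 17, 8, 0)),
]
by_content = group_by_content(notes)
oldest_first = [n.text for n in sort_by_registration(notes)]
newest_first = [n.text for n in sort_by_registration(notes, newest_first=True)]
```

Both views are derived from the same set of records, so switching between them requires no additional registration data.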

図１３は、動画コンテンツ別の発話テキストの一覧画面の一例を示す説明図である。該画面は、動画コンテンツ別タブ１５ａ、登録順タブ１５ｂ及び動画情報表示欄１５ｃを含む。動画コンテンツ別タブ１５ａは、動画コンテンツ別に登録済みの発話テキストの一覧を示すためのタブ（第１タブ）である。登録順タブ１５ｂは、発話テキストの登録順に登録済みの発話テキストの一覧を示すためのタブ（第２タブ）である。 FIG. 13 is an explanatory diagram showing an example of a list screen of spoken texts by moving image content. The screen includes a moving image content tab 15a, a registration order tab 15b, and a moving image information display field 15c. The video content tab 15a is a tab (first tab) for showing a list of registered utterance texts for each video content. The registration order tab 15b is a tab (second tab) for showing a list of registered utterance texts in the order of registration of the utterance texts.

動画情報表示欄１５ｃは、サムネイル画像表示欄１５ｃ１、タイトル表示欄１５ｃ２、発話テキスト表示欄１５ｃ３、登録ボタン１５ｃ４及び全部ボタン１５ｃ５を含む。サムネイル画像表示欄１５ｃ１は、動画コンテンツのサムネイル画像を表示する表示欄である。タイトル表示欄１５ｃ２は、動画コンテンツのタイトルを表示する表示欄である。発話テキスト表示欄１５ｃ３は、発話テキスト及び該発話テキストの表示の開始時点を表示する表示欄である。登録ボタン１５ｃ４は、発話テキストの登録または登録解除を行うためのボタンである。全部ボタン１５ｃ５は、動画コンテンツにおけるすべての発話テキストの表示画面（図１４）に遷移するためのボタンである。 The moving image information display field 15c includes a thumbnail image display field 15c1, a title display field 15c2, a spoken text display field 15c3, a registration button 15c4, and an all button 15c5. The thumbnail image display field 15c1 is a display field for displaying thumbnail images of moving image content. The title display column 15c2 is a display column for displaying the title of the video content. The utterance text display field 15c3 is a display field for displaying the utterance text and the start point of display of the utterance text. The registration button 15c4 is a button for registering or canceling the registration of the spoken text. The all button 15c5 is a button for transitioning to a display screen (FIG. 14) of all spoken texts in the video content.

端末２は、ユーザＩＤに基づいて、視聴された動画コンテンツの動画コンテンツＩＤ、該動画コンテンツにおける登録済みの発話テキスト及び登録日時をサーバ１のノートＤＢ１７５から取得する。端末２は、取得した登録済みの発話テキストの発話ＩＤに基づいて、発話テキストの表示の開始時点をサーバ１の発話テキストＤＢ１７２から取得する。端末２は、視聴された各動画コンテンツの動画コンテンツＩＤに基づいて、各動画コンテンツに関する情報（サムネイル画像及びタイトル等）をサーバ１の動画コンテンツＤＢ１７１から取得する。 Based on the user ID, the terminal 2 acquires from the note DB 175 of the server 1 the video content ID of each viewed video content, the registered utterance texts in that video content, and their registration dates and times. Based on the utterance IDs of the acquired registered utterance texts, the terminal 2 acquires the display start time of each utterance text from the utterance text DB 172 of the server 1. Based on the video content ID of each viewed video content, the terminal 2 acquires information about each video content (thumbnail image, title, and the like) from the video content DB 171 of the server 1.

端末２は、動画コンテンツ別に、動画コンテンツと、該動画コンテンツにおける発話テキストとを対応付けて、縦方向に各動画情報表示欄１５ｃに並べて表示する。なお、縦方向に並べられた複数の動画情報表示欄１５ｃが画面に収まりきらない場合、複数の動画情報表示欄１５ｃを縦方向にスクロール可能に表示することができる。 The terminal 2 associates the moving image content with the spoken text in the moving image content for each moving image content, and displays them side by side in the vertical direction in each moving image information display column 15c. If the plurality of moving image information display columns 15c arranged in the vertical direction cannot be displayed on the screen, the plurality of moving image information display columns 15c can be displayed so as to be scrollable in the vertical direction.

具体的には、端末２は、各動画コンテンツのサムネイル画像を該当するサムネイル画像表示欄１５ｃ１に表示し、各動画コンテンツのタイトルを該当するタイトル表示欄１５ｃ２に表示する。端末２は、各動画コンテンツにおける発話テキスト、及び発話テキストの表示の開始時点を該当する発話テキスト表示欄１５ｃ３に表示する。 Specifically, the terminal 2 displays the thumbnail image of each moving image content in the corresponding thumbnail image display column 15c1, and displays the title of each moving image content in the corresponding title display column 15c2. The terminal 2 displays the speech text in each moving image content and the display start time of the speech text in the corresponding speech text display field 15c3.

図示のように、１つの動画情報表示欄１５ｃ内に、３つの発話テキスト表示欄１５ｃ３が設けられる。例えば端末２は、発話テキストの発話ＩＤの昇順に、先頭から３番目までの発話テキストを各発話テキスト表示欄１５ｃ３に表示する。なお、端末２は、複数の発話テキストから、ランダムに３つの発話テキストを抽出して各発話テキスト表示欄１５ｃ３に表示しても良い。なお、動画情報表示欄１５ｃ内の発話テキスト表示欄１５ｃ３の数は、ユーザのニーズに合わせて設けられても良い。 As illustrated, three speech text display columns 15c3 are provided in one moving image information display column 15c. For example, the terminal 2 displays the first to third speech texts in the speech text display fields 15c3 in ascending order of the speech IDs of the speech texts. Note that the terminal 2 may randomly extract three utterance texts from a plurality of utterance texts and display them in each utterance text display field 15c3. The number of utterance text display fields 15c3 in the moving image information display field 15c may be provided according to the user's needs.
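The two ways of choosing which utterance texts fill the three display fields can be sketched as below. The function name `pick_preview` and the `(utterance ID, text)` tuple layout are hypothetical; passing a random number generator selects the random variant.

```python
import random
from typing import List, Optional, Tuple

Utterance = Tuple[int, str]  # (utterance ID, utterance text); layout is an assumption


def pick_preview(utterances: List[Utterance], count: int = 3,
                 rng: Optional[random.Random] = None) -> List[Utterance]:
    """Choose the utterance texts shown in the utterance text display fields 15c3."""
    if rng is None:
        # Default: the first `count` texts in ascending order of utterance ID.
        return sorted(utterances, key=lambda u: u[0])[:count]
    # Alternative: `count` texts drawn at random without repetition.
    return rng.sample(utterances, min(count, len(utterances)))


# Hypothetical sample data.
utterances = [(3, "c"), (1, "a"), (4, "d"), (2, "b")]
first_three = pick_preview(utterances)
random_three = pick_preview(utterances, rng=random.Random(0))
```

Making `count` a parameter reflects the remark that the number of display fields may be chosen to suit the user's needs.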

端末２は、登録済みの発話テキストに対し、登録ボタン１５ｃ４のタッチ操作を受け付けた場合、発話テキストの登録解除処理を行う。具体的には、端末２は、ユーザＩＤ、動画コンテンツＩＤ及び該発話テキストの発話ＩＤをサーバ１に送信する。サーバ１は、端末２から送信されたユーザＩＤ、動画コンテンツＩＤ及び該発話テキストの発話ＩＤに基づいて、該当するレコードをノートＤＢ１７５から削除する。端末２は、該当する登録ボタン１５ｃ４を登録解除済み状態に変更する。例えば端末２は、登録ボタン１５ｃ４の背景色を灰色に変更する。 When the terminal 2 receives the touch operation of the registration button 15c4 for the registered utterance text, the terminal 2 performs registration cancellation processing of the utterance text. Specifically, the terminal 2 transmits the user ID, the video content ID, and the speech ID of the speech text to the server 1 . The server 1 deletes the corresponding record from the note DB 175 based on the user ID, the video content ID, and the speech ID of the speech text transmitted from the terminal 2 . The terminal 2 changes the corresponding registration button 15c4 to the unregistered state. For example, the terminal 2 changes the background color of the registration button 15c4 to gray.

また、端末２は、登録解除済みの発話テキストに対し、登録ボタン１５ｃ４のタッチ操作を再度受け付けた場合、発話テキストの登録処理を行う。具体的には、端末２は、ユーザＩＤ、動画コンテンツＩＤ及び該発話テキストの発話ＩＤをサーバ１に送信する。サーバ１は、端末２から送信されたユーザＩＤ、動画コンテンツＩＤ及び該発話テキストの発話ＩＤを受信する。サーバ１は、受信したユーザＩＤ及び動画コンテンツＩＤに対応付けて、発話テキストの発話ＩＤ及び登録日時を一つのレコードとしてノートＤＢ１７５に登録する。端末２は、該当する登録ボタン１５ｃ４を登録済み状態に変更する。例えば端末２は、登録ボタン１５ｃ４の背景色を赤に変更する。 Further, when the terminal 2 again receives a touch operation on the registration button 15c4 for a deregistered utterance text, the terminal 2 performs registration processing for the utterance text. Specifically, the terminal 2 transmits the user ID, the video content ID, and the utterance ID of the utterance text to the server 1. The server 1 receives the user ID, the video content ID, and the utterance ID of the utterance text transmitted from the terminal 2. The server 1 registers the utterance ID and the registration date and time of the utterance text in the note DB 175 as one record, in association with the received user ID and video content ID. The terminal 2 changes the corresponding registration button 15c4 to the registered state. For example, the terminal 2 changes the background color of the registration button 15c4 to red.
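The toggle behavior of the registration button 15c4 can be sketched as a single function that flips the registration state and reports the resulting button color. The note DB is modeled here as an in-memory dictionary, and the names `toggle_registration` and `NoteKey` are hypothetical; only the gray and red colors come from the description.

```python
from datetime import datetime
from typing import Dict, Tuple

NoteKey = Tuple[str, str, int]  # (user ID, video content ID, utterance ID)

REGISTERED_COLOR = "red"      # background color in the registered state
DEREGISTERED_COLOR = "gray"   # background color in the deregistered state


def toggle_registration(note_db: Dict[NoteKey, datetime], key: NoteKey,
                        now: datetime) -> str:
    """Flip the registration state for `key` and return the new button color."""
    if key in note_db:
        # Already registered: delete the record (deregistration).
        del note_db[key]
        return DEREGISTERED_COLOR
    # Not registered: store the registration date and time as a new record.
    note_db[key] = now
    return REGISTERED_COLOR


# Hypothetical sample data: one utterance text is registered at the start.
key = ("u1", "v1", 7)
note_db = {key: datetime(2021, 3, 16, 10, 0)}
after_first_touch = toggle_registration(note_db, key, datetime(2021, 3, 16, 12, 0))
after_second_touch = toggle_registration(note_db, key, datetime(2021, 3, 16, 12, 1))
```

After the second touch the record carries the new registration date and time, so the utterance text would move in a list sorted by registration order.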

端末２は、全部ボタン１５ｃ５のタッチ操作を受け付けた場合、後述する動画コンテンツにおけるすべての発話テキストの表示画面（図１４）に遷移する。 When the terminal 2 accepts the touch operation of the all button 15c5, the terminal 2 transitions to the display screen (FIG. 14) of all the spoken texts in the video content described later.

図１４は、動画コンテンツにおけるすべての発話テキストの表示画面の一例を示す説明図である。なお、図１３と重複する内容については同一の符号を付して説明を省略する。図示のように、端末２は、動画コンテンツのサムネイル画像をサムネイル画像表示欄１５ｃ１に表示し、動画コンテンツのタイトルをタイトル表示欄１５ｃ２に表示する。 FIG. 14 is an explanatory diagram showing an example of a display screen of all spoken texts in video content. Note that the same reference numerals are assigned to the contents that overlap with those in FIG. 13, and the description thereof is omitted. As illustrated, the terminal 2 displays the thumbnail image of the video content in the thumbnail image display field 15c1, and displays the title of the video content in the title display field 15c2.

端末２は、動画コンテンツにおける各発話テキスト、及び各発話テキストの表示の開始時点を各発話テキスト表示欄１５ｃ３に表示する。発話テキストの表示順序は、例えば、発話テキストの発話ＩＤの昇順であっても良く、または発話テキストの登録日時の新しい順であっても良い。なお、登録ボタン１５ｃ４の登録または登録解除処理に関しては、図１３での処理と同様であるため、説明を省略する。 The terminal 2 displays each utterance text in the moving image content and the start point of display of each utterance text in each utterance text display field 15c3. The display order of the speech texts may be, for example, in ascending order of the speech IDs of the speech texts, or may be in order of newest registration date and time of the speech texts. Note that the registration or deregistration processing of the registration button 15c4 is the same as the processing in FIG. 13, so description thereof will be omitted.

図１５は、登録順に発話テキストの一覧画面の一例を示す説明図である。該画面は、発話テキスト表示欄１６ａ及び登録ボタン１６ｂを含む。発話テキスト表示欄１６ａは、発話テキスト及び該発話テキストの表示の開始時点を表示する表示欄である。登録ボタン１６ｂは、発話テキストの登録または登録解除を行うためのボタンである。 FIG. 15 is an explanatory diagram showing an example of a list screen of utterance texts in order of registration. The screen includes a speech text display field 16a and a registration button 16b. The utterance text display field 16a is a display field for displaying the utterance text and the start point of display of the utterance text. The registration button 16b is a button for registering or canceling the registration of the spoken text.

端末２はユーザＩＤに基づいて、発話テキストの登録日時の古い順（登録順）に、登録済みの発話テキストをサーバ１のノートＤＢ１７５から取得する。端末２は、取得した発話テキストの発話ＩＤに基づいて、発話テキストの表示の開始時点をサーバ１の発話テキストＤＢ１７２から取得する。端末２は、発話テキストの登録日時の古い順に、取得した各発話テキスト及び該発話テキストの表示の開始時点を、該当する発話テキスト表示欄１６ａに表示する。端末２は、各登録ボタン１６ｂを登録済み状態に変更する。例えば端末２は、登録ボタン１６ｂの背景色を赤に変更する。 Based on the user ID, the terminal 2 acquires the registered utterance texts from the notebook DB 175 of the server 1 in order of registration date and time of the utterance texts (in order of registration). The terminal 2 acquires the start point of display of the utterance text from the utterance text DB 172 of the server 1 based on the utterance ID of the acquired utterance text. The terminal 2 displays each acquired utterance text and the display start point of the utterance text in the corresponding utterance text display field 16a in chronological order of registration date and time of the utterance text. The terminal 2 changes each registration button 16b to the registered state. For example, the terminal 2 changes the background color of the registration button 16b to red.

また、発話テキストの登録日時は発話テキスト表示欄１６ａに表示されても良い。また、端末２は、カテゴリ毎に設定された色に合わせて、発話テキストのフォントカラーまたは発話テキストの表示領域（発話テキスト表示欄１６ａ）の背景色を設定しても良い。更にまた、端末２は、発話テキストのタッチ操作を受け付けた場合、例えば、動画コンテンツの詳細画面（図８）に遷移し、該発話テキストの表示タイミングに対応する動画コンテンツの再生タイミングにて再生しても良い。 The registration date and time of the utterance text may also be displayed in the utterance text display field 16a. The terminal 2 may also set the font color of the utterance text, or the background color of the display area of the utterance text (the utterance text display field 16a), to match the color set for each category. Furthermore, when the terminal 2 receives a touch operation on an utterance text, it may, for example, transition to the detail screen of the video content (FIG. 8) and start playback of the video content from the playback timing corresponding to the display timing of that utterance text.
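The jump from a touched utterance text to the corresponding playback position can be sketched as a lookup of the display start time. The dictionary standing in for the utterance text DB 172, the start times in seconds, and the `mm:ss` label format are all assumptions made for illustration.

```python
from typing import Dict

# Hypothetical stand-in for the utterance text DB 172:
# utterance ID -> display start time in seconds from the start of the video.
UTTERANCE_START_TIMES: Dict[int, float] = {1: 0.0, 2: 12.5, 3: 125.0}


def playback_position(utterance_id: int) -> float:
    """Return the position (in seconds) from which playback should start."""
    return UTTERANCE_START_TIMES[utterance_id]


def format_start_time(seconds: float) -> str:
    """Format a display start time as mm:ss for an utterance text display field."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


position = playback_position(3)
label = format_start_time(position)
```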

なお、縦方向に並べられた複数の発話テキスト表示欄１６ａが画面に収まりきらない場合、複数の発話テキスト表示欄１６ａを縦方向にスクロール可能に表示することができる。なお、登録ボタン１６ｂの登録または登録解除処理に関しては、図１３での処理と同様であるため、説明を省略する。 If the plurality of vertically arranged utterance text display fields 16a do not fit on the screen, the plurality of utterance text display fields 16a can be displayed in a vertically scrollable manner. Note that the registration or deregistration processing of the registration button 16b is the same as the processing in FIG. 13, so description thereof will be omitted.

図１６は、発話テキストを一覧で表示する際の処理手順を示すフローチャートである。端末２の制御部２１は、発話テキストの一覧の表示種類を入力部２４により受け付ける（ステップＳ２２１）。表示種類は、動画コンテンツ別に発話テキストを一覧で表示する「動画コンテンツ別」、及び登録順に発話テキストを一覧で表示する「登録順」を含む。制御部２１は、ユーザＩＤ、及び受け付けた表示種類を通信部２３によりサーバ１に送信する（ステップＳ２２２）。 FIG. 16 is a flow chart showing a processing procedure for displaying a list of spoken texts. The control unit 21 of the terminal 2 receives the display type of the list of spoken texts through the input unit 24 (step S221). The display types include "by video content" for displaying a list of spoken texts by video content, and "registration order" for displaying a list of spoken texts in the order of registration. The control unit 21 transmits the user ID and the accepted display type to the server 1 through the communication unit 23 (step S222).

サーバ１の制御部１１は、端末２から送信されたユーザＩＤ及び表示種類を通信部１３により受信する（ステップＳ１２１）。制御部１１は、受信した表示種類が「動画コンテンツ別」であるか否かを判定する（ステップＳ１２２）。制御部１１は、表示種類が「動画コンテンツ別」であると判定した場合（ステップＳ１２２でＹＥＳ）、受信したユーザＩＤに基づいて、視聴された各動画コンテンツに関する情報（動画コンテンツのサムネイル画像及びタイトル等）、及び各動画コンテンツにおける登録済みの発話テキストを大容量記憶部１７の動画コンテンツＤＢ１７１及びノートＤＢ１７５から取得する（ステップＳ１２３）。 The control unit 11 of the server 1 receives the user ID and the display type transmitted from the terminal 2 via the communication unit 13 (step S121). The control unit 11 determines whether or not the received display type is "by video content" (step S122). When the control unit 11 determines that the display type is "by video content" (YES in step S122), it acquires, based on the received user ID, information about each viewed video content (thumbnail image, title, and the like of the video content) and the registered utterance texts in each video content from the video content DB 171 and the note DB 175 of the large-capacity storage unit 17 (step S123).

制御部１１は、取得した各動画に関する情報、及び各動画コンテンツにおける登録済みの発話テキストを通信部１３により端末２に送信する（ステップＳ１２４）。端末２の制御部２１は、サーバ１から送信された動画コンテンツに関する情報及び登録済みの発話テキストを通信部２３により受信する（ステップＳ２２５）。制御部２１は、動画コンテンツ別に、受信した動画コンテンツのサムネイル画像、タイトル、及び該動画コンテンツにおける登録済みの発話テキストを表示し（ステップＳ２２６）、処理を終了する。 The control unit 11 transmits the acquired information about each moving image and the registered utterance text in each moving image content to the terminal 2 through the communication unit 13 (step S124). The control unit 21 of the terminal 2 receives, through the communication unit 23, the information about the video content and the registered speech text transmitted from the server 1 (step S225). The control unit 21 displays the thumbnail image, the title, and the registered utterance text of the received moving image content for each moving image content (step S226), and ends the process.

サーバ１の制御部１１は、表示種類が「動画コンテンツ別」でないと判定した場合（ステップＳ１２２でＮＯ）、ユーザＩＤに基づいて、登録済みの発話テキストを大容量記憶部１７のノートＤＢ１７５から取得する（ステップＳ１２５）。制御部１１は、取得した登録済みの発話テキストを通信部１３により端末２に送信する（ステップＳ１２６）。端末２の制御部２１は、サーバ１から送信された登録済みの発話テキストを通信部２３により受信する（ステップＳ２２３）。制御部２１は、発話テキストの登録日時の古い順（登録順）に、受信した発話テキストを表示部２５により表示し（ステップＳ２２４）、処理を終了する。 When the control unit 11 of the server 1 determines that the display type is not “by video content” (NO in step S122), based on the user ID, acquires the registered speech text from the note DB 175 of the large-capacity storage unit 17. (step S125). The control unit 11 transmits the acquired registered speech text to the terminal 2 through the communication unit 13 (step S126). The control unit 21 of the terminal 2 receives the registered speech text transmitted from the server 1 through the communication unit 23 (step S223). The control unit 21 causes the display unit 25 to display the received utterance texts in order of the registration date and time of the utterance texts (in order of registration) (step S224), and ends the process.
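The branch at step S122 and the two display paths of FIG. 16 can be sketched as a server-side handler and a terminal-side rendering function. The in-memory tables, the display type strings, and the function names `handle_list_request` and `render` are hypothetical; the sketch only mirrors the division of work described above, in which the terminal sorts by registration date and time before display.

```python
from typing import List

# Hypothetical stand-ins for the video content DB 171 and the note DB 175.
VIDEO_CONTENT_DB = {
    "v1": {"title": "Talk A", "thumbnail": "a.png"},
    "v2": {"title": "Talk B", "thumbnail": "b.png"},
}
NOTE_DB = [
    {"user_id": "u1", "video_content_id": "v1", "text": "first",
     "registered_at": "2021-03-16T10:00"},
    {"user_id": "u1", "video_content_id": "v2", "text": "second",
     "registered_at": "2021-03-15T09:00"},
    {"user_id": "u1", "video_content_id": "v1", "text": "third",
     "registered_at": "2021-03-17T08:00"},
    {"user_id": "u2", "video_content_id": "v1", "text": "other",
     "registered_at": "2021-03-16T11:00"},
]


def handle_list_request(user_id: str, display_type: str) -> dict:
    """Server side: collect the data for the requested display type."""
    notes = [n for n in NOTE_DB if n["user_id"] == user_id]
    if display_type == "by_video_content":  # YES at step S122
        contents: dict = {}
        for n in notes:
            vid = n["video_content_id"]
            entry = contents.setdefault(vid, {**VIDEO_CONTENT_DB[vid], "texts": []})
            entry["texts"].append(n["text"])
        return {"display_type": display_type, "contents": contents}
    return {"display_type": display_type, "notes": notes}  # NO at step S122


def render(response: dict) -> List[str]:
    """Terminal side: turn the server response into display lines."""
    if response["display_type"] == "by_video_content":
        return [f'{c["title"]}: {", ".join(c["texts"])}'
                for c in response["contents"].values()]
    # Registration order: oldest registration date and time first.
    ordered = sorted(response["notes"], key=lambda n: n["registered_at"])
    return [n["text"] for n in ordered]


by_content_lines = render(handle_list_request("u1", "by_video_content"))
by_order_lines = render(handle_list_request("u1", "registration_order"))
```

The ISO-style timestamps are compared as strings here, which matches chronological order only because every timestamp uses the same fixed-width format.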

また、動画コンテンツの一覧画面と発話テキストの一覧画面との切り替えを行うことができる。 In addition, it is possible to switch between the video content list screen and the spoken text list screen.
図１７は、切り替え画面の一例を示す説明図である。該画面は、動画一覧ボタン１７ａ及び発話テキスト一覧ボタン１７ｂを含む。動画一覧ボタン１７ａ及び発話テキスト一覧ボタン１７ｂは、表示画面の一部の領域（例えば、画面の一番下）に固定的に表示される。動画一覧ボタン１７ａは、動画コンテンツの一覧画面（第１画面）に遷移するためのボタンである。発話テキスト一覧ボタン１７ｂは、発話テキストの一覧画面（第２画面）に遷移するためのボタンである。 FIG. 17 is an explanatory diagram showing an example of the switching screen. The screen includes a video list button 17a and a spoken text list button 17b. The video list button 17a and the spoken text list button 17b are fixedly displayed in a partial area of the display screen (for example, the bottom of the screen). The video list button 17a is a button for transitioning to a video content list screen (first screen). The spoken text list button 17b is a button for transitioning to a spoken text list screen (second screen).

端末２は、動画一覧ボタン１７ａのタッチ操作を受け付けた場合、動画コンテンツの一覧画面に遷移し、動画コンテンツの一覧画面を表示する。端末２は、発話テキスト一覧ボタン１７ｂのタッチ操作を受け付けた場合、発話テキストの一覧画面に遷移し、発話テキストの一覧画面を表示する。 When the terminal 2 receives the touch operation of the moving image list button 17a, the terminal 2 transitions to the moving image content list screen and displays the moving image content list screen. When the terminal 2 receives the touch operation of the spoken text list button 17b, the terminal 2 transitions to the spoken text list screen and displays the spoken text list screen.

本実施形態によると、動画コンテンツ別または登録順に、登録済みの発話テキストを一覧で表示することにより、発話テキストを随時に閲覧することが可能となる。 According to this embodiment, by displaying a list of registered speech texts by moving image content or in the order of registration, it is possible to view the speech texts at any time.

今回開示された実施形態はすべての点で例示であって、制限的なものではないと考えられるべきである。本発明の範囲は、上記した意味ではなく、特許請求の範囲によって示され、特許請求の範囲と均等の意味及び範囲内でのすべての変更が含まれることが意図される。 The embodiments disclosed this time are illustrative in all respects and should be considered not restrictive. The scope of the present invention is indicated by the scope of the claims rather than the above-described meaning, and is intended to include all modifications within the scope and meaning equivalent to the scope of the claims.

１情報処理装置（サーバ）
１１制御部
１２記憶部
１３通信部
１４入力部
１５表示部
１６読取部
１７大容量記憶部
１７１動画コンテンツＤＢ
１７２発話テキストＤＢ
１７３ユーザＤＢ
１７４履歴ＤＢ
１７５ノートＤＢ
１７６カテゴリＤＢ
１ａ可搬型記憶媒体
１ｂ半導体メモリ
１Ｐ制御プログラム
２情報処理端末（端末）
２１制御部
２２記憶部
２３通信部
２４入力部
２５表示部
２６スピーカ
２Ｐ制御プログラム
REFERENCE SIGNS LIST
1 information processing device (server)
11 control unit
12 storage unit
13 communication unit
14 input unit
15 display unit
16 reading unit
17 large-capacity storage unit
171 video content DB
172 utterance text DB
173 user DB
174 history DB
175 note DB
176 category DB
1a portable storage medium
1b semiconductor memory
1P control program
2 information processing terminal (terminal)
21 control unit
22 storage unit
23 communication unit
24 input unit
25 display unit
26 speaker
2P control program

Claims

1. A program that causes a computer to execute processing of:
acquiring video content and utterance text describing what a speaker says in the video content;
displaying the utterance text step by step in time with playback of the acquired video content;
transmitting the utterance text to an information processing device when a registration operation for the displayed utterance text is received;
displaying a first tab for showing a list of the utterance texts registered by the registration operation, organized by video content viewed by the user, and a second tab for showing a list of the utterance texts registered by the registration operation in order of registration of the utterance texts;
displaying the registered utterance texts in a list by video content when a selection operation of the first tab is received; and
displaying the utterance texts in a list in order of registration when a selection operation of the second tab is received.

2. The program according to claim 1, causing execution of processing of operably displaying, for each utterance text in the video content, a first object for starting playback at the playback timing of the video content corresponding to the display timing of the utterance text, a second object for registering the utterance text, and a third object for sharing the utterance text.

3. The program according to claim 1 or 2, causing execution of processing of switching between displaying and hiding the utterance text in response to a user operation while the video content is being played.

4. The program according to any one of claims 1 to 3, causing execution of processing of:
acquiring a plurality of registered utterance texts in the video content; and
displaying the acquired plurality of utterance texts in a list.

5. The program according to any one of claims 1 to 4, causing execution of processing of:
displaying, when the selection operation of the first tab is received, for each video content, a thumbnail image and a title of the video content, the registered utterance text in the video content, and a fourth object indicating the registered state of the utterance text; and
canceling the registered state of the utterance text when an operation input on the fourth object for canceling the registered state of the utterance text is received.

6. The program according to any one of claims 1 to 5, causing execution of processing of:
fixedly displaying, in a partial area of the display screen, a fifth object for displaying a list of the video content and a sixth object for displaying a list of the utterance texts;
transitioning, when an operation input on the fifth object is received, to a first screen that displays the list of the video content; and
transitioning, when an operation input on the sixth object is received, to a second screen that includes the first tab and the second tab and displays a list of the registered utterance texts.

7. The program according to any one of claims 1 to 6, causing execution of processing of displaying, for a plurality of video contents, each video content and the utterance text in that video content side by side in association with each other.

8. The program according to any one of claims 1 to 7, wherein a plurality of video contents are registered for each category set based on the video content, the program causing execution of processing of:
displaying, side by side in a first direction, display areas that are divided by category and shown in different colors; and
displaying, in each display area, the video content belonging to the category and the utterance text in the video content in association with each other, side by side in a second direction.

9. The program according to any one of claims 1 to 8, causing execution of processing of:
acquiring a plurality of video contents belonging to a category;
acquiring a plurality of utterance texts in each acquired video content; and
switching, in response to a swipe operation, which of the plurality of utterance texts in each video content is displayed.

10. An information processing method comprising:
acquiring video content and utterance text describing what a speaker says in the video content;
displaying the utterance text step by step in time with playback of the acquired video content;
registering the utterance text based on a registration operation for the displayed utterance text;
displaying a first tab for showing a list of the utterance texts registered by the registration operation, organized by video content viewed by the user, and a second tab for showing a list of the utterance texts registered by the registration operation in order of registration of the utterance texts;
displaying the registered utterance texts in a list by video content when a selection operation of the first tab is received; and
displaying the utterance texts in a list in order of registration when a selection operation of the second tab is received.

11. A program that causes a computer to execute processing of:
acquiring video content and utterance text describing what a speaker says in the video content;
displaying the utterance text step by step in time with playback of the acquired video content;
transmitting the utterance text to an information processing device when a registration operation for the displayed utterance text is received;
wherein a plurality of video contents are registered for each category set based on the video content;
displaying, side by side in a first direction, display areas that are divided by category and shown in different colors; and
displaying, in each of the display areas, the video content belonging to the category and the utterance text in the video content registered by the registration operation in association with each other, side by side in a second direction.

12. A program that causes a computer to execute processing of:
acquiring video content and utterance text describing what a speaker says in the video content;
displaying the utterance text step by step in time with playback of the acquired video content;
transmitting the utterance text to an information processing device when a registration operation for the displayed utterance text is received;
wherein a plurality of video contents are registered for each category set based on the video content;
acquiring a plurality of video contents belonging to a category;
acquiring, in each acquired video content, a plurality of utterance texts registered by the registration operation; and
switching, in response to a swipe operation, which of the plurality of utterance texts in each video content is displayed.