JP2018121181A

JP2018121181A - Editing device and editing program

Info

Publication number: JP2018121181A
Application number: JP2017010715A
Authority: JP
Inventors: Hiroshi Sugihara (杉原 宏)
Original assignee: Kyocera Document Solutions Inc
Current assignee: Kyocera Document Solutions Inc
Priority date: 2017-01-24
Filing date: 2017-01-24
Publication date: 2018-08-02
Anticipated expiration: 2037-01-24
Also published as: JP6610572B2

Abstract

PROBLEM TO BE SOLVED: To allow members who did not attend a meeting to efficiently grasp content that is otherwise hard to understand for anyone who was not present. SOLUTION: An information processing apparatus 1 includes: a keyword input receiving section 103 that receives input of a keyword from a user; a text converting section 104 that converts the voice represented by the voice data in the video/audio data stored in a video/audio data storage section 51 into text; a specifying section 105 that, based on the text data obtained by the text converting section 104, specifies, from the text represented by the text data, the text part containing the keyword received by the keyword input receiving section 103; and an editing section 106 that newly creates separate video/audio data consisting of the same time-domain part as the voice part from which the text part specified by the specifying section 105 originates.
SELECTED DRAWING: Figure 1

Description

The present invention relates to an editing apparatus and an editing program, and more particularly to a technique for editing video and audio recorded of a meeting or the like.

Normally, members who did not attend a meeting check what was discussed (for example, conclusions, issues, and action items) by reading the minutes. However, even when the minutes convey the substance of the discussion, it is difficult to sense the atmosphere of the meeting from them alone. It is also difficult to grasp the background and the flow of discussion that led to the conclusions recorded in the minutes.

For this reason, meetings are now sometimes recorded on video so that their content can be reviewed as video and audio. Reviewing a meeting in this way conveys its atmosphere, which cannot be grasped from the minutes alone. However, a 60-minute meeting requires 60 minutes of viewing, which is inefficient.

Patent Document 1 below describes creating minutes in which the recorded video and audio of a meeting are linked to memos (written by the person taking the minutes), so that the video and audio corresponding to each memo can be viewed easily.

Patent Document 1: JP 2008-172582 A

In the invention described in Patent Document 1, when the user designates a memo, only the video and audio corresponding to that memo are played back, so the user does not need to watch all of the recorded footage and viewing time is reduced.

However, in the invention described in Patent Document 1, the user must check the memos and the video and audio at the same time, which is cumbersome. Moreover, the user cannot always correctly designate the portions where the core content of the meeting (for example, conclusions, issues, and action items) is discussed, so the invention does not necessarily let the user grasp the atmosphere of the meeting efficiently.

The present invention has been made in view of the above circumstances, and its object is to enable a person who was not present (for example, a member who did not attend the meeting) to efficiently grasp content that is hard to grasp for anyone other than those who were present (for example, members who attended the meeting).

An editing apparatus according to one aspect of the present invention includes: a video/audio data storage unit that stores video/audio data; a keyword input reception unit that receives input of a keyword from a user; a text conversion unit that, based on the audio data in the video/audio data stored in the video/audio data storage unit, converts the audio represented by the audio data into text; a specifying unit that, based on the text data obtained by the text conversion unit, specifies, from the text represented by the text data, a text portion containing the keyword received by the keyword input reception unit; and an editing unit that, using the video/audio data stored in the video/audio data storage unit, newly creates another set of video/audio data consisting of the same time-domain portion as the audio portion from which the text portion specified by the specifying unit originates.

An editing program according to one aspect of the present invention causes a computer to function as: a keyword input reception unit that receives input of a keyword from a user; a text conversion unit that, based on the audio data in the video/audio data stored in a video/audio data storage unit, converts the audio represented by the audio data into text; a specifying unit that, based on the text data obtained by the text conversion unit, specifies, from the text represented by the text data, a text portion containing the keyword received by the keyword input reception unit; and an editing unit that, using the video/audio data stored in the video/audio data storage unit, newly creates another set of video/audio data consisting of the same time-domain portion as the audio portion from which the text portion specified by the specifying unit originates.

According to the present invention, a digest of the video and audio is generated that is organized around keywords (words of particular significance) entered by the user; that is, a digest containing the important parts is generated. Therefore, even a person who was not present where the video and audio were recorded (for example, a member who did not attend the meeting) can, by viewing the generated digest, efficiently grasp content that is hard to grasp for anyone who was not present (for example, the atmosphere of the meeting).

FIG. 1 is a functional block diagram schematically showing the main internal configuration of an editing apparatus according to a first embodiment of the present invention.
FIG. 2 is a flowchart showing an example of the processing performed by the control unit of the editing apparatus according to the first embodiment.
FIG. 3 shows an example of an operation screen displayed on the display unit.
FIG. 4 shows an example of statements made in a meeting.
FIG. 5 is an explanatory diagram illustrating the temporal relationship among the video represented by the video data, the audio represented by the audio data, and the text represented by the text data.
FIG. 6 is a functional block diagram schematically showing the main internal configuration of an editing apparatus according to a second embodiment.
FIG. 7 is a flowchart showing an example of the processing performed by the control unit of the editing apparatus according to the second embodiment.

Embodiments of the editing apparatus and editing program according to the present invention are described below with reference to the drawings. FIG. 1 is a functional block diagram schematically showing the main internal configuration of an editing apparatus in which the editing program according to the first embodiment of the present invention is installed. The editing apparatus 1 is an electronic device such as a personal computer (PC), and includes a display unit 10, an audio output unit 20, an operation unit 30, a communication unit 40, a storage unit 50, and a control unit 100. These components can exchange data and signals with one another over a communication bus.

The display unit 10 is composed of a liquid crystal display (LCD), an organic EL display (OLED: Organic Light-Emitting Diode), or the like. The display unit 10 displays responses, data results, and the like from the control unit 100, and has a touch panel function, so the user can operate the editing apparatus 1 by touching images and other items displayed on the screen.

The audio output unit 20 is a speaker or the like, and outputs audio.

The operation unit 30 is a mouse, a keyboard, or the like, and receives various instructions from the user regarding the operations and processes that the editing apparatus 1 can execute.

The communication unit 40 is a communication interface including a communication module such as a LAN (Local Area Network) chip (not shown).

The editing apparatus 1 is connected to, for example, a video camera 200. A control unit 101, described later, exchanges data with the video camera 200 via the communication unit 40 and can acquire the video/audio data stored in the video camera 200 (data consisting of video data and audio data synchronized with each other).

The storage unit 50 is a large-capacity storage device such as an HDD (Hard Disk Drive). The storage unit 50 stores a control program, the editing program described above, and the like, and a video/audio data storage unit 51 and an editing data storage unit 52, described later, are built in it.

The control unit 100 includes a processor, a RAM (Random Access Memory), a ROM (Read Only Memory), and dedicated hardware circuits. The processor is, for example, a CPU (Central Processing Unit), an ASIC (Application Specific Integrated Circuit), or an MPU (Micro Processing Unit). The control unit 100 includes a control unit 101, an operation reception unit 102, a text conversion unit 104, a specifying unit 105, and an editing unit 106.

The control unit 100 functions as the control unit 101, the operation reception unit 102, the text conversion unit 104, the specifying unit 105, and the editing unit 106 when the processor operates according to the control program and other programs stored in the storage unit 50. However, the control unit 101 and the other units may instead be implemented as hardware circuits rather than through program-driven operation of the control unit 100. The same applies to each embodiment unless otherwise noted.

The control unit 101 governs the overall operation of the editing apparatus 1. It is connected to the display unit 10, the audio output unit 20, the operation unit 30, the communication unit 40, and the storage unit 50, and controls the driving of each of these units. For example, the control unit 101 acquires the video/audio data stored in the video camera 200 via the communication unit 40, builds the video/audio data storage unit 51 in the storage unit 50, and stores the acquired video/audio data in the video/audio data storage unit 51.

The operation reception unit 102 identifies a user operation entered by the user based on detection signals output from the display unit 10, which has a touch panel function, or from the operation unit 30. The operation reception unit 102 then accepts the identified user operation and outputs a control signal corresponding to it to the control unit 101 or other units. The operation reception unit 102 includes a keyword input reception unit 103.

Based on the audio data in the video/audio data stored in the video/audio data storage unit 51, the text conversion unit 104 converts the audio represented by the audio data into text.

Based on the text data obtained by the text conversion unit 104, the specifying unit 105 specifies, from the text represented by the text data, a text portion containing the keyword received by the keyword input reception unit 103.

Using the video/audio data stored in the video/audio data storage unit 51, the editing unit 106 newly creates another set of video/audio data consisting of the same time-domain portion as the audio portion from which the text portion specified by the specifying unit 105 originates. For example, the editing unit 106 reads, from the video/audio data storage unit 51, the video/audio data in the same time domain as the audio portion from which the specified text portion originates, and uses the read data as the new, separate video/audio data.

Next, the operation of the editing apparatus 1 configured as above is described. FIG. 2 is a flowchart showing the processing performed when the editing program is executed by the control unit 100 of the editing apparatus 1. This processing is executed when the operation reception unit 102 receives a request to edit video/audio data stored in the video/audio data storage unit 51.

When the operation reception unit 102 receives the edit request, the keyword input reception unit 103 accepts keyword input from the user by, for example, displaying an operation screen D1 such as the one shown in FIG. 3 on the display unit 10 (S1). The operation screen D1 includes a display area E1 that shows the character string entered via the operation unit 30 and an OK button B1 labeled "OK". When the keyword input reception unit 103 receives an operation on the OK button B1, it stores the character string entered by the user as the keyword.

Next, based on the audio data in the video/audio data stored in the video/audio data storage unit 51, the text conversion unit 104 converts the audio represented by the audio data into text (S2). Because a known technique is used to convert speech into text, a detailed description is omitted.

Next, based on the text data obtained by the text conversion unit 104, the specifying unit 105 specifies, from the text represented by the text data, the text portions containing the keyword received by the keyword input reception unit 103 (S3).

FIG. 4 shows an example of statements made in a meeting. For example, when the keyword received by the keyword input reception unit 103 is "XXX", the specifying unit 105 specifies the underlined portions P1 to P3 as the text portions containing the keyword.
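
The keyword search in S3 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the text conversion step yields a list of text segments, each carrying start and end times in elapsed seconds, and it uses a plain substring match. The `TextSegment` type, its field names, and the sample transcript (loosely modeled on the FIG. 4 example, with invented timestamps) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextSegment:
    """A piece of transcribed text and where it occurs in the recording.

    start/end are elapsed seconds from a reference point such as the
    start of shooting (hypothetical field names).
    """
    start: float
    end: float
    text: str


def find_keyword_segments(segments, keyword):
    """Return the segments whose text contains the keyword (substring match)."""
    return [seg for seg in segments if keyword in seg.text]


# Hypothetical transcript loosely modeled on FIG. 4; the times are invented.
transcript = [
    TextSegment(0.0, 3.5, "What is the merit of plan AAA?"),
    TextSegment(4.0, 7.0, "The merit of plan AAA is YYY."),
    TextSegment(12.0, 15.0, "The issue with plan AAA is XXX."),
    TextSegment(16.0, 18.0, "Can XXX be solved?"),
    TextSegment(25.0, 30.0, "Conclusion: approve if the XXX issue can be solved."),
]

hits = find_keyword_segments(transcript, "XXX")
print([seg.text for seg in hits])
```

A real transcript would also need normalization (for example, full-width versus half-width characters in Japanese text) before matching; that step is omitted here.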

Next, the editing unit 106 reads, from the video/audio data storage unit 51, video/audio data covering the same time domain as the audio portion from which each text portion specified by the specifying unit 105 originates, together with predetermined time domains before and after it (for example, one minute each before and after) (S4).

FIG. 5 is an explanatory diagram illustrating the temporal relationship among the video represented by the video data, the audio represented by the audio data, and the text represented by the text data. In the figure, V1 to V3 denote the time domains of the audio portions from which the text portions P1 to P3 originate, respectively; V11 and V12 denote the predetermined time domains before and after time domain V1; V21 and V22 denote those before and after time domain V2; and V31 and V32 denote those before and after time domain V3.

In S4, the editing unit 106 reads the video/audio data of time domains V11, V1, and V12, time domains V21, V2, and V22, and time domains V31, V3, and V32 from the video/audio data storage unit 51.
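
The window arithmetic behind S4 can be sketched as below, assuming times in elapsed seconds and the one-minute margin given as an example above. Clamping to the recording bounds and merging overlapping windows (so that footage shared by nearby hits, such as V12 and V21, is not repeated) are assumptions added for illustration; the description does not say how overlaps are handled. The function names are hypothetical.

```python
def padded_window(start, end, margin=60.0, duration=None):
    """Extend [start, end] by `margin` seconds on each side (the V11/V1/V12 layout),
    clamped to 0 and, if known, to the recording duration."""
    lo = max(0.0, start - margin)
    hi = end + margin
    if duration is not None:
        hi = min(duration, hi)
    return (lo, hi)


def merge_windows(windows):
    """Merge overlapping or touching windows so no footage is read twice (assumption)."""
    merged = []
    for lo, hi in sorted(windows):
        if merged and lo <= merged[-1][1]:
            # Overlaps the previous window: extend it instead of adding a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged


# Hypothetical keyword hits (seconds) in a 60-minute (3600 s) recording.
hits = [(300.0, 310.0), (340.0, 345.0), (3580.0, 3590.0)]
windows = [padded_window(s, e, margin=60.0, duration=3600.0) for s, e in hits]
print(windows)
print(merge_windows(windows))
```

Without the merge step, the first two hits above would produce digest clips that share 90 seconds of footage.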

For example, when the video/audio data has time information indicating the elapsed time from a reference point (for example, the start of shooting), the text data also has time information indicating that elapsed time. Based on this time information, the editing unit 106 identifies, in the video/audio data, the section containing the specified text. Because the video/audio data and the text data are synchronized through this shared time information, the editing unit 106 can read from the video/audio data storage unit 51 the video/audio data covering the same time domain as the audio portion from which the text portion specified by the specifying unit 105 originates, together with the predetermined time domains before and after it.
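
Because the video/audio data and the text data share an elapsed-time base, a time window found in the text can be turned directly into positions in the stored media. The sketch below assumes a constant frame rate and audio sample rate (the 30 fps and 48 kHz defaults are hypothetical values, not taken from the description) and rounds to the nearest frame or sample boundary; variable-frame-rate footage would need per-frame timestamps instead.

```python
def seconds_to_index(t, rate):
    """Convert elapsed seconds to the nearest frame/sample index at `rate` units per second.

    round() is used instead of int() so that float error such as
    0.7 * 30 == 20.999999999999996 still maps to index 21.
    """
    return round(t * rate)


def locate(window, fps=30, sample_rate=48_000):
    """Map a (start, end) window in seconds to video-frame and audio-sample ranges."""
    start, end = window
    return {
        "frames": (seconds_to_index(start, fps), seconds_to_index(end, fps)),
        "samples": (seconds_to_index(start, sample_rate), seconds_to_index(end, sample_rate)),
    }


# Hypothetical padded window from 4:00 to 6:10 in the recording.
print(locate((240.0, 370.0)))
```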

Next, the editing unit 106 uses the read video/audio data to generate new, separate video/audio data as editing data (S5), builds an editing data storage unit 52 in the storage unit 50, and stores the generated editing data in the editing data storage unit 52 (S6).
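
One way to realize S5 with an off-the-shelf tool is to cut each window out of the stored file with the ffmpeg command-line program. This is an assumption for illustration, since the description does not name any tool. The sketch only builds the argument list; running it (for example with `subprocess.run(args, check=True)`) requires ffmpeg to be installed. With `-c copy` the streams are not re-encoded, so cut points are not frame-accurate; exact cuts would require re-encoding. The function and file names are hypothetical.

```python
def ffmpeg_clip_args(src, dst, start, end):
    """Build an ffmpeg command that copies seconds [start, end] of src into dst.

    -ss/-to come after -i, so ffmpeg applies them as output options;
    -c copy copies the streams without re-encoding.
    """
    return [
        "ffmpeg", "-y",          # overwrite dst if it already exists
        "-i", src,
        "-ss", f"{start:.3f}",   # clip start, in seconds
        "-to", f"{end:.3f}",     # clip end, in seconds
        "-c", "copy",
        dst,
    ]


args = ffmpeg_clip_args("meeting.mp4", "clip_01.mp4", 240.0, 405.0)
print(" ".join(args))
```

Building the argument list as a Python list (rather than a shell string) avoids quoting problems with file names that contain spaces.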

When the operation reception unit 102 receives a request to play back editing data stored in the editing data storage unit 52, the control unit 101 reads that editing data from the editing data storage unit 52 and plays it back, displaying the video on the display unit 10 and outputting the audio from the audio output unit 20.

According to the first embodiment, a digest of the video and audio is generated that is organized around keywords (words of particular significance) entered by the user; that is, a digest containing the important parts is generated. Therefore, even a person who was not present where the video and audio were recorded (for example, a member who did not attend the meeting) can, by viewing the generated digest, efficiently grasp content that is hard to grasp for anyone who was not present (for example, the atmosphere of the meeting).

FIG. 6 is a functional block diagram schematically showing the main internal configuration of an editing apparatus in which the editing program according to the second embodiment is installed. The editing apparatus 1A according to the second embodiment differs from the editing apparatus 1 shown in FIG. 1 in that its control unit 100A includes a dividing unit 107 that divides the text data obtained by the text conversion unit 104 into a plurality of blocks in predetermined units.

Next, the operation of the editing apparatus 1A according to the second embodiment is described. FIG. 7 is a flowchart showing the processing performed when the editing program is executed by the control unit 100A of the editing apparatus 1A. This processing is executed when the operation reception unit 102 receives a request to edit video/audio data stored in the video/audio data storage unit 51.

When the operation reception unit 102 receives the request, the keyword input reception unit 103 accepts keyword input from the user (S11), and the text conversion unit 104 converts the audio represented by the audio data in the video/audio data stored in the video/audio data storage unit 51 into text (S12). The dividing unit 107 then divides the text data obtained by the text conversion unit 104 into a plurality of blocks in predetermined units (for example, per utterance) (S13).

Because silence can be expected to last at least a certain length of time between utterances, the dividing unit 107 divides the text data into blocks based on, for example, the length of silent intervals. When a silent interval lasts for a predetermined time, for example 5 seconds, the dividing unit 107 treats the text data portions before and after that interval, each corresponding to a stretch that contains no such silence, as separate blocks. In the example shown in FIG. 4, the dividing unit 107 splits the statements made in the meeting so that, for example, "What is the merit of plan AAA?" forms one block and "The merit of plan AAA is YYY." forms another.
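
The silence-based division in S13 can be sketched as follows. The description does not say how silent intervals are measured; this sketch assumes the speech-to-text step returns time-stamped words or phrases and treats the gap between one item's end and the next item's start as silence. The `Word` type, its fields, and the sample timings are hypothetical; the 5-second threshold is the example value given above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """A recognized word or phrase with start/end times in elapsed seconds (hypothetical)."""
    start: float
    end: float
    text: str


def split_into_blocks(words, min_silence=5.0):
    """Group time-stamped words into blocks, starting a new block whenever the
    silent gap since the previous word is at least `min_silence` seconds.

    Returns (block_start, block_end, block_text) tuples.
    """
    blocks = []
    current = []
    prev_end = None
    for w in words:
        if current and w.start - prev_end >= min_silence:
            blocks.append(current)
            current = []
        current.append(w)
        prev_end = w.end
    if current:
        blocks.append(current)
    return [(b[0].start, b[-1].end, " ".join(w.text for w in b)) for b in blocks]


# Hypothetical phrase timings for the first two statements of FIG. 4.
words = [
    Word(0.0, 1.0, "What is the merit"),
    Word(1.2, 2.0, "of plan AAA?"),
    Word(8.0, 9.0, "The merit of plan AAA"),  # 6 s of silence before this phrase
    Word(9.1, 10.0, "is YYY."),
]
for block in split_into_blocks(words):
    print(block)
```

The specifying step then only needs to test each block's text for the keyword, so a hit returns the whole utterance rather than a fragment of it.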

Next, based on the text data obtained by the text conversion unit 104, the specifying unit 105 specifies, from the text represented by the text data, the blocks containing the keyword received by the keyword input reception unit 103 (S14).

For example, when the keyword received by the keyword input reception unit 103 is "XXX", the specifying unit 105 specifies the portions of FIG. 4 reading "The issue with plan AAA is XXX.", "Can XXX be solved?", and "Conclusion: approved if the XXX issue can be solved." as the blocks containing the keyword.

Next, the editing unit 106 reads, from the video/audio data storage unit 51, the video/audio data in the same time domain as the audio portions from which the text portions belonging to the blocks specified by the specifying unit 105 originate (S15), generates editing data by joining the read video/audio data together (S16), builds the editing data storage unit 52 in the storage unit 50, and stores the generated editing data in the editing data storage unit 52 (S17).
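
The joining step in S16 could be delegated to ffmpeg's concat demuxer, whose script format accepts `file`, `inpoint`, and `outpoint` directives. With those directives, the selected windows can be taken straight from the original recording without first writing separate clips. This is one possible realization assumed for illustration, not the method described in the patent, and the function and file names are hypothetical. The script would be used with a command such as `ffmpeg -f concat -safe 0 -i digest.ffconcat -c copy digest.mp4`. With stream copy, cut points are not frame-accurate for codecs that use inter-frame compression.

```python
def concat_script(src, windows):
    """Return an ffconcat script that plays the (start, end) windows of src in order."""
    # Inside single quotes, a literal ' is written as '\'' (close, escaped quote, reopen).
    quoted = src.replace("'", "'\\''")
    lines = ["ffconcat version 1.0"]
    for start, end in windows:
        lines.append(f"file '{quoted}'")
        lines.append(f"inpoint {start:.3f}")    # seconds into src where this piece starts
        lines.append(f"outpoint {end:.3f}")     # seconds into src where this piece ends
    return "\n".join(lines) + "\n"


script = concat_script("meeting.mp4", [(240.0, 405.0), (3520.0, 3600.0)])
print(script, end="")
```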

According to the second embodiment, because the blocks containing the keyword are specified, an utterance containing the keyword, for example, can be treated as a single unit and the corresponding video/audio data can be read out.

In another embodiment, as in the first embodiment, the editing unit 106 may read from the video/audio data storage unit 51 not only the video/audio data in the same time domain as the audio portions from which the text portions belonging to the blocks specified by the specifying unit 105 originate, but also the video/audio data in predetermined time domains before and after it (for example, one minute each before and after).

The present invention is not limited to the configurations of the above embodiments, and various modifications are possible. For example, although the above embodiments describe a personal computer as one embodiment of the editing apparatus according to the present invention, this is merely an example; other electronic devices such as smartphones and tablets may also be used.

The configurations and processes shown in the above embodiments with reference to FIGS. 1 to 7 are merely one embodiment of the present invention and are not intended to limit the present invention to those configurations and processes.

1, 1A: editing device
51: video/audio data storage unit
103: keyword input reception unit
104: text conversion unit
105: specifying unit
106: editing unit
107: dividing unit

Claims

1. An editing apparatus comprising:
a video/audio data storage unit that stores video/audio data;
a keyword input accepting unit that accepts keyword input from a user;
a text conversion unit that converts audio represented by audio data into text based on the audio data in the video/audio data stored in the video/audio data storage unit;
a specifying unit that, based on text data obtained by the conversion by the text conversion unit, specifies, from the text represented by the text data, a text portion containing the keyword received by the keyword input accepting unit; and
an editing unit that, using the video/audio data stored in the video/audio data storage unit, newly creates another set of video/audio data consisting of the same time-domain portion as the audio portion from which the text portion specified by the specifying unit originates.

2. The editing apparatus according to claim 1, further comprising a dividing unit that divides the text data obtained by the conversion by the text conversion unit into a plurality of blocks in predetermined units, wherein
the specifying unit specifies, based on the text data obtained by the conversion by the text conversion unit, the block containing the keyword received by the keyword input accepting unit from the text represented by the text data, and
the editing unit newly creates another set of video/audio data consisting of the same time-domain portion as the audio portion from which the text portion belonging to the block specified by the specifying unit originates.

3. The editing apparatus according to claim 1 or 2, wherein the editing unit reads, from the video/audio data storage unit, video/audio data covering the same time domain together with predetermined time domains before and after it, and creates the new video/audio data.

4. An editing program that causes a computer to function as:
a keyword input accepting unit that accepts keyword input from a user;
a text conversion unit that converts audio represented by audio data into text based on the audio data in the video/audio data stored in a video/audio data storage unit;
a specifying unit that, based on text data obtained by the conversion by the text conversion unit, specifies, from the text represented by the text data, a text portion containing the keyword received by the keyword input accepting unit; and
an editing unit that, using the video/audio data stored in the video/audio data storage unit, newly creates another set of video/audio data consisting of the same time-domain portion as the audio portion from which the text portion specified by the specifying unit originates.