JP2000078530A

JP2000078530A - Information recorder, information recording method and recording medium

Info

Publication number: JP2000078530A
Application number: JP10257668A
Authority: JP
Inventors: Tsutomu Matsui; 勉松井
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1998-08-28
Filing date: 1998-08-28
Publication date: 2000-03-14
Anticipated expiration: 2018-08-28
Also published as: JP3166725B2

Abstract

PROBLEM TO BE SOLVED: To input a keyword by voice and to record it in association with video/audio data. SOLUTION: A voice recognition section 22 uses a word dictionary 23 to recognize a word in the voice entered through a voice entry section 21 and gives the word to a dictionary retrieval section 24. The dictionary retrieval section 24 retrieves information on this word from the word dictionary 23 and gives the retrieval result to a retrieval result display section 25. The retrieval result display section 25 displays a plurality of retrieval results sequentially; one retrieval result is confirmed through a voice input section 26 and given to an optical disk recorder recording section 27. The optical disk recorder recording section 27 records the confirmed retrieval result on an optical disk 28 in association with the video/audio data.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION 1. Field of the Invention The present invention relates to an information recording apparatus, an information recording method, and a recording medium. In particular, it relates to an information recording apparatus, an information recording method, and a recording medium that generate a keyword from voice input and record video in association with the keyword, so that the video can later be searched by keyword.

[0002]

2. Description of the Related Art In a conventional VTR-type video camera, a RET (return) key is provided on the camera section so that a marker can be entered during shooting at a point that will serve as a cue for editing. Entering this marker makes it easy to recall the marked portion during later editing, which simplifies the editing work.

[0003]

[Problems to be Solved by the Invention] However, when, for example, an operation is being recorded with a video camera or the like during surgery, it is difficult for the doctor to operate the camera and enter markers. Even if the doctor asks an assistant to do so, it is difficult to have the markers entered exactly as the doctor intends.

[0004] It is therefore desirable, if possible, to use voice input to automatically mark an editing cue or the timing for capturing an important frame during surgery.

[0005] In that case, if the key word is not a word registered in advance, automatic keyword search (matching the voice input against pre-registered words) cannot be performed. Because the doctor must concentrate on the operation during surgery, the keyword needs to be entered into the recorder automatically by voice input, as metadata (additional data) accompanying the video and audio.

[0006] The present invention has been made in view of this situation. It makes it possible to enter a keyword by voice input while a video camera is shooting and to record the video and audio with the keyword added, so that a keyword search can be performed later.

[0007]

[Means for Solving the Problems] The information recording apparatus according to claim 1 comprises: video input means for inputting video; voice input means for inputting voice; voice recognition means for recognizing the voice input by the voice input means and converting it into a word corresponding to the voice; generation means for generating a code corresponding to the word recognized and converted by the voice recognition means; and recording means for recording, on a predetermined recording medium, the video input by the video input means in association with the code corresponding to the word generated by the generation means. The recording means may record the voice input by the voice input means on the recording medium together with the video and the code. Storage means for storing words in association with information about the words may further be provided, and the voice recognition means may convert the voice into a word based on the words stored in the storage means. Storage means for storing words in association with information about the words, search means for retrieving from the storage means the information on the word recognized by the voice recognition means, and display means for displaying the word information retrieved by the search means may further be provided. There may also be provided storage means for storing words in association with information about the words, search means for searching the storage means for the word recognized by the voice recognition means, display means for displaying the information on the word retrieved by the search means, and selection means for, when the search means retrieves a plurality of pieces of information, causing the display means to display them sequentially and selecting one of them; in that case the recording means may record on the recording medium the code of the word corresponding to the information selected by the selection means.
Search expression generation means for generating a search expression from the codes and logical operators may further be provided, and the recording means may record the code and the search expression on the recording medium in association with the video. The information recording method according to claim 7 comprises: a video input step of inputting video; a voice input step of inputting voice; a voice recognition step of recognizing the voice input in the voice input step and converting it into a word corresponding to the voice; a generation step of generating a code corresponding to the word recognized and converted in the voice recognition step; and a recording step of recording, on a predetermined recording medium, the video input in the video input step in association with the code corresponding to the word generated in the generation step. The recording medium according to claim 8 has recorded on it a program capable of executing the information recording method according to claim 7. In the information recording apparatus, information recording method, and recording medium according to the present invention, video is input, voice is input, the input voice is recognized and converted into a corresponding word, a code corresponding to the recognized and converted word is generated, and the video and the code are recorded in association with each other on a predetermined recording medium.

[0008]

[Embodiments of the Invention] FIG. 1 shows a configuration example of an embodiment of an optical disk camera device to which the information recording apparatus of the present invention is applied. FIG. 1 shows the device as used for medical video recording, mounted at the center of a ceiling light. The optical disk camera device 1 (hereinafter abbreviated as the camera where appropriate) is assumed to be operated by the doctor's support staff or the like.

[0009] The optical disk camera device 1 comprises a lens unit 11 that collects light from a subject and forms an image of the subject, a microphone 13 that picks up ambient sound, keywords spoken by the doctor, and the like, and a camera unit 12 that records the image formed by the lens unit 11 and the sound picked up by the microphone 13.

[0010] The optical disk camera device 1 also has a driver 14 for rotating the device about the Z axis and a driver 15 for rotating it about the Y axis, so that the device can be pointed in a desired direction.

[0011] While shooting video of surgery or the like, the optical disk camera device 1 recognizes the input voice, checks it against a word dictionary 23 in which words and word information are registered in advance, and extracts the matching word. That word is then inserted as a keyword, in the form of additional data (metadata), into the system data (FIG. 3) inside each frame or field of the video/audio recording, so that a keyword search can be performed later. After shooting, the frames in which the desired video/audio is recorded can thus be retrieved by keyword search.

[0012] A search expression can also be generated from keywords and logical operators (AND/OR) and recorded in the system data field. For example, a search expression is generated by joining keywords such as heart, lung, bypass, and surgery with AND/OR and is recorded in the system data field; specifying this search expression then retrieves the particular video.

[0013] FIG. 2 is a block diagram showing an example of the internal electrical configuration of the optical disk camera device 1 shown in FIG. 1.

[0014] As shown in FIG. 2, the optical disk camera device 1 has a voice input unit 21 that receives voice and converts it into predetermined voice data, a voice recognition unit 22 that performs voice recognition on the voice data converted by the voice input unit 21, a dictionary search unit 24 that performs a dictionary search for the word obtained as the recognition result of the voice recognition unit 22, and a search result display unit 25 that displays to the user the word information retrieved by the dictionary search unit 24. The voice input unit 21, voice recognition unit 22, dictionary search unit 24, and search result display unit 25 together constitute a dictionary search device.

[0015] The optical disk camera device 1 also has a word dictionary 23 in which, for each word, the voice recognition template that the voice recognition unit 22 uses to recognize the voice data supplied from the voice input unit 21 and the word information that the dictionary search unit 24 retrieves are registered in the same word entry. Here, the voice input unit 21 and the voice input unit 26 constitute the microphone 13 in FIG. 1.

[0016] For voice recognition, processing is faster if the words to be used are registered in advance and the recognizer determines which registered word is the closest match. Consequently, when several candidates (for example, two or three) arise for the same utterance, one of them must be confirmed. This confirmation is entered by voice through the voice input unit 26.

[0017] The voice input unit 26 is used to enter by voice a YES or NO (or OK or NG) judgment on each of the recognition candidates displayed by the search result display unit 25 as the result of voice recognition. During this step, voice entering through the voice input unit 21 is ignored. The voice input unit 26 is therefore assumed to include a simple voice recognition circuit that recognizes YES/NO and OK/NG.

[0018] For example, when "ishi" is spoken and input through the voice input unit 21, the voice recognition unit 22 recognizes the word "いし" (ishi) from the word dictionary 23. The dictionary search unit 24 then searches the word dictionary 23 for this word and retrieves "医師" (doctor), "意志" (will), "石" (stone), "縊死" (death by hanging), and so on. The search results are displayed one after another by the search result display unit 25. By entering YES or NO (or OK or NG) by voice through the voice input unit 26, the desired word is selected from the search results and confirmed.
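The homophone lookup described here can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the `WordEntry` class, the `WORD_DICTIONARY` table, and `search_dictionary` are hypothetical names, and the recognition template is a placeholder because the source does not describe its format.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class WordEntry:
    """One entry of the word dictionary 23 (hypothetical layout).

    The source states only that the recognition template and the word
    information share one entry; the field names are assumptions.
    """
    reading: str                                   # recognized reading, e.g. "いし"
    template: bytes = b""                          # placeholder for the recognition template
    candidates: list[str] = field(default_factory=list)  # written forms (word information)


# Tiny illustrative dictionary built from the examples in the text.
WORD_DICTIONARY = {
    "いし": WordEntry("いし", candidates=["医師", "意志", "石", "縊死"]),
    "けいおうだいがく": WordEntry("けいおうだいがく", candidates=["慶応大学"]),
}


def search_dictionary(reading: str) -> list[str]:
    """Return every written form registered for a recognized reading."""
    entry = WORD_DICTIONARY.get(reading)
    return list(entry.candidates) if entry else []


if __name__ == "__main__":
    print(search_dictionary("いし"))
```

Returning the candidates as an ordered list matches the description that results are shown one after another; the order used here is simply the order in which the text lists them.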

[0019] FIG. 3 shows an example of the frame format of the optical disk 28 on which video/audio data is recorded. Each frame consists of a sync part for synchronization, a sync block ID that stores an ID indicating the frame order, video data, and audio data. System data occupies 640 bytes (= 320 * 2) within the video data and 40 bytes (= 20 * 2) within the audio data, for a total of 680 bytes per field, that is, 1360 bytes per frame. The outer code correction and inner code correction are each error correction codes.

[0020] The keywords entered by voice are recorded in the system data of the video data. The system data of the audio data holds keywords, search expressions, the history leading up to the "YES/NO judgment" described later, data that did not fit in the system data of the video data, and the like.

[0021] Accordingly, up to 680 two-byte (full-width) characters or 1360 one-byte (half-width) characters per frame can be recorded automatically as metadata in the system data. If only the keywords are extracted from the recognized speech and recorded, the amount of memory used is reduced as a result.
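The capacity figures follow directly from the frame format of FIG. 3. The short calculation below reproduces them; the constant and function names are illustrative, and the value of two fields per frame is inferred from the stated 680 bytes per field and 1360 bytes per frame.

```python
# System data sizes taken from the frame format description (FIG. 3).
VIDEO_SYSTEM_BYTES_PER_FIELD = 320 * 2   # 640 bytes inside the video data
AUDIO_SYSTEM_BYTES_PER_FIELD = 20 * 2    # 40 bytes inside the audio data
FIELDS_PER_FRAME = 2                     # inferred: 680 bytes/field -> 1360 bytes/frame


def system_data_bytes_per_frame() -> int:
    """Total system data capacity of one frame, in bytes."""
    per_field = VIDEO_SYSTEM_BYTES_PER_FIELD + AUDIO_SYSTEM_BYTES_PER_FIELD
    return per_field * FIELDS_PER_FRAME


def max_characters(bytes_per_character: int) -> int:
    """How many fixed-width characters fit in one frame's system data."""
    return system_data_bytes_per_frame() // bytes_per_character


if __name__ == "__main__":
    print(system_data_bytes_per_frame())  # 1360
    print(max_characters(2))              # 680 full-width characters
    print(max_characters(1))              # 1360 half-width characters
```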

[0022] The operation is described next. While the doctor is operating, an assistant operates the optical disk camera device 1 and records the surgery. During recording, the doctor, the assistant, or others input the speech they want recorded on the optical disk camera device 1 through the voice input unit 21.

[0023] The voice input to the voice input unit 21 is converted into digital voice data and then supplied to the voice recognition unit 22. Using the word dictionary 23, the voice recognition unit 22 performs voice recognition on the voice data supplied from the voice input unit 21 and supplies the word obtained as the recognition result to the dictionary search unit 24.

[0024] The dictionary search unit 24 performs a dictionary search with the word dictionary 23 for the word supplied as the recognition result from the voice recognition unit 22. It supplies the word information obtained as the search result to the search result display unit 25: for Japanese, for example, information such as the kanji; when several words are found, information such as homophones; and, when the voice recognition is ambiguous, the multiple search results as candidates for the recognition result.

[0025] The search result display unit 25 displays the word information supplied from the dictionary search unit 24. If more than one word was recognized, it displays the word information one entry at a time, in order, and waits for each to be accepted or rejected.

[0026] Specifically, it waits for the word "YES/NO" or "OK/NG" to be input from the voice input unit 26. The voice input unit 26 has a simple voice recognition function; when "YES" or "OK" is input, it supplies the search result display unit 25 with a signal indicating that the currently displayed word information is correct. On receiving this signal, the search result display unit 25 treats the currently displayed word as the correct recognition result and supplies that word to the optical disk recorder recording unit 27 of the camera unit 12.

[0027] When "NO" or "NG" is input, on the other hand, the voice input unit 26 supplies the search result display unit 25 with a signal indicating that the currently displayed word information is incorrect. On receiving this signal, the search result display unit 25 treats the currently displayed word as an incorrect recognition result and displays the next word. It then again waits for "YES/NO", indicating acceptance or rejection, to be input by voice from the voice input unit 26.
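The accept/reject loop between the search result display unit 25 and the voice input unit 26 might look like the sketch below. It assumes the spoken replies have already been recognized and arrive as strings; `confirm_candidate` is a hypothetical name, and returning `None` when every candidate is rejected is an assumption, since the source does not say what happens in that case.

```python
from __future__ import annotations

from typing import Iterable, Optional

ACCEPT = {"YES", "OK"}   # replies meaning the displayed word is correct
REJECT = {"NO", "NG"}    # replies meaning the displayed word is incorrect


def confirm_candidate(candidates: Iterable[str],
                      replies: Iterable[str]) -> Optional[str]:
    """Show candidates one at a time and return the first one accepted.

    Each candidate stays "displayed" until an accept or reject reply
    arrives; any other reply is ignored and the loop keeps waiting.
    """
    reply_stream = iter(replies)
    for candidate in candidates:
        for reply in reply_stream:
            answer = reply.strip().upper()
            if answer in ACCEPT:
                return candidate      # confirmed: hand over to the recorder
            if answer in REJECT:
                break                 # rejected: display the next candidate
        else:
            return None               # no more replies while waiting
    return None                       # every candidate was rejected


if __name__ == "__main__":
    print(confirm_candidate(["医師", "意志", "石", "縊死"], ["NO", "NO", "YES"]))
```

Keeping the reply vocabulary to two small sets mirrors the description of the voice input unit 26 as a simple recognizer for YES/NO and OK/NG only.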

[0028] Through the voice input unit 21, the user enters, for example, when, where, and by whom the operation was performed and what kind of operation it was. Specifically, when the following is entered by voice: "August 21, 1998, from 10:00 to 12:00, at Keio University School of Medicine in Shinanomachi, Lecturer Kobayashi operating, Professor Fujino attending, skull shaping surgery, plastic insertion, patient name XXXX, male, 25 years old", the voice recognition unit 22 extracts the words corresponding to the input voice from the word dictionary 23 and supplies them to the dictionary search unit 24. The dictionary search unit 24 searches the word dictionary 23 for detailed information on the words recognized by the voice recognition unit 22.

[0029] For example, for "けいおうだいがく" (keiodaigaku), the entry "慶応大学" (Keio University) registered in advance in the word dictionary 23 is retrieved. When there are several search results, they are displayed one after another as described above and one of them is selected by voice. The other words are likewise looked up in the word dictionary 23 and the corresponding information is extracted. In this way, the keywords corresponding to the input voice are confirmed.

[0030] In the above example, voice recognition yields the following confirmed keywords: "082198/1000-1200/Keio University School of Medicine/Shinanomachi/Lecturer Kobayashi/Professor Fujino/skull shaping/plastic insertion/patient name: ****/male/25 years old".
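The date and time keywords in this example, "082198" and "1000-1200", appear to be a month-day-year code and a start-end time range. The source does not state the encoding rule, so the sketch below is only an inference from that single example; `date_keyword` and `time_range_keyword` are hypothetical helpers.

```python
from datetime import date, time


def date_keyword(day: date) -> str:
    """Encode a date as MMDDYY, the pattern inferred from the "082198" example."""
    return day.strftime("%m%d%y")


def time_range_keyword(start: time, end: time) -> str:
    """Encode a time range as HHMM-HHMM, as in the "1000-1200" example."""
    return f"{start.strftime('%H%M')}-{end.strftime('%H%M')}"


if __name__ == "__main__":
    print(date_keyword(date(1998, 8, 21)))                 # 082198
    print(time_range_keyword(time(10, 0), time(12, 0)))    # 1000-1200
```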

[0031] Each keyword may be replaced by a letter (A, B, C, D, E, F, G, H, I, J, K) or a number (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11), and a search expression such as ((A+B)*E*I) may be created and supplied to the optical disk recorder recording unit 27 together with the keywords. In this search expression, "+" denotes logical OR and "*" denotes logical AND.
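To make the notation concrete, here is a small evaluator for expressions such as ((A+B)*E*I), where "+" is OR and "*" is AND. It is a sketch under assumptions: the source does not describe how expressions are parsed or evaluated, `evaluate` is a hypothetical function, "*" is given higher precedence than "+" as in ordinary Boolean algebra, and only single-letter labels are handled (the numeric labels 10 and 11 would need a tokenizer).

```python
def evaluate(expression: str, present: set) -> bool:
    """Evaluate a search expression against the labels present in a frame.

    Grammar (assumed): or_expr := and_expr ('+' and_expr)*
                       and_expr := atom ('*' atom)*
                       atom := LETTER | '(' or_expr ')'
    """
    tokens = [ch for ch in expression if not ch.isspace()]
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        token = peek()
        if token is None:
            raise ValueError("unexpected end of expression")
        pos += 1
        return token

    def parse_or() -> bool:
        value = parse_and()
        while peek() == "+":
            take()
            rhs = parse_and()        # always parse the right side first
            value = value or rhs
        return value

    def parse_and() -> bool:
        value = parse_atom()
        while peek() == "*":
            take()
            rhs = parse_atom()
            value = value and rhs
        return value

    def parse_atom() -> bool:
        token = take()
        if token == "(":
            value = parse_or()
            if take() != ")":
                raise ValueError("missing closing parenthesis")
            return value
        if token.isalpha():
            return token in present  # label is true if the frame carries it
        raise ValueError(f"unexpected token {token!r}")

    result = parse_or()
    if peek() is not None:
        raise ValueError("trailing input after expression")
    return result


if __name__ == "__main__":
    print(evaluate("((A+B)*E*I)", {"A", "E", "I"}))
```

If the letters are assigned in the order of the example keywords (an assumption), A is the date, B the time range, E the lecturer, and I the patient name, so the expression selects frames tagged with the date or the time range, and with both the lecturer and the patient.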

[0032] The optical disk recorder recording unit 27 writes the keywords supplied from the search result display unit 25, that is, the codes corresponding to one or more keywords, into the system data shown in FIG. 3. When a search expression is also supplied from the search result display unit 25, it writes the search expression after the keywords.

[0033] At this time, as described above, the search result display unit 25 supplies the optical disk recorder recording unit 27, for each video/frame (still image), with the one or more keywords (words) entered by voice and, where needed, a search expression, so that a keyword search can be performed later. In the example above, it supplies the keywords "082198/1000-1200/Keio University School of Medicine/Shinanomachi/Lecturer Kobayashi/Professor Fujino/skull shaping/plastic insertion/patient name: ****/male/25 years old" and the search expression ((A+B)*E*I) to the optical disk recorder recording unit 27.

[0034] The optical disk recorder recording unit 27 records these keywords and the search expression in the system data of the video data and the system data of the audio data of the frame shown in FIG. 3.
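One way to picture this recording step is to pack the keywords and the search expression into the two per-frame system data areas. The layout below is purely illustrative: the source gives only the area sizes and says the expression follows the keywords, so the "/" and newline separators, the zero padding, the Shift_JIS encoding, and the names `pack_system_data` and `unpack_system_data` are all assumptions.

```python
VIDEO_SYSTEM_BYTES = 640 * 2   # video system data per frame (two fields)
AUDIO_SYSTEM_BYTES = 40 * 2    # audio system data per frame (two fields)


def pack_system_data(keywords, expression="", encoding="shift_jis"):
    """Return (video_area, audio_area) holding keywords, then the expression.

    Keywords fill the video area first; whatever does not fit spills into
    the audio area, which is how overflow data is described in the text.
    """
    text = "/".join(keywords)
    if expression:
        text += "\n" + expression              # expression written after the keywords
    payload = text.encode(encoding)
    capacity = VIDEO_SYSTEM_BYTES + AUDIO_SYSTEM_BYTES
    if len(payload) > capacity:
        raise ValueError(f"metadata needs {len(payload)} bytes, frame holds {capacity}")
    payload = payload.ljust(capacity, b"\x00")  # pad unused space with zero bytes
    return payload[:VIDEO_SYSTEM_BYTES], payload[VIDEO_SYSTEM_BYTES:]


def unpack_system_data(video_area, audio_area, encoding="shift_jis"):
    """Inverse of pack_system_data: recover the keyword list and expression."""
    text = (video_area + audio_area).rstrip(b"\x00").decode(encoding)
    keyword_part, _, expression = text.partition("\n")
    return keyword_part.split("/"), expression


if __name__ == "__main__":
    video, audio = pack_system_data(["082198", "慶応大学医学部"], "((A+B)*E*I)")
    print(len(video), len(audio))
    print(unpack_system_data(video, audio))
```

A fixed-size, zero-padded area keeps every frame the same length, which fits a frame format with fixed system data fields; a real recorder would likely also need a length or version field, which is omitted here.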

[0035] In this way, the keywords entered by voice can be recorded for each frame (still image) making up the video, with a search expression generated and recorded alongside them as needed.

[0036] After shooting, when, for example, "082198/1000-1200/Lecturer Kobayashi/patient name: ****/" is entered, the optical disk 28 of the camera unit 12 is searched for frames whose system data contains the same keywords as those entered, and the matching video or still images are extracted. When there are several matching videos or still images, they are shown together on small sub-screens into which the display is divided. Designating one of them then plays it back if it is a video, or displays it full-screen if it is a still image.
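The retrieval step can be sketched as a filter over per-frame keyword lists. The in-memory frame records and the names `parse_query` and `find_frames` are hypothetical, and treating a frame as a match when it carries every entered keyword is an assumption; the source says only that frames with the same keywords are found.

```python
def parse_query(query: str) -> list:
    """Split a slash-separated query into keywords, dropping empty parts."""
    return [keyword for keyword in query.split("/") if keyword]


def find_frames(frames, query: str) -> list:
    """Return the IDs of frames whose recorded keywords include all query keywords."""
    wanted = set(parse_query(query))
    return [frame["frame_id"] for frame in frames
            if wanted <= set(frame["keywords"])]


if __name__ == "__main__":
    # Hypothetical frame records as they might be read back from system data.
    recorded = [
        {"frame_id": 1, "keywords": ["082198", "1000-1200", "小林講師", "頭蓋骨成形"]},
        {"frame_id": 2, "keywords": ["082198", "1000-1200", "藤野教授"]},
        {"frame_id": 3, "keywords": ["082298", "0900-1000", "小林講師"]},
    ]
    print(find_frames(recorded, "082198/1000-1200/小林講師/"))
```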

[0037] As described above, a word entered by voice during shooting is recognized, a dictionary search is performed on it automatically, and the search result is recorded in association with the video and audio being recorded at that moment. Keyword-based searches can therefore be performed later, which makes editing easier and faster. The material can also be archived easily by way of keyword search.

[0038] Because keywords are entered by voice, the device can also be used in situations where the user's hands are occupied, such as by a doctor during surgery.

[0039] Although the above embodiment describes application of the present invention to a medical camera, the invention can also be applied to automatic editing, automatic archiving, high-speed archiving, and the like.

[0040] Likewise, although the above embodiment describes application of the present invention to an optical disk camera device, the invention can also be applied to other imaging devices such as digital video cameras.

[0041] The frame format in the above embodiment is only an example, and the invention is not limited to it.

[0042]

[Effects of the Invention] As described above, according to the information recording apparatus, information recording method, and recording medium of the present invention, video and voice are input, the input voice is recognized and converted into a word, a code corresponding to the recognized and converted word is generated, and the video and the code are recorded in association with each other on a predetermined recording medium. A keyword can therefore be entered by voice while video and audio are being recorded with a video camera or the like and recorded together with that video and audio, and the desired video and audio can later be retrieved using the keyword.

[Brief description of the drawings]

FIG. 1 is a block diagram showing a configuration example of an embodiment of an optical disk camera device to which the information recording apparatus of the present invention is applied.

FIG. 2 is a block diagram showing an example of the internal electrical configuration of the embodiment of FIG. 1.

FIG. 3 is a diagram showing an example of the frame format.

[Explanation of symbols]

1: optical disk camera device
11: lens unit
12: camera unit
13: microphone
14, 15: drivers
21, 26: voice input units
22: voice recognition unit
23: word dictionary
24: dictionary search unit
25: search result display unit
27: optical disk recorder recording unit
28: optical disk

Claims

[Claims]

1. An information recording apparatus comprising: video input means for inputting video; voice input means for inputting voice; voice recognition means for recognizing the voice input by the voice input means and converting the voice into a word corresponding to the voice; generation means for generating a code corresponding to the word recognized and converted by the voice recognition means; and recording means for recording, on a predetermined recording medium, the video input by the video input means in association with the code corresponding to the word generated by the generation means.

2. The information recording apparatus according to claim 1, wherein the recording unit records the audio input by the audio input unit together with the video and the code on the recording medium.

3. The information recording apparatus according to claim 1, further comprising storage means for storing the word in association with information about the word, wherein the voice recognition means converts the voice into the word based on the words stored in the storage means.

4. The information recording apparatus according to claim 1, further comprising: storage means for storing the word in association with information about the word; search means for retrieving, from the storage means, the information on the word recognized by the voice recognition means; and display means for displaying the information on the word retrieved by the search means.

5. The information recording apparatus according to claim 1, further comprising: storage means for storing the word in association with information about the word; search means for searching the storage means for the word recognized by the voice recognition means; display means for displaying the information on the word retrieved by the search means; and selection means for, when a plurality of pieces of information are retrieved by the search means, causing the display means to display the plurality of pieces of information sequentially and selecting one piece of information from among them, wherein the recording means records, on the recording medium, the code of the word corresponding to the information selected by the selection means.

6. The information recording apparatus according to claim 1, further comprising search expression generation means for generating a search expression from the code and a logical code, wherein the recording means records the code and the search expression on the recording medium in association with the video.

7. An information recording method comprising: a video input step of inputting video; a voice input step of inputting voice; a voice recognition step of recognizing the voice input in the voice input step and converting the voice into a word corresponding to the voice; a generation step of generating a code corresponding to the word recognized and converted in the voice recognition step; and a recording step of recording, on a predetermined recording medium, the video input in the video input step in association with the code corresponding to the word generated in the generation step.

8. A recording medium on which a program capable of executing the information recording method according to claim 7 is recorded.