JP4411392B2

JP4411392B2 - Recording information presentation method and apparatus

Info

Publication number: JP4411392B2
Application number: JP2004210466A
Authority: JP
Inventors: 淳善本
Original assignee: National Institute of Information and Communications Technology
Current assignee: National Institute of Information and Communications Technology
Priority date: 2004-07-16
Filing date: 2004-07-16
Publication date: 2010-02-10
Anticipated expiration: 2024-07-16
Also published as: JP2006033504A

Description

The present invention relates to a method and apparatus for presenting recorded video information so that a desired portion of its content can be located quickly and easily.

Opportunities to hold videophone calls and videoconferences over the Internet and similar lines are increasing. As a result, there are more and more occasions on which a participant wants to look back on and replay what was said in a dialogue.
When such recorded information is played back, it often takes considerable effort to find the desired portion.

Conventional techniques for cueing or searching recorded information include means that automatically assign an index to each unit of information and means by which the user assigns an index to a desired position in advance.
For example, according to Patent Document 1, an indexed position can be found quickly and reliably.
Patent Document 1: JP 2002-133837

However, a position to which no index has been assigned cannot be found easily.
Moreover, an index generally relates to “content”; indexes are practically never assigned to the “state” of the scene, such as a moment when the conversation becomes lively. This means that, because content search has been given priority, state search has been left behind technically.

In general, people can easily understand and remember the temporal order of events such as a dialogue, but often do not remember the date and time at which each event took place. This is thought to be because it is easy to trace memories by linking them causally in order of events, whereas they are not usually linked to absolute or relative time.
This suggests that, when looking for a particular scene in a dialogue that has been seen once (for example, a manzai comedy routine), the best approach is to cue to events one after another in the order in which they occurred and check the content of each.
When no event index has been assigned and no transcript exists, cueing in order on the basis of a state such as liveliness and checking the content, rather than playing back at random, is expected to save time.
This requires a display technique that lets the “state” be taken in at a glance.

When content is checked from audio information, playback starts slightly before a promising position and proceeds forward along the time axis at a speed at which speech can be understood. Reverse playback (rewind playback) is hard to understand and is therefore not generally used.
These two constraints are what make searching by audio time-consuming.
When many searches must be carried out in a short time, it is preferable to use audio only as the final means of confirming whether a candidate is correct, and to reduce the number of audio checks as far as possible.
This requires a means of narrowing down the “promising positions” further.

Looking over a dialogue that has been converted to text gives a better overview than listening to the audio again, and the content can be grasped in less time than by actually listening.
However, it is not easy to convert the events of a dialogue into text, and text generally strips out non-verbal information such as nods, laughter, and pauses in the conversation, which are key indicators of what the conversation is about. In the case of fillers, plain text without intonation does not even reveal whether the speaker is agreeing or disagreeing, which can make the content harder rather than easier to infer.

When a video recording of a dialogue is fast-forwarded, the content can be followed reasonably well even at speeds so high that the speech is unintelligible. The main reason is thought to be that “body motion information”, which is non-verbal, is sparser in time than “audio information” (its information density along the time axis is lower).
Nevertheless, although its information density is low, it is sufficiently useful information.
No prior art has actively used it as an index.

FIG. 1 is an explanatory diagram showing a display example of conventional video editing software.
At the recording stage, each speaker is recorded as a moving image with sound, such as a video, by means for recording body motion and the like along the time axis and means for recording audio along the time axis. In conventional video editing software, video information such as thumbnails of individual frames and audio information such as a waveform graph are simply arranged in parallel along the time axis, so the sheer volume of video information makes searching difficult. If every frame is shown, only a small part of the recording can be displayed; if the video information is thinned out, for example to one frame every 5 seconds, an important moment that falls in a thinned-out part is overlooked. Because of this dilemma, the interface was immature.

Accordingly, an object of the present invention is to provide a method and apparatus for presenting recorded information so that a desired portion of its content can be found quickly and easily without being overlooked.

To solve the above problem, the recorded information presentation method of the present invention has the following configuration.
That is, it is a method that takes a moving object as its target and presents recorded information about that object so that a desired portion of its content can be searched for. At the stage of recording information, information indicating the variation of the target object's motion in each of three independent directions is recorded along the time axis, and audio is recorded along the time axis, whereby the target object is recorded as a moving image with sound. At the stage of displaying and presenting information, video information indicating the variation in each of the three independent directions, which represents the motion of the target object, and audio information representing the audio are arranged in parallel along the time axis and are compressed or expanded so that they are laid out and displayed within a display screen that can be taken in at roughly a single glance.

Similarly, it is a method that takes a moving object as its target and presents recorded information about that object so that a desired portion of its content can be searched for. At the stage of recording information, previously recorded moving image information with sound is used: information indicating the variation of the target object's motion in each of three independent directions is extracted or converted along the time axis, and audio is extracted or converted along the time axis, whereby the target object is edited as a moving image with sound. At the stage of displaying and presenting information, video information indicating the variation in each of the three independent directions, which represents the motion of the target object, and audio information representing the audio are arranged in parallel along the time axis and are compressed or expanded so that they are laid out and displayed within a display screen that can be taken in at roughly a single glance.

Here, the target object may be the head of a person engaged in a dialogue, and the video information and audio information may be displayed for both participants, so that the method is used for recording and replaying dialogues.

The three independent directions may be a direction of rotation in a substantially horizontal plane (yaw), a front-back rotation direction perpendicular to that substantially horizontal rotation direction (pitch), and a left-right rotation direction perpendicular to both the substantially horizontal rotation direction and the front-back rotation direction (roll), which makes the motion easy to recognize.

Information corresponding to 1 to 10 frames on the time axis may be presented as a width of one dot, so that a wide span of elapsed time can be searched at a glance.

The video information indicating the variation in each of the three independent directions, which represents the motion of the target object, may be displayed superimposed on video information representing thumbnail images of the target object, making the content of the motion easier to grasp visually.

The motion information and audio information of the target object may be analyzed and the motions classified, and an index indicating the resulting classification group may be displayed along the time axis to help in grasping the content of the motion.

The recorded information presentation apparatus of the present invention has the following configuration.
That is, it is an apparatus that takes a moving object as its target and presents recorded information about that object so that a desired portion of its content can be searched for. It comprises recording means which, at the stage of recording information, records information indicating the variation of the target object's motion in each of three independent directions along the time axis and records audio along the time axis, thereby recording the target object as a moving image with sound; and presentation means which, at the stage of displaying and presenting information, arranges video information indicating the variation in each of the three independent directions, which represents the motion of the target object, and audio information representing the audio in parallel along the time axis and compresses or expands them so that they are laid out and displayed within a display screen that can be taken in at roughly a single glance.

Similarly, it is an apparatus that takes a moving object as its target and presents recorded information about that object so that a desired portion of its content can be searched for. It comprises recording means which, at the stage of recording information, uses previously recorded moving image information with sound to extract or convert, along the time axis, information indicating the variation of the target object's motion in each of three independent directions, and to extract or convert the audio along the time axis, thereby editing the target object as a moving image with sound; and presentation means which, at the stage of displaying and presenting information, arranges video information indicating the variation in each of the three independent directions, which represents the motion of the target object, and audio information representing the audio in parallel along the time axis and compresses or expands them so that they are laid out and displayed within a display screen that can be taken in at roughly a single glance.

According to the present invention, a desired portion of the recorded information can be presented readily even at a position to which no index has been assigned, or at a stage before any index is assigned, which contributes to a quick search without oversights.

Embodiments of the present invention are described below with reference to the drawings.
Here, the object to be recorded is a person conversing by videophone or the like, and the detected motion is the yaw, pitch, and roll of the person's head. However, in the present invention the recording target can be chosen freely, and the detected motion may be variation in any three independent directions.

FIG. 2 is an explanatory diagram showing a display example of video editing software according to the present invention.
With a human head as the target object, at the recording stage the speaker is recorded as a moving image with sound, such as a video, by means for recording the yaw, pitch, and roll motion along the time axis and means for recording audio along the time axis.
Alternatively, the information on the yaw, pitch, and roll motion and on the audio may be extracted or converted from information recorded in advance by a known device such as a video recorder, and then recorded.

The yaw angle, which is the amount of variation in the yaw motion, is the horizontal rotation angle of rotation in the substantially horizontal rotation direction; the pitch angle, which is the amount of variation in the pitch motion, is the front-back rotation angle of rotation in the front-back rotation direction; and the roll angle, which is the amount of variation in the roll motion, is the left-right rotation angle of rotation in the left-right rotation direction.

As illustrated, the recorded information is displayed as video information representing the yaw, pitch, and roll motion and audio information such as a waveform graph, arranged in parallel along the time axis.
With this video information, information corresponding to one frame on the time axis can be expressed as, for example, a width of one dot, even when no information is thinned out. For example, a device that displays 300 dots horizontally can present 300 frames at once, which makes searching highly convenient.
Conventionally, displaying 300 frames with thumbnails 64 dots wide would require a display width of 300 × 64 = 19,200 dots, so processing such as thinning out information was necessary. In contrast, the present invention can display a wide range of the time axis without thinning out information.
One dot of width may also correspond to 1 to 10 frames on the time axis. For example, a person can nod at most about three times per second, and a single nod corresponds to 10 frames, consisting of 5 frames of downward motion and 5 frames of upward motion, so the waveform of a nod can still be recognized even when 5 frames are represented by one dot.
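The frame-to-dot arithmetic above can be sketched as follows. This is only an illustration: the function names, the use of a (min, max) pair per dot column, and the sample pitch values are assumptions that do not appear in the patent, which says only that 1 to 10 frames may be represented by one dot.

```python
from typing import List, Tuple


def thumbnail_strip_width(num_frames: int, thumb_width: int) -> int:
    """Display width in dots needed to show every frame as a thumbnail."""
    return num_frames * thumb_width


def bin_frames(values: List[float], frames_per_dot: int) -> List[Tuple[float, float]]:
    """Collapse a per-frame angle trace into one (min, max) pair per dot column.

    Keeping both extremes per column is an assumed rendering choice so that a
    short swing is not averaged away when several frames share one dot.
    """
    if frames_per_dot < 1:
        raise ValueError("frames_per_dot must be at least 1")
    columns = []
    for start in range(0, len(values), frames_per_dot):
        chunk = values[start:start + frames_per_dot]
        columns.append((min(chunk), max(chunk)))
    return columns


# Hypothetical pitch angles (degrees) for one nod: 5 frames down, 5 frames up.
nod = [-4.0, -8.0, -12.0, -16.0, -20.0, -16.0, -12.0, -8.0, -4.0, 0.0]

print(thumbnail_strip_width(300, 64))  # 19200 dots for a thumbnail strip
print(len(bin_frames([0.0] * 300, 1)))  # 300 dots at one frame per dot
print(bin_frames(nod, 5))  # [(-20.0, -4.0), (-16.0, 0.0)]
```

At 5 frames per dot the 10-frame nod still spans two columns, one for the downward half and one for the upward half.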

When searching for a desired portion of the information, the above scheme of the present invention may be used to find its approximate position first, after which conventional video editing software may be used.

FIG. 3 is an explanatory diagram showing a display example of video editing software according to another embodiment.
In this embodiment, the video information representing the yaw, pitch, and roll motion and video information representing thumbnail images are displayed superimposed on each other.
In this case, the thumbnails are preferably thinned out to match the time axis being displayed.
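One way to thin the thumbnails to match the displayed time axis is sketched below. The patent does not specify a thinning rule; the rule used here (pick frames so that adjacent thumbnails tile the axis without overlapping) and the function name `thumbnail_frames` are assumptions made for illustration.

```python
from typing import List


def thumbnail_frames(num_frames: int, frames_per_dot: int, thumb_width: int) -> List[int]:
    """Frame indices whose thumbnails tile the visible time axis without overlap.

    One thumbnail is thumb_width dots wide and each dot covers frames_per_dot
    frames, so consecutive thumbnails are spaced frames_per_dot * thumb_width
    frames apart (an assumed rule, not taken from the patent).
    """
    if frames_per_dot < 1 or thumb_width < 1:
        raise ValueError("frames_per_dot and thumb_width must be at least 1")
    step = frames_per_dot * thumb_width
    return list(range(0, num_frames, step))


print(thumbnail_frames(300, 1, 64))    # [0, 64, 128, 192, 256]
print(thumbnail_frames(3000, 10, 64))  # [0, 640, 1280, 1920, 2560]
```

Zooming the time axis out (more frames per dot) automatically spaces the thumbnails further apart in time, while the motion traces beneath them remain unthinned.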

FIG. 4 is an explanatory diagram showing a display example of video editing software according to another embodiment.
In this embodiment, cluster numbers are displayed along the time axis in parallel with the video information representing the yaw, pitch, and roll motion and audio information such as a waveform graph.
A cluster number is an index indicating a classification group obtained by classifying the motions. A known cluster analysis method, such as the group average method (UPGMA), can be used as appropriate to classify the motions.
Instead of a cluster number, the index indicating the classification group may be shown as a figure, a symbol, a color code, or the like.
Because, for example, a “nod with voice” and a “nod without voice” can differ in meaning, the cluster analysis may be performed on the motion information and audio information together.
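The classification step can be sketched with a small pure-Python group average (UPGMA) clustering. The patent names UPGMA only as one example of a known method and does not define the features; the two-element feature vectors below (a normalized nod amplitude and a normalized voice level per segment), the function name `upgma_labels`, and the choice of three clusters are all hypothetical.

```python
from itertools import combinations
from math import dist
from typing import List, Sequence


def upgma_labels(points: Sequence[Sequence[float]], num_clusters: int) -> List[int]:
    """Group-average agglomerative clustering; returns one cluster number per point."""
    if not 1 <= num_clusters <= len(points):
        raise ValueError("num_clusters must be between 1 and len(points)")
    clusters = [[i] for i in range(len(points))]  # every point starts alone

    def average_distance(a: List[int], b: List[int]) -> float:
        # UPGMA linkage: mean of all pairwise distances between two clusters.
        total = sum(dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > num_clusters:
        # Merge the pair of clusters with the smallest average distance.
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: average_distance(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # safe because combinations() guarantees x < y

    labels = [0] * len(points)
    # Number clusters by their earliest member so the output is deterministic.
    for number, members in enumerate(sorted(clusters, key=min)):
        for i in members:
            labels[i] = number
    return labels


# Hypothetical segments: (nod amplitude, voice level), both scaled to 0..1.
segments = [(1.0, 0.9), (0.9, 1.0), (1.0, 0.0), (0.9, 0.1), (0.0, 0.9)]
print(upgma_labels(segments, 3))  # [0, 0, 1, 1, 2]
```

In this toy data the nods with voice, the nods without voice, and the speech without nodding fall into separate groups, and the resulting numbers are what would be drawn along the time axis as cluster numbers.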

According to the recorded information presentation method and apparatus of the present invention, a desired portion of the recorded information, such as a position where something of topical interest occurred (for example, “a state in which both participants are animated” or “a state in which one participant nods silently and acts as the listener”), can be presented to the user in an easily understandable way even at a position to which no index has been assigned, or at a stage before any index is assigned. The invention therefore has a wide range of applications and is very useful industrially.

FIG. 1: Explanatory diagram showing a display example of conventional video editing software.
FIG. 2: Explanatory diagram showing a display example of video editing software according to the present invention.
FIG. 3: Explanatory diagram showing a display example according to another embodiment.
FIG. 4: Explanatory diagram showing a display example according to another embodiment.

Claims

A recorded information presentation method that takes a moving object as its target and presents recorded information about that object so that a desired portion of its content can be searched for, wherein
at the stage of recording information, information indicating the variation of the target object's motion in each of three independent directions is recorded along the time axis, audio is recorded along the time axis, and an index indicating a classification result for the state of the scene, obtained by analyzing the combination of the motion information and the audio information, is further recorded along the time axis, whereby the target object is recorded as a moving image with sound together with the index indicating the state of the scene, and
at the stage of displaying and presenting information, video information indicating the variation in each of the three independent directions, which represents the motion of the target object, audio information representing the audio, and the index indicating the state of the scene are arranged in parallel along the time axis and are compressed or expanded so that they are laid out and displayed within a display screen that can be viewed at a glance.

A recorded information presentation method that takes a moving object as its target and presents recorded information about that object so that a desired portion of its content can be searched for, wherein
at the stage of recording information, previously recorded moving image information with sound is used to extract or convert, along the time axis, information indicating the variation of the target object's motion in each of three independent directions, to extract or convert the audio along the time axis, and further to extract or convert, along the time axis, an index indicating a classification result for the state of the scene, obtained by analyzing the combination of the motion information and the audio information, whereby the target object is edited as a moving image with sound together with the index indicating the state of the scene, and
at the stage of displaying and presenting information, video information indicating the variation in each of the three independent directions, which represents the motion of the target object, audio information representing the audio, and the index indicating the state of the scene are arranged in parallel along the time axis and are compressed or expanded so that they are laid out and displayed within a display screen that can be viewed at a glance.

The recorded information presentation method according to claim 1 or 2, wherein, at the stage of displaying and presenting information, video information representing thumbnail images of the target object is also displayed superimposed.

The recorded information presentation method according to any one of claims 1 to 3, wherein the target object is the head of a person engaged in a dialogue, and the video information and audio information are displayed for both participants in the dialogue.

The indirect motion transmission method according to claim 1, wherein the three independent directions are a direction of rotation in a substantially horizontal plane (yaw), a front-back rotation direction perpendicular to that substantially horizontal rotation direction (pitch), and a left-right rotation direction perpendicular to both the substantially horizontal rotation direction and the front-back rotation direction (roll).

The indirect motion transmission method according to any one of claims 1 to 5, wherein information corresponding to 1 to 10 frames on a time axis is presented as one dot width.

A recorded information presentation apparatus that takes a moving object as its target and presents recorded information about that object so that a desired portion of its content can be searched for, comprising:
recording means which, at the stage of recording information, records information indicating the variation of the target object's motion in each of three independent directions along the time axis, records audio along the time axis, and further records, along the time axis, an index indicating a classification result for the state of the scene, obtained by analyzing the combination of the motion information and the audio information, thereby recording the target object as a moving image with sound together with the index indicating the state of the scene; and
a display unit which, at the stage of displaying and presenting information, arranges video information indicating the variation in each of the three independent directions, which represents the motion of the target object, audio information representing the audio, and the index indicating the state of the scene in parallel along the time axis and compresses or expands them so that they are laid out and displayed within a display screen that can be viewed at a glance.

A recorded information presentation apparatus that takes a moving object as its target and presents recorded information about that object so that a desired portion of its content can be searched for, comprising:
recording means which, at the stage of recording information, uses previously recorded moving image information with sound to extract or convert, along the time axis, information indicating the variation of the target object's motion in each of three independent directions, to extract or convert the audio along the time axis, and further to extract or convert, along the time axis, an index indicating a classification result for the state of the scene, obtained by analyzing the combination of the motion information and the audio information, thereby editing the target object as a moving image with sound together with the index indicating the state of the scene; and
a display unit which, at the stage of displaying and presenting information, arranges video information indicating the variation in each of the three independent directions, which represents the motion of the target object, audio information representing the audio, and the index indicating the state of the scene in parallel along the time axis and compresses or expands them so that they are laid out and displayed within a display screen that can be viewed at a glance.