JP6091552B2 - Movie processing apparatus and movie processing system

Info

Publication number: JP6091552B2
Application number: JP2015126896A
Authority: JP
Inventors: 俊雄石松; 恒利田中; 茂充湯浅; 明雄小金
Original assignee: J-STREAM INC.
Current assignee: J-STREAM INC.
Priority date: 2015-06-24
Filing date: 2015-06-24
Publication date: 2017-03-08
Anticipated expiration: 2035-06-24
Also published as: JP2017011581A

Description

The present invention relates to a moving image processing apparatus that processes information such as characters displayed in the video of moving image data, and to a moving image processing system including the moving image processing apparatus.

Conventionally, on networks such as the Internet, moving image distribution has been carried out in which a given computer such as a moving image server or a moving image database stores moving image data and makes it available to viewer terminals. Various moving image processing apparatuses and moving image processing systems have been proposed to promote such distribution. For example, some apparatuses and systems distribute, in addition to the moving image data, moving image information about that data, such as program information, performer information, caption information, and commercial (CM) information, so that the viewer terminal can use the moving image information.

For example, the metadata distribution apparatus described in Patent Document 1 uses an extraction conversion table and station-specific data to extract and convert, from the metadata of a key station's content, the metadata of the content of a network program that the local station broadcasts, and distributes the extracted and converted metadata. The key station's content metadata is thus delivered to receivers as the local station's content metadata, so that a network station other than the key station can perform server-type broadcasting of the network program using the key station's content metadata.

Patent Document 1: JP 2006-325134 A

However, a moving image processing apparatus or system such as the metadata distribution apparatus described above cannot provide moving image information unless the broadcast station prepares the metadata of that information in advance. Moving image information therefore cannot be provided for moving image data for which no such metadata has been prepared.

Moving image data also includes data that shows posted material bearing text, as in seminars and commentaries, and data with captions, but the text shown in such data may not be provided as moving image information. One conceivable approach is to cut out a still image frame in which text is displayed from the moving image data and perform character recognition on that frame to extract the character information. However, still image frames cut out from moving image data have coarse image quality, so extracting character information with conventional character recognition processing has been difficult.

Furthermore, a viewer looking for moving image data of interest performs a keyword search with an Internet search engine, but text that is not provided as moving image information, as described above, does not appear in the search results, so the desired moving image data may not be found. Even when the viewer does find the moving image data by keyword search, it is often difficult to locate the scene within the moving image that relates to the keyword.

In view of the above circumstances, an object of the present invention is to detect character information displayed in moving image data more reliably and to make the detected character information more convenient to use, thereby promoting the use and spread of moving image distribution services.

To solve the above problems, a first moving image processing apparatus of the present invention comprises: a frame cutout unit that cuts out a plurality of still image frames from moving image data at predetermined frame intervals; an approximation determination unit that sequentially performs approximation determination on consecutive still image frames among the plurality of still image frames and, when the frames are determined to be approximate, sets the preceding still image frame as a processing target frame and excludes the subsequent still image frame from the processing target frames; a sharpening unit that applies sharpening processing to the processing target frame to generate an edge-enhanced frame in which edges are emphasized; a binarization unit that applies binarization processing to the edge-enhanced frame to generate a binary image frame; a character recognition unit that performs character recognition processing on the binary image frame to acquire character information; and a metadata generation unit that generates, for each piece of character information, metadata recording, together with the character information, at least moving image information about the moving image data from which the character information was acquired and still image information about the still image frame from which the character information was acquired.

According to the first moving image processing apparatus of the present invention, metadata of character information related to the content of moving image data can be provided even when no metadata of moving image information has been prepared in advance for that data. In addition, because metadata is created for the various pieces of character information displayed in the moving image data, a viewer can quickly find in which scene (still image data) of which moving image data a keyword of interest is displayed. Furthermore, when a still image frame approximates the previous still image frame, it is excluded from character recognition processing, which greatly reduces the processing load. Thus, according to the present invention, character information displayed in moving image data can be detected more reliably and the detected character information can be made more convenient to use, promoting the use and spread of moving image distribution services.

To solve the above problems, a second moving image processing apparatus of the present invention is the first moving image processing apparatus described above, wherein the binarization unit calculates the color temperature range of the edge-enhanced frame, obtains a plurality of thresholds based respectively on a plurality of color temperatures taken at predetermined threshold intervals within the color temperature range, and applies binarization processing to the edge-enhanced frame with each of the plurality of thresholds to generate a plurality of binary image frames; the character recognition unit performs character recognition processing on each of the plurality of binary image frames to obtain, for each binary image frame, a character recognition result containing the character information, compares the character recognition results, and acquires the character information only from the binary image frame that yielded the optimum character recognition result; and the metadata generation unit generates the metadata based on the character information acquired only from the binary image frame, among the plurality of binary image frames, that yielded the optimum character recognition result.

According to the second moving image processing apparatus of the present invention, character information can be extracted from the result of binarization with the optimum threshold. For example, even still image frames with the same color temperature range may call for different binarization thresholds depending on conditions such as the lighting at the time of shooting; even in such cases, optimum character information can be extracted.
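The selection among per-threshold recognition results could be sketched as follows. The source does not state how the "optimum" result is judged; this sketch assumes that each recognition call returns a confidence score and simply keeps the highest one. `recognize` is a hypothetical callable standing in for whatever character recognition engine is used, and the `(text, score)` result shape is likewise an assumption.

```python
from typing import Callable, Iterable, Optional, Tuple

# (recognized text, confidence score): an assumed result shape, not from the source
Recognition = Tuple[str, float]


def best_recognition(
    binary_frames: Iterable[object],
    recognize: Callable[[object], Recognition],
) -> Optional[Recognition]:
    """Run recognition on every binarized variant and keep the highest-scoring result."""
    best: Optional[Recognition] = None
    for frame in binary_frames:
        result = recognize(frame)
        # Keep the first result, then replace it only when a later score is strictly higher.
        if best is None or result[1] > best[1]:
            best = result
    return best
```

Only the text of the winning variant would then be passed on to metadata generation; the other variants are discarded.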

To solve the above problems, a third moving image processing apparatus of the present invention is the first or second moving image processing apparatus described above, wherein the approximation determination unit compares change values of the histograms of RGB values and luminance for consecutive still image frames and, if the comparison value is equal to or greater than a predetermined approximation threshold, determines that the consecutive still image frames are approximate.

According to the third moving image processing apparatus of the present invention, the approximation determination unit can maintain the accuracy of the approximation processing while greatly reducing its load.

To solve the above problems, a fourth moving image processing apparatus of the present invention is any one of the first to third moving image processing apparatuses described above, wherein the approximation determination unit performs quadtree space partitioning on consecutive still image frames, converts the displacement of the singular points in each region into an acceleration by taking its second-order differential value, and compares the results to determine whether the consecutive still image frames are approximate.

According to the fourth moving image processing apparatus of the present invention, the approximation determination unit can determine more accurately whether consecutive still image frames are approximate.

To solve the above problems, a fifth moving image processing apparatus of the present invention is the fourth moving image processing apparatus described above, wherein the approximation determination unit calculates the distribution and direction vectors of the singular points when performing quadtree space partitioning of the still image frame, and the character recognition unit performs the character recognition processing by comparing the distribution and direction vectors of the singular points calculated by the approximation determination unit with predetermined training data composed of distributions and direction vectors of singular points.

According to the fifth moving image processing apparatus of the present invention, the accuracy of the approximation determination in the approximation determination unit is maintained while the processing load on the character recognition unit is reduced.

To solve the above problems, a sixth moving image processing apparatus of the present invention is any one of the first to third moving image processing apparatuses described above, wherein the character recognition unit calculates the distribution and direction vectors of the singular points of the subject image in the binary image frame and performs the character recognition processing by comparing them with predetermined training data composed of distributions and direction vectors of singular points. The predetermined training data includes, in addition to the singular points and direction vectors for each of various fonts, the singular points and direction vectors of each font in a degraded state.

According to the sixth moving image processing apparatus of the present invention, because the character recognition unit calculates singular points composed of a distribution and direction vectors, the amount of data used in the character recognition processing is reduced and the processing load is lightened. The amount of training data, which must be prepared in large quantity, is also reduced, lowering the server cost of storing the training data.

To solve the above problems, a seventh moving image processing apparatus of the present invention is the sixth moving image processing apparatus described above, wherein the character recognition unit calculates the singular points of the subject image by performing quadtree space partitioning on the binary image frame, and calculates the distribution and direction vectors of those singular points.

According to the seventh moving image processing apparatus of the present invention, the character recognition unit calculates the singular points composed of a distribution and direction vectors by quadtree space partitioning, which further reduces the load of the character recognition processing.

To solve the above problems, a moving image processing system of the present invention comprises any one of the first to seventh moving image processing apparatuses described above, and a character information database that stores the metadata generated for each piece of character information by the metadata generation unit of the moving image processing apparatus.

According to the moving image processing system of the present invention, a viewer who searches with a keyword of interest can obtain from the character information database the metadata of the character information corresponding to that keyword, and can read out the moving image information recorded in the metadata to play back the moving image data.
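A keyword lookup over per-text metadata records might look like the following minimal sketch. The record layout is an assumption: the source only says that each record carries the character information together with moving image information and still image information, so the field names `video_id` and `frame_time_s` are hypothetical, and a plain in-memory list stands in for the character information database.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TextMetadata:
    text: str            # character information recognized in one still frame
    video_id: str        # hypothetical identifier of the source moving image data
    frame_time_s: float  # hypothetical position of the still frame within the video


def search(records: List[TextMetadata], keyword: str) -> List[TextMetadata]:
    """Return records whose text contains the keyword, ordered by video and time."""
    needle = keyword.lower()
    hits = [r for r in records if needle in r.text.lower()]
    return sorted(hits, key=lambda r: (r.video_id, r.frame_time_s))
```

Each hit identifies both the video and the position of the frame in which the keyword appeared, which is what would let a player jump to the relevant scene.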

According to the present invention, character information displayed in moving image data can be detected more reliably and the detected character information can be made more convenient to use, promoting the use and spread of moving image distribution services.

Block diagram showing an outline of a moving image processing system according to an embodiment of the present invention.

Flowchart showing the character information detection operation based on moving image data in a moving image processing apparatus according to an embodiment of the present invention.

First, the overall configuration of a moving image processing system 1 according to an embodiment of the present invention will be described with reference to FIG. 1. As shown in FIG. 1, in the moving image processing system 1, a moving image processing apparatus 2 that detects character information from moving image data is communicably connected, via a predetermined network 3 such as the Internet or a LAN, to a moving image database (DB) 4 that stores moving image data and to a character information database (DB) 5 that stores metadata (tags) of character information. The moving image DB 4 and the character information DB 5 are communicably connected to a search engine 6 via the network 3, so that the search engine 6 can search the stored moving image data and the stored character information metadata, respectively. In the moving image processing system 1, a viewer terminal 7 capable of playing back moving image data is also communicably connected to the moving image DB 4, the character information DB 5, and the search engine 6 via the network 3.

First, the moving image processing apparatus 2 will be described. The moving image processing apparatus 2 may, for example, be provided on the network 3 independently of the moving image DB 4 and the character information DB 5, or may be provided as a computer that manages the moving image DB 4 and the character information DB 5. This embodiment describes an example in which one moving image processing apparatus 2 is provided on the network 3, but a plurality of moving image processing apparatuses 2 may be provided on the network 3. A moving image processing apparatus 2 may also be provided for each moving image category (industry).

The moving image processing apparatus 2 executes character recognition processing on moving image data. The moving image data that can be processed is not limited to moving images in which posted material bearing text appears frequently, as in seminars and commentaries, or to moving images with captions, such as movies; any moving image in which text is displayed will do. The range is broad and includes, for example, finance-related moving images in which stock price data and company names appear frequently, and advertising moving images in which product names and company names are displayed.

The moving image processing apparatus 2 includes, for example, a control unit 10, a storage unit 11, a communication unit 12, a frame cutout unit 13, an approximation determination unit 14, a sharpening unit 15, a binarization unit 16, a character recognition unit 17, and a metadata generation unit 18. The frame cutout unit 13, the approximation determination unit 14, the sharpening unit 15, the binarization unit 16, the character recognition unit 17, and the metadata generation unit 18 may be implemented as programs that are stored in the storage unit 11 and operate under the control of the control unit 10.

The control unit 10 has a CPU (Central Processing Unit) or the like and is configured to supervise and control the overall operation of the moving image processing apparatus 2. The storage unit 11 has memory such as ROM (Read Only Memory) and RAM (Random Access Memory) and a recording medium such as a hard disk, and is configured to store the information, data, programs, and the like controlled by the control unit 10.

The communication unit 12 is an interface through which the moving image processing apparatus 2 connects to the network 3, such as the Internet or a LAN; that is, it connects the moving image processing apparatus 2 to the moving image DB 4 and the character information DB 5 via the network 3.

The communication unit 12, for example, communicates with the moving image DB 4 via the network 3 to receive from the moving image DB 4 the moving image data on which the moving image processing apparatus 2 is to perform character information detection. For example, when the operator of the moving image processing apparatus 2 performs a moving image acquisition operation or an operation to start character information detection and selects the moving image data and the moving image DB 4 from which to acquire it, the communication unit 12 acquires the moving image data from that moving image DB 4. The communication unit 12 may also sequentially acquire the moving image data recorded in the moving image DB 4 in accordance with the operation of a moving image acquisition crawler (not shown) provided in the moving image processing apparatus 2. The moving image acquisition crawler may acquire all moving image data on the network 3, or may search for and acquire moving image data based on a category (industry) or keywords selected by the operator.

The source from which the communication unit 12 acquires moving image data is not limited to the moving image DB 4. The communication unit 12 may acquire moving image data from a broadcast station via the network 3 or by receiving broadcast waves with a broadcast receiver (not shown), or may acquire moving image data from an external terminal, such as a smartphone or a personal computer, connected directly to the moving image processing apparatus 2.

The communication unit 12 also, for example, communicates with the character information DB 5 via the network 3 to transmit the character information metadata generated by the moving image processing apparatus 2 to the character information DB 5. The moving image processing apparatus 2 may be configured so that the character information DB 5 to which the communication unit 12 transmits the metadata can be selected.

The frame cutout unit 13 acquires a plurality of still image frames from the moving image data that the communication unit 12 received from the moving image DB 4. In this embodiment in particular, the frame cutout unit 13 cuts still image frames out of the moving image data at predetermined frame intervals, for example at time intervals of one second, to obtain a plurality of still image frames, one per interval. To increase the number of singular points extracted as described later, the frame cutout unit 13 preferably acquires still image frames whose image quality has been enhanced to increase the pixel count.
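The fixed-interval cutout can be illustrated with a small index calculation. This is a minimal sketch assuming the frame rate and total frame count of the video are already known; the function name and parameters are hypothetical, and decoding the frames themselves is left to whatever video library is in use.

```python
from typing import List


def frame_indices(total_frames: int, fps: float, interval_s: float = 1.0) -> List[int]:
    """Indices of the frames to cut out, one every `interval_s` seconds."""
    if total_frames <= 0 or fps <= 0 or interval_s <= 0:
        return []
    # Round to the nearest whole frame; never sample more often than every frame.
    step = max(1, round(fps * interval_s))
    return list(range(0, total_frames, step))
```

For a 30 fps video and the one-second interval mentioned above, this selects every 30th frame starting with frame 0.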

The approximation determination unit 14 performs approximation determination on the plurality of still image frames obtained by the frame cutout unit 13 and excludes approximate still image frames from the frames to be processed by character recognition. In this embodiment in particular, the approximation determination unit 14 sequentially performs approximation determination on two consecutive still image frames among the plurality of still image frames; when the two are determined to be approximate, it sets the preceding still image frame as a processing target frame and excludes the subsequent still image frame from the processing target frames. When the subsequent still image frame was excluded in the previous determination, the frame compared against the next still image frame in the current determination is the still image frame that was set as the processing target frame in the previous determination.
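The bookkeeping described here, in which each new frame is compared against the last frame that was kept rather than against its immediate predecessor, can be sketched as below. The comparison itself is passed in as `is_approximate`, a hypothetical predicate; the function and parameter names are illustrative only.

```python
from typing import Callable, List, Sequence, TypeVar

Frame = TypeVar("Frame")


def select_target_frames(
    frames: Sequence[Frame],
    is_approximate: Callable[[Frame, Frame], bool],
) -> List[Frame]:
    """Keep a frame only if it is not approximate to the most recently kept frame."""
    targets: List[Frame] = []
    for frame in frames:
        # The reference is the last kept frame, so a slow drift across
        # several excluded frames is still detected eventually.
        if targets and is_approximate(targets[-1], frame):
            continue
        targets.append(frame)
    return targets
```

With integers standing in for frames and "differs by at most 1" as the predicate, the sequence 0, 1, 2 keeps 0, drops 1, and keeps 2, because 2 is compared with 0 rather than with 1.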

For example, as the approximation determination, the approximation determination unit 14 compares change values of the histograms of RGB values and luminance for consecutive still image frames and, if the comparison value is equal to or greater than a predetermined approximation threshold, determines that the consecutive still image frames are approximate. In addition, the approximation determination unit 14 performs quadtree space partitioning on the consecutive still image frames, converts the displacement of the singular points in each region into an acceleration by taking its second-order differential value, and compares the results to determine whether the consecutive still image frames are approximate. Alternatively, the approximation determination unit 14 may determine approximation by only one of the two comparisons, the histogram-based one or the quadtree-based one.
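One way to realize a histogram-based comparison whose value grows with similarity is histogram intersection, sketched below in plain Python. This is an assumption: the source does not define how the comparison value is computed. The bin count, the 0.9 threshold, and the Rec. 601 luminance weights are likewise illustrative choices, and a frame is represented simply as a flat sequence of `(R, G, B)` tuples.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each in 0..255


def histogram(values: Sequence[int], bins: int = 16) -> List[float]:
    """Normalized histogram of 0..255 values."""
    counts = [0] * bins
    for v in values:
        counts[v * bins // 256] += 1
    total = len(values) or 1
    return [c / total for c in counts]


def luminance(p: Pixel) -> int:
    r, g, b = p
    return int(0.299 * r + 0.587 * g + 0.114 * b)  # Rec. 601 luma weights


def channels(frame: Sequence[Pixel]) -> List[List[int]]:
    """Split a frame into R, G, B and luminance value lists."""
    return [[p[i] for p in frame] for i in range(3)] + [[luminance(p) for p in frame]]


def frame_similarity(a: Sequence[Pixel], b: Sequence[Pixel], bins: int = 16) -> float:
    """Mean histogram intersection over R, G, B and luminance (1.0 = identical histograms)."""
    scores = []
    for ca, cb in zip(channels(a), channels(b)):
        ha, hb = histogram(ca, bins), histogram(cb, bins)
        scores.append(sum(min(x, y) for x, y in zip(ha, hb)))
    return sum(scores) / len(scores)


def is_approximate(a: Sequence[Pixel], b: Sequence[Pixel], threshold: float = 0.9) -> bool:
    """Frames count as approximate when the similarity reaches the threshold."""
    return frame_similarity(a, b) >= threshold
```

Histograms ignore where pixels are located, which keeps the check cheap but means two different layouts with the same color distribution look identical; that limitation is one reason a spatial comparison such as the quadtree-based one may be combined with it.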

When the position at which text is displayed is fixed in advance, as in moving image data that shows posted material bearing text (seminars, commentaries, and the like) or moving image data with captions, the approximation determination unit 14 may, for example, use a template prepared in advance to acquire only the data of the text display portion and perform the approximation processing on that portion alone. In this case, the moving image processing apparatus 2 manages a template for each text display portion and has functions such as creating, modifying, and deleting templates.

The sharpening unit 15 applies sharpening processing to the processing target frame obtained by the approximation determination unit 14 to generate an edge-enhanced frame in which the edges of the subject image are emphasized.

In the sharpening processing by the sharpening unit 15, for example, an unsharp mask is generated by calculating the Gaussian distribution (standard deviation) of the processing target frame, and a sharp frame is generated by adding the unsharp mask to the processing target frame. Then, in the sharp frame, sections in which luminance changes between pixels are detected, the amount of luminance displacement in each section is calculated and converted into an acceleration, and edge extraction processing that emphasizes an edge more strongly the larger the acceleration is performed to generate the edge-enhanced frame.
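The first half of this step resembles conventional unsharp masking, which the following sketch illustrates on a grayscale image held as a list of rows. It is not taken from the source: the fixed 3x3 Gaussian kernel, the `amount` parameter, and the edge clamping are assumptions, and the subsequent acceleration-weighted edge emphasis is not covered.

```python
from typing import List

Image = List[List[float]]

# 3x3 Gaussian approximation; the weights sum to 16.
KERNEL = ((1, 2, 1), (2, 4, 2), (1, 2, 1))


def gaussian_blur(img: Image) -> Image:
    """Blur with the 3x3 kernel, repeating border pixels at the edges."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += KERNEL[dy + 1][dx + 1] * img[yy][xx]
            out[y][x] = acc / 16
    return out


def unsharp_mask(img: Image, amount: float = 1.0) -> Image:
    """sharpened = original + amount * (original - blurred), clipped to 0..255."""
    blurred = gaussian_blur(img)
    return [
        [min(255.0, max(0.0, p + amount * (p - b))) for p, b in zip(row, brow)]
        for row, brow in zip(img, blurred)
    ]
```

On a vertical step from 50 to 200, the pixels on either side of the step are pushed apart (the dark side gets darker, the bright side brighter) while flat regions are left unchanged, which is the overshoot that makes character strokes stand out before binarization.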

The binarization unit 16 applies binarization processing to the edge-enhanced frame generated by the sharpening unit 15 to generate a binary image frame. For example, before the binarization processing the binarization unit 16 may perform edge detection on the edge-enhanced frame using the zero-crossing method or the like, and apply the binarization processing to the frame in which singular points were detected by that edge detection.

本実施形態では特に、二値化部１６は、エッジ強調フレーム（又は元の静止画フレーム）の色温度範囲（画像の各ドットが存在する周波数帯域）を算出すると共に、色温度範囲における所定の閾値間隔毎の複数の色温度のそれぞれに基づいて複数の閾値を取得する。そして、二値化部１６は、エッジ強調フレームに対して複数の閾値をそれぞれ用いた二値化処理を施して複数の二値画像フレームを生成する。なお、二値化部１６は、エッジ強調フレームの色温度範囲の最大値及び最小値に基づく閾値で二値化処理をしても、黒部分又は白部分が多すぎる二値画像フレームが生成されるため、これらの閾値での二値化処理は行わない。また、二値化処理部１６は、生成した複数の二値画像フレームの濃度（黒部分及び白部分）の分布を参照して、黒部分又は白部分が多すぎる二値画像フレームを除外してもよい。 In particular, in the present embodiment, the binarization unit 16 calculates the color temperature range (the frequency band in which each dot of the image exists) of the edge-enhanced frame (or of the original still image frame), and acquires a plurality of thresholds based on each of a plurality of color temperatures taken at a predetermined threshold interval within the color temperature range. The binarization unit 16 then applies the binarization process to the edge-enhanced frame using each of the thresholds to generate a plurality of binary image frames. Note that binarizing with thresholds based on the maximum and minimum values of the color temperature range would only produce binary image frames with too many black or white portions, so the binarization unit 16 does not binarize with these thresholds. The binarization unit 16 may also refer to the density distribution (black and white portions) of the generated binary image frames and exclude binary image frames that have too many black or white portions.
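The multi-threshold binarization described above can be sketched as follows. This is an illustration only: it substitutes integer pixel intensities for the color temperature values of the specification, and the names `candidate_thresholds`, `binarize_multi`, `min_ratio`, and `max_ratio` are hypothetical. Thresholds are taken at a fixed step strictly between the minimum and maximum of the range, and results that are almost entirely black or white are discarded.

```python
def candidate_thresholds(pixels, step):
    # Thresholds at a fixed interval inside the range; the minimum and maximum
    # themselves are skipped because they give nearly all-white or all-black output.
    lo, hi = min(pixels), max(pixels)
    return list(range(lo + step, hi, step))

def binarize(pixels, threshold):
    # 1 = white (at or above the threshold), 0 = black.
    return [1 if p >= threshold else 0 for p in pixels]

def white_ratio(binary):
    return sum(binary) / len(binary)

def binarize_multi(pixels, step, min_ratio=0.05, max_ratio=0.95):
    # Map each usable threshold to its binary frame, dropping frames whose
    # white share falls outside [min_ratio, max_ratio].
    results = {}
    for t in candidate_thresholds(pixels, step):
        b = binarize(pixels, t)
        if min_ratio <= white_ratio(b) <= max_ratio:
            results[t] = b
    return results

pixels = [10, 20, 120, 130, 240, 250]   # flattened integer intensities
frames = binarize_multi(pixels, step=60)
```

Each surviving entry in `frames` would then be passed to character recognition separately, so that the threshold giving the best recognition result can be chosen afterwards.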

文字認識部１７は、二値化部１６で生成された二値画像フレームに対して文字認識処理を行って文字情報を取得する。この文字認識処理において、文字認識部１７は、二値画像フレームから文字を１つずつ認識し、例えば、二値画像フレームの被写体像を示す特異点を抽出すると共に、抽出された特異点の集まりを１つの文字の文字候補とする。例えば、文字認識部１７は、二値画像フレームに対して四分木空間分割を行うことで被写体像の各特異点を算出する。なお、上記の近似判定部１４が、四分木空間分割を行って特異点を算出する場合、文字認識部１７は、四分木空間分割を行うことなく、近似判定部１４で算出した特異点を用いてもよい。 The character recognition unit 17 performs a character recognition process on the binary image frame generated by the binarization unit 16 to acquire character information. In this character recognition process, the character recognition unit 17 recognizes characters one by one from the binary image frame; for example, it extracts singular points representing the subject image of the binary image frame and treats a cluster of extracted singular points as a character candidate for one character. For example, the character recognition unit 17 calculates each singular point of the subject image by performing quadtree space division on the binary image frame. When the approximation determination unit 14 described above calculates singular points by quadtree space division, the character recognition unit 17 may use the singular points calculated by the approximation determination unit 14 without performing quadtree space division itself.
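For reference, quadtree space division of a binary image can be sketched as below. The specification does not state how singular points are derived from the division, so this sketch only shows the recursive subdivision into uniform regions; treating the foreground leaves as candidate points is an assumption, and `quadtree_leaves` is a hypothetical name.

```python
def quadtree_leaves(img, x, y, w, h):
    # Return (x, y, w, h, value) for each region whose pixels all share one value.
    values = {img[r][c] for r in range(y, y + h) for c in range(x, x + w)}
    if len(values) == 1:
        return [(x, y, w, h, values.pop())]
    hw, hh = max(w // 2, 1), max(h // 2, 1)
    quadrants = [
        (x, y, hw, hh),                    # top-left
        (x + hw, y, w - hw, hh),           # top-right
        (x, y + hh, hw, h - hh),           # bottom-left
        (x + hw, y + hh, w - hw, h - hh),  # bottom-right
    ]
    leaves = []
    for nx, ny, nw, nh in quadrants:
        if nw > 0 and nh > 0:              # skip empty quadrants of thin regions
            leaves.extend(quadtree_leaves(img, nx, ny, nw, nh))
    return leaves

img = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
leaves = quadtree_leaves(img, 0, 0, 4, 4)
foreground = [leaf for leaf in leaves if leaf[4] == 1]
```

Uniform areas collapse into a few large leaves while detail near the subject produces many small ones, which is why the division concentrates work where strokes actually are.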

なお、二値画像フレーム（元の静止画フレーム）で文字が斜めに表示されていた場合でも、文字認識部１７は、基準線を導入すると共にベクトル空間を設定することによって、文字候補の特異点のベクトル方向を正確に修正することができる。例えば、文字認識部１７は、３Ｄ空間認識で利用される仕組みと同様にして、隣接する文字候補の配列方向のベクトルから基準線を取得し、この基準線が水平又は垂直となるように文字候補の特異点のベクトル方向を修正する。 Note that even when characters are displayed at an angle in the binary image frame (the original still image frame), the character recognition unit 17 can accurately correct the vector direction of the singular points of a character candidate by introducing a reference line and setting a vector space. For example, in the same manner as the mechanism used in 3D space recognition, the character recognition unit 17 obtains a reference line from the vector of the direction in which adjacent character candidates are arranged, and corrects the vector direction of the singular points of the character candidates so that this reference line becomes horizontal or vertical.
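The reference-line correction can be illustrated in two dimensions. The sketch below assumes that the reference line is the line through the centroids of two adjacent character candidates and that the correction is a plain rotation about the first centroid; the specification does not fix either choice, and the function names are hypothetical.

```python
import math

def baseline_angle(c1, c2):
    # Angle of the line through two adjacent candidate centroids, in radians.
    return math.atan2(c2[1] - c1[1], c2[0] - c1[0])

def rotate(points, angle, origin):
    # Rotate points by `angle` around `origin`.
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    ox, oy = origin
    return [
        (ox + (x - ox) * cos_a - (y - oy) * sin_a,
         oy + (x - ox) * sin_a + (y - oy) * cos_a)
        for x, y in points
    ]

def deskew(points, c1, c2):
    # Rotate so that the reference line through c1 and c2 becomes horizontal.
    return rotate(points, -baseline_angle(c1, c2), c1)

c1, c2 = (0.0, 0.0), (10.0, 10.0)        # text running at 45 degrees
corrected = deskew([c2], c1, c2)
```

After the rotation the second centroid lies on the horizontal axis through the first, so the candidate points can be compared with upright training data.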

また、文字認識部１７は、文字認識のためのトレーニングデータとして、既定の様々なフォントの様々な文字、およびそれらの劣化状態について特異点の分布と方向ベクトルを予め登録しておく。そして、文字認識部１７は、文字候補の特異点をトレーニングデータと比較することでトレーニングデータの何れかの文字に該当するか否かを判定して、二値画像フレーム上の各文字を認識する。 Also, the character recognition unit 17 registers in advance singularity distributions and direction vectors for various characters in various predetermined fonts and their deterioration states as training data for character recognition. Then, the character recognition unit 17 determines whether the character candidate corresponds to any character of the training data by comparing the singular point of the character candidate with the training data, and recognizes each character on the binary image frame. .

更に、文字認識部１７は、上記のようにして認識できた文字に対して辞書データ処理を行い、この辞書データ処理では、隣接する２つ以上の文字列を、予め登録してある単語辞書と比較、照合する。そして、文字認識部１７は、文字列が単語辞書の何れかの単語に該当するか否かを判定し、その判定結果に基づいて文字情報を取得する。なお、文字認識部１７は、文字列が単語辞書の何れかの単語にも該当しない場合でも、例えば誤読パターンに該当する場合には、その誤読パターンに対する正しい文字列に自動的に訂正して文字情報としてよい。上記のように、文字認識部１７で認識された文字情報は、二値画像フレームにおける特異点の集まりからなるデータと共に、追加トレーニングデータとして保存するとよい。 Furthermore, the character recognition unit 17 performs dictionary data processing on the characters recognized as described above. In this dictionary data processing, strings of two or more adjacent characters are compared and collated against a pre-registered word dictionary. The character recognition unit 17 then determines whether the character string corresponds to any word in the word dictionary and acquires character information based on the result of that determination. Even when the character string does not correspond to any word in the word dictionary, if it corresponds, for example, to a misread pattern, the character recognition unit 17 may automatically correct it to the correct character string for that misread pattern and use the result as the character information. The character information recognized by the character recognition unit 17 in this way is preferably stored as additional training data together with the data consisting of the cluster of singular points in the binary image frame.
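The dictionary check and the misread correction can be sketched as a two-stage lookup. The dictionary contents, the misread table, and the status labels below are invented for illustration and are not taken from the specification.

```python
def resolve_word(candidate, dictionary, misread_patterns):
    # Stage 1: exact match against the word dictionary.
    if candidate in dictionary:
        return candidate, "dictionary"
    # Stage 2: known misread pattern, replaced by its registered correction.
    if candidate in misread_patterns:
        return misread_patterns[candidate], "corrected"
    # Otherwise keep the raw recognition result.
    return candidate, "unverified"

dictionary = {"video", "metadata"}
misread_patterns = {"v1deo": "video", "rnetadata": "metadata"}

result = resolve_word("v1deo", dictionary, misread_patterns)
```

Returning a status next to the string makes it easy to count how many characters were confirmed by the dictionary, which a later comparison of recognition results could use.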

なお、文字認識部１７は、トレーニングデータや単語辞書の単語に優先度を付加しておき、上記の文字認識処理において、優先度の高いトレーニングデータや単語から順に文字候補や文字列との比較に用いるとよい。例えば、文字認識部１７は、認識される頻度の高いトレーニングデータや単語に対して優先度を高く設定する。 The character recognition unit 17 preferably assigns priorities to the training data and to the words in the word dictionary and, in the character recognition process described above, uses them for comparison with character candidates and character strings in descending order of priority. For example, the character recognition unit 17 sets a high priority for training data and words that are recognized frequently.

また、文字認識部１７は、上記のように文字認識処理を行う文字認識エンジンを動画のカテゴリー（業種）別に備え、更に、動画のカテゴリー別にトレーニングデータや単語辞書を予め登録しておくとよい。文字認識部１７は、動画データに記録された動画情報に含まれるカテゴリーを判別し、又は、操作者の入力したカテゴリーを判別する。そして、文字認識部１７は、判別されたカテゴリーに対応する文字認識エンジンを使用すると共に、このカテゴリーに対応するトレーニングデータや単語辞書を優先的に使用して文字認識処理を行うとよい。 In addition, the character recognition unit 17 may include a character recognition engine that performs character recognition processing as described above for each category (business type) of a moving image, and further, training data and a word dictionary may be registered in advance for each category of a moving image. The character recognition unit 17 determines a category included in the moving image information recorded in the moving image data, or determines a category input by the operator. The character recognition unit 17 may use a character recognition engine corresponding to the determined category and perform character recognition processing preferentially using training data or a word dictionary corresponding to the category.

本実施形態では特に、文字認識部１７は、二値化部１６で生成された複数の二値画像フレームのそれぞれに対して文字認識処理を行う。そして、文字認識部１７は、複数の二値画像フレーム毎に文字情報を含む文字認識結果を得ると共に、各文字認識結果を比較する。このとき、文字認識部１７は、文字認識結果として、例えば、認識できた文字数と、認識できた文字の中で意味を持つ文字として辞書から導き出された文字数とを判定し、これらの文字数が多いものを最適な文字認識結果として判定する。なお、単に認識できた文字よりも、意味を持つ文字の優先度を高く設定してよい。そして、文字認識部１７は、最適な文字認識結果が得られた二値画像フレームのみから文字情報を取得する。 In particular, in the present embodiment, the character recognition unit 17 performs the character recognition process on each of the plurality of binary image frames generated by the binarization unit 16. The character recognition unit 17 obtains a character recognition result including character information for each of the binary image frames and compares the results. At this time, the character recognition unit 17 determines, for each result, for example, the number of characters that could be recognized and, among those, the number of characters derived from the dictionary as meaningful characters, and judges the result with the largest counts to be the optimum character recognition result. Meaningful characters may be given a higher priority than characters that were merely recognized. The character recognition unit 17 then acquires character information only from the binary image frame that yielded the optimum character recognition result.
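The comparison of recognition results can be sketched as a scoring function. The specification only says that larger counts are better and that meaningful characters may be weighted more heavily; the particular weight `word_weight=2` and the representation of a result as a list of recognized strings are assumptions made for this example.

```python
def score(result, dictionary, word_weight=2):
    # Count all recognized characters, then add extra weight for characters
    # that belong to dictionary words ("meaningful" characters).
    recognized = sum(len(word) for word in result)
    meaningful = sum(len(word) for word in result if word in dictionary)
    return recognized + word_weight * meaningful

def best_result(results_by_threshold, dictionary):
    # Return the (threshold, result) pair with the highest score.
    return max(results_by_threshold.items(),
               key=lambda item: score(item[1], dictionary))

dictionary = {"video", "data"}
results_by_threshold = {
    70: ["vid", "eo"],          # strokes broken apart
    130: ["video", "data"],     # clean result
    190: ["v1deo"],             # partially washed out
}
best_threshold, best_words = best_result(results_by_threshold, dictionary)
```

Only the words from the winning binary frame would be passed on as character information; the other frames are discarded.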

メタデータ生成部１８は、文字認識部１７で得られた文字情報毎にメタデータを生成する。メタデータ生成部１８は、例えば、文字情報と共に、当該文字情報が取得された動画データに関する動画情報と、当該文字情報が取得された静止画フレームの静止画情報とを記録したメタデータを生成する。 The metadata generation unit 18 generates metadata for each piece of character information obtained by the character recognition unit 17. For example, the metadata generation unit 18 generates metadata that records, together with the character information, moving image information on the moving image data from which the character information was acquired and still image information on the still image frame from which the character information was acquired.

メタデータの動画情報としては、動画データの動画ＩＤ、フレーム数、フレームサイズ及びフォーマット形式等が記録されてよく、その他に、動画データのタイトル、作者情報、作成日時、動画のカテゴリー、サムネイル（ＵＲＬ）等が記録されてもよい。メタデータの静止画情報としては、例えば、文字情報が取得された静止画フレームの動画データにおけるリレーションＩＤや時間情報（タイムスタンプ）、及びこの静止画フレームのフレーム番号（ユニークＩＤ）等が記録されてよい。また、メタデータ生成部１８は、文字認識処理の処理日時や処理状況データをメタデータに記録するとよい。本実施形態では特に、メタデータ生成部１８は、二値化部１６で生成された複数の二値画像フレームの内、最適な文字認識結果が得られた二値画像フレームのみから取得された文字情報に基づいてメタデータを生成する。 As the moving image information of the metadata, the moving image ID, the number of frames, the frame size, the format, and the like of the moving image data may be recorded; in addition, the title, author information, creation date and time, moving image category, thumbnail (URL), and the like of the moving image data may also be recorded. As the still image information of the metadata, for example, the relation ID and time information (time stamp) within the moving image data of the still image frame from which the character information was acquired, and the frame number (unique ID) of this still image frame, may be recorded. The metadata generation unit 18 preferably also records the processing date and time and processing status data of the character recognition process in the metadata. In particular, in the present embodiment, the metadata generation unit 18 generates the metadata based on the character information acquired only from the binary image frame that yielded the optimum character recognition result among the plurality of binary image frames generated by the binarization unit 16.
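One possible shape for such a metadata record is sketched below. The specification lists the kinds of information that may be recorded but not a concrete schema, so every field name here (`video_id`, `frame_number`, `timestamp_sec`, and so on) and the JSON serialization are assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass

@dataclass
class TextMetadata:
    # Character information recognized in one still image frame.
    text: str
    # Moving image information (hypothetical field names).
    video_id: str
    category: str
    # Still image information (hypothetical field names).
    frame_number: int
    timestamp_sec: float
    # Date and time of the character recognition process, ISO 8601 string.
    processed_at: str = ""

def to_json(record):
    # Serialize one record; ensure_ascii=False keeps Japanese text readable.
    return json.dumps(asdict(record), ensure_ascii=False)

record = TextMetadata(
    text="動画処理",
    video_id="vid-001",
    category="seminar",
    frame_number=120,
    timestamp_sec=4.0,
)
payload = to_json(record)
```

Keeping the frame number and time stamp next to the text is what would let a keyword search jump to the scene in which the keyword appears.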

動画ＤＢ４は、動画データを格納すると共に、視聴者端末７からのアクセスに応じて動画をダウンロード方式やストリーミング方式で配信するデータベースである。また、動画ＤＢ４は、動画処理装置２からの取得動作に応じて、動画データそのものを動画処理装置２へと提供することができる。動画ＤＢ４に格納された動画データは、映像データや音声データに加えて、予め設定された動画タイトルや内容等の動画情報が記録されていてよく、動画情報を検索キーワードとすることで検索エンジン６によって検索可能となる。また、動画ＤＢ４は、所定の視聴者にアクセス権限を付与して、当該視聴者の視聴者端末７からの要求に応じて動画データを配信するように構成されてもよい。 The moving image DB 4 is a database that stores moving image data and distributes moving images by a download method or a streaming method in response to access from the viewer terminal 7. The moving image DB 4 can also provide the moving image data itself to the moving image processing apparatus 2 in response to an acquisition operation from the moving image processing apparatus 2. In addition to video data and audio data, the moving image data stored in the moving image DB 4 may have moving image information such as a preset title and description recorded in it, and can be searched by the search engine 6 using this moving image information as a search keyword. The moving image DB 4 may also be configured to grant access authority to predetermined viewers and distribute moving image data in response to requests from the viewer terminals 7 of those viewers.

本実施形態では、ネットワーク３上に１つの動画ＤＢ４が備えられる例を説明するが、複数の動画ＤＢ４がネットワーク３上に備えられてよい。また、動画ＤＢ４は、動画のカテゴリー（業種）別に備えられていてもよい。動画ＤＢ４は、１つの動画処理装置２で利用されるものに限定されず、複数の動画処理装置２で利用可能に設けられてよい。 In the present embodiment, an example in which one moving image DB 4 is provided on the network 3 will be described. However, a plurality of moving image DBs 4 may be provided on the network 3. The moving image DB 4 may be provided for each moving image category (business type). The moving image DB 4 is not limited to the one used by one moving image processing device 2, and may be provided so as to be usable by a plurality of moving image processing devices 2.

文字情報ＤＢ５は、動画処理装置２で生成された文字情報のメタデータを格納すると共に、視聴者端末７からのアクセスに応じて文字情報のメタデータを提供するデータベースである。文字情報ＤＢ５に格納されたメタデータは、その文字情報を検索キーワードとすることで検索エンジン６によって検索可能となる。また、文字情報ＤＢ５は、所定の視聴者にアクセス権限を付与して、当該視聴者の視聴者端末７からの要求に応じて文字情報のメタデータを提供するように構成されてもよい。 The character information DB 5 is a database that stores character information metadata generated by the moving image processing apparatus 2 and provides character information metadata in response to access from the viewer terminal 7. The metadata stored in the character information DB 5 can be searched by the search engine 6 by using the character information as a search keyword. Further, the character information DB 5 may be configured to give access authority to a predetermined viewer and to provide metadata of the character information in response to a request from the viewer terminal 7 of the viewer.

更に、文字情報ＤＢ５は、格納頻度や検索頻度が高い文字情報のメタデータが優先的に検索されるようにメタデータを格納するとよい。また、文字情報ＤＢ５は、視聴者端末７が検索エンジン６を介して所定の文字情報を検索するときに、当該文字情報について、メタデータを１つずつ検索エンジン６へと提供してもよいが、複数のメタデータからなるリストを検索エンジン６へと提供してもよい。 Further, the character information DB 5 may store metadata so that metadata of character information having a high storage frequency or high search frequency is preferentially searched. Further, when the viewer terminal 7 searches for predetermined character information via the search engine 6, the character information DB 5 may provide metadata for the character information one by one to the search engine 6. A list including a plurality of metadata may be provided to the search engine 6.

本実施形態では、ネットワーク３上に１つの文字情報ＤＢ５が備えられる例を説明するが、複数の文字情報ＤＢ５がネットワーク３上に備えられてよい。また、文字情報ＤＢ５は、動画のカテゴリー（業種）別に備えられていてもよい。文字情報ＤＢ５は、１つの動画処理装置２で利用されるものに限定されず、複数の動画処理装置２で利用可能に設けられてよい。 In the present embodiment, an example in which one character information DB 5 is provided on the network 3 will be described. However, a plurality of character information DBs 5 may be provided on the network 3. The character information DB 5 may be provided for each category (business type) of the moving image. The character information DB 5 is not limited to that used by one moving image processing apparatus 2, and may be provided so as to be usable by a plurality of moving image processing apparatuses 2.

視聴者端末７は、ネットワーク３に接続可能であって動画データを再生可能な端末であればよく、例えば、スマートフォン、携帯電話機及びタブレット等の携帯端末や、パーソナルコンピュータ及びテレビ等の据え置き型端末でよい。 The viewer terminal 7 may be any terminal that can connect to the network 3 and reproduce moving image data. For example, it may be a mobile terminal such as a smartphone, a mobile phone, or a tablet, or a stationary terminal such as a personal computer or a television.

次に、このような構成を備えた動画処理システム１において、動画処理装置２による動画データに基づく文字情報検出動作について、図２を参照して説明する。 Next, the character information detection operation based on the moving image data by the moving image processing device 2 in the moving image processing system 1 having such a configuration will be described with reference to FIG.

先ず、動画処理システム１では、動画データの動画提供者が動画データを動画ＤＢ４にアップロードしておく。 First, in the moving image processing system 1, a moving image provider of moving image data uploads moving image data to the moving image DB 4.

一方、動画処理装置２では、例えば、操作者によって文字情報検出の開始動作が実行されると共に、動画データ及び当該動画データの取得先の動画ＤＢ４が選択されると、通信部１２が動画ＤＢ４から動画データを取得する（ステップＳ１）。 Meanwhile, in the moving image processing apparatus 2, for example, when the operator performs the operation to start character information detection and selects the moving image data and the moving image DB 4 from which it is to be acquired, the communication unit 12 acquires the moving image data from the moving image DB 4 (step S1).

続いて、フレーム切り出し部１３が、通信部１２で取得された動画データから所定のフレーム間隔毎の複数の静止画フレームを取得する（ステップＳ２）。 Subsequently, the frame cutout unit 13 acquires a plurality of still image frames at predetermined frame intervals from the moving image data acquired by the communication unit 12 (step S2).

そして、動画処理装置２は、複数の静止画フレームに対して、順次、文字情報検出を実行する（ステップＳ３）。 Then, the moving image processing apparatus 2 sequentially performs character information detection for a plurality of still image frames (step S3).

各静止画フレームの文字情報検出では、先ず、前回の文字情報検出がされた処理対象フレームの有無を判定する（ステップＳ４）。ここで、前回の処理対象フレームがある場合には（ステップＳ４：Ｙｅｓ）、近似処理（ステップＳ５）に移行する。一方、前回の処理対象フレームがない場合には（ステップＳ４：Ｎｏ）、今回の静止画フレームを処理対象フレームとしてシャープ化処理（ステップＳ６）に移行する。この場合、今回の静止画フレームは、次の静止画フレームの文字情報検出（ステップＳ３）の際に前回の処理対象フレームとなる。 In the character information detection of each still image frame, first, it is determined whether or not there is a processing target frame for which the previous character information was detected (step S4). If there is a previous processing target frame (step S4: Yes), the process proceeds to an approximation process (step S5). On the other hand, when there is no previous processing target frame (step S4: No), the process proceeds to the sharpening process (step S6) with the current still image frame as the processing target frame. In this case, the current still image frame becomes the previous processing target frame when the character information of the next still image frame is detected (step S3).

近似処理（ステップＳ５）では、近似判定部１４が、今回の静止画フレームが前回の処理対象フレームに近似するか否かを判定する。ここで、今回の静止画フレームが前回の処理対象フレームに近似する場合には（ステップＳ５：Ｙｅｓ）、今回の静止画フレームを処理対象フレームから除外して、シャープ化処理（ステップＳ６）に移行することなく、次の静止画フレームの文字情報検出（ステップＳ３）に移行する。 In the approximation process (step S5), the approximation determination unit 14 determines whether or not the current still image frame approximates the previous process target frame. If the current still image frame approximates the previous processing target frame (step S5: Yes), the current still image frame is excluded from the processing target frame and the process proceeds to sharpening processing (step S6). Without proceeding, the process proceeds to the character information detection (step S3) of the next still image frame.

一方、今回の静止画フレームが前回の処理対象フレームに近似しない場合には（ステップＳ５：Ｎｏ）、今回の静止画フレームを処理対象フレームとしてシャープ化処理（ステップＳ６）に移行する。この場合、今回の静止画フレームは、次の静止画フレームの文字情報検出（ステップＳ３）の際に前回の処理対象フレームとなる。 On the other hand, if the current still image frame does not approximate the previous processing target frame (step S5: No), the process proceeds to the sharpening process (step S6) using the current still image frame as the processing target frame. In this case, the current still image frame becomes the previous processing target frame when the character information of the next still image frame is detected (step S3).
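The branching in steps S4 to S6 can be summarized in a short loop. In the sketch below, `is_similar` is a hypothetical stand-in for the approximation determination of step S5, and frames are represented by plain values for brevity. As in the flow described above, each frame is compared with the previous processing target frame, not simply with the immediately preceding frame.

```python
def select_target_frames(frames, is_similar):
    # Return only the frames that should proceed to sharpening (step S6).
    targets = []
    previous_target = None
    for frame in frames:
        # Step S4: is there a previous processing target frame?
        # Step S5: if so, skip the current frame when it approximates it.
        if previous_target is not None and is_similar(previous_target, frame):
            continue
        targets.append(frame)
        previous_target = frame      # becomes the reference for later frames
    return targets

frames = [1, 1, 2, 2, 2, 3, 1]       # toy "frames": equal values look alike
targets = select_target_frames(frames, lambda a, b: a == b)
```

Runs of near-identical frames, such as a slide held on screen for several seconds, collapse to a single processing target, which is where the reduction in processing load comes from.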

シャープ化処理（ステップＳ６）では、シャープ化部１５が、処理対象フレームにシャープ化処理を施すことによりエッジ強調フレームを生成する。 In the sharpening process (step S6), the sharpening unit 15 generates an edge-enhanced frame by performing the sharpening process on the processing target frame.

また、二値化処理（ステップＳ７）に移行し、二値化部１６が、エッジ強調フレームの色温度範囲に基づいて複数の閾値を取得すると共に、複数の閾値をそれぞれ用いてエッジ強調フレームを二値化処理して複数の二値画像フレームを生成する。 Further, the process proceeds to binarization processing (step S7), and the binarization unit 16 acquires a plurality of threshold values based on the color temperature range of the edge enhancement frame, and uses the plurality of threshold values to generate an edge enhancement frame. Binarization processing is performed to generate a plurality of binary image frames.

更に、文字認識処理（ステップＳ８）に移行し、文字認識部１７が、複数の二値画像フレームのそれぞれに文字認識処理を行う。そして、文字認識部１７は、複数の二値画像フレームの各文字認識結果を比較し、最適な文字認識結果が得られた二値画像フレームから文字情報を取得する（ステップＳ９）。 Further, the process proceeds to a character recognition process (step S8), and the character recognition unit 17 performs the character recognition process on each of the plurality of binary image frames. Then, the character recognition unit 17 compares the character recognition results of the plurality of binary image frames, and acquires character information from the binary image frame from which the optimum character recognition result is obtained (step S9).

続いて、メタデータ作成（ステップＳ１０）に移行し、メタデータ生成部１８が、文字情報のメタデータを作成する。 Subsequently, the process proceeds to metadata creation (step S10), and the metadata generation unit 18 creates text information metadata.

このようにして動画処理装置２で作成されたメタデータは、通信部１２によってネットワーク３を介して文字情報ＤＢ５にアップロードされる（ステップＳ１１）。文字情報ＤＢ５は、アップロードされたメタデータを、ユーザーが利用しやすいようにソートしておく。 The metadata created by the moving image processing apparatus 2 in this way is uploaded to the character information DB 5 by the communication unit 12 via the network 3 (step S11). The character information DB 5 sorts the uploaded metadata so that the user can easily use it.

本実施形態では、上述のように、動画処理装置２は、動画データから所定のフレーム間隔毎の複数の静止画フレームを切り出すフレーム切り出し部１３と、複数の静止画フレームに対して、前後に連続する静止画フレームの近似判定を順次行い、近似判定において近似と判定された場合には、先行の静止画フレームを処理対象フレームとすると共に、後続の静止画フレームを処理対象フレームから除外する近似判定部１４と、処理対象フレームにシャープ化処理を施してエッジを強調したエッジ強調フレームを生成するシャープ化部１５と、エッジ強調フレームに二値化処理を施して二値画像フレームを生成する二値化部１６と、二値画像フレームに対して文字認識処理を行って文字情報を取得する文字認識部１７と、文字情報と共に、少なくとも、当該文字情報が取得された動画データに関する動画情報と当該文字情報が取得された静止画フレームの静止画情報とを記録したメタデータを文字情報毎に生成するメタデータ生成部１８と、を備えて構成されている。 In the present embodiment, as described above, the moving image processing apparatus 2 includes: the frame cutout unit 13 that cuts out a plurality of still image frames at predetermined frame intervals from moving image data; the approximation determination unit 14 that sequentially performs approximation determination on consecutive still image frames among the plurality of still image frames and, when the frames are determined to be approximate, sets the preceding still image frame as a processing target frame and excludes the subsequent still image frame from the processing target frames; the sharpening unit 15 that performs a sharpening process on the processing target frame to generate an edge-enhanced frame in which edges are emphasized; the binarization unit 16 that performs a binarization process on the edge-enhanced frame to generate a binary image frame; the character recognition unit 17 that performs a character recognition process on the binary image frame to acquire character information; and the metadata generation unit 18 that generates, for each piece of character information, metadata recording, together with the character information, at least moving image information on the moving image data from which the character information was acquired and still image information on the still image frame from which the character information was acquired.

このような構成により、本実施形態によれば、動画データに付随して動画情報のメタデータが予め用意されていない場合でも、動画データの内容に関連した文字情報のメタデータを提供することができる。また、動画データに表示される様々な文字情報のメタデータが作成されるため、視聴者は、興味のあるキーワードが何れの動画データの何れのシーン（静止画データ）で表示されるかを迅速に検索することが可能となる。更に、静止画フレームが前回の静止画フレームと近似する場合には、文字認識処理の対象外とすることにより、処理負担を大幅に軽減することが可能である。このように、本発明によれば、動画データに表示される文字情報をより確実に検出すると共に、検出した文字情報の利便性を高めて、動画配信サービスの利用及び普及の向上を図ることが可能となる。 With such a configuration, according to the present embodiment, even when video information metadata is not prepared in advance accompanying the video data, it is possible to provide text information metadata related to the content of the video data. it can. In addition, since metadata of various character information displayed in the moving image data is created, the viewer can quickly determine which scene (still image data) in which moving image data the keyword of interest is displayed. It becomes possible to search. Furthermore, when the still image frame approximates the previous still image frame, the processing burden can be greatly reduced by excluding the character recognition processing target. As described above, according to the present invention, it is possible to more reliably detect the character information displayed in the moving image data, improve the convenience of the detected character information, and improve the use and spread of the moving image distribution service. It becomes possible.

また、本実施形態によれば、動画処理装置２において、二値化部１６は、エッジ強調フレームの色温度範囲を算出すると共に、色温度範囲における所定の閾値間隔毎の複数の色温度のそれぞれに基づいて複数の閾値を取得して、エッジ強調フレームに対して複数の閾値をそれぞれ用いた二値化処理を施して複数の二値画像フレームを生成し、文字認識部１７は、複数の二値画像フレームのそれぞれに対して文字認識処理を行って複数の二値画像フレーム毎に文字情報を含む文字認識結果を得ると共に、各文字認識結果を比較して、最適な文字認識結果が得られた二値画像フレームのみから文字情報を取得し、メタデータ生成部１８は、複数の二値画像フレームの内、最適な文字認識結果が得られた二値画像フレームのみから取得された文字情報に基づいて前記メタデータを生成するように構成される。 Further, according to the present embodiment, in the moving image processing apparatus 2, the binarization unit 16 calculates the color temperature range of the edge-enhanced frame, acquires a plurality of thresholds based on each of a plurality of color temperatures taken at a predetermined threshold interval within the color temperature range, and applies the binarization process to the edge-enhanced frame using each of the thresholds to generate a plurality of binary image frames. The character recognition unit 17 performs the character recognition process on each of the binary image frames to obtain a character recognition result including character information for each of them, compares the results, and acquires character information only from the binary image frame that yielded the optimum character recognition result. The metadata generation unit 18 is configured to generate the metadata based on the character information acquired only from the binary image frame that yielded the optimum character recognition result among the plurality of binary image frames.

このような構成により、動画処理装置２は、最適な閾値で二値化処理した結果から文字情報を抽出することができる。例えば、色温度範囲が同じ静止画フレームであっても、撮影時の照明等の状況により、二値化処理のために設定すべき閾値がそれぞれ異なる場合があるが、このような場合であっても、最適な文字情報を抽出することが可能である。 With such a configuration, the moving image processing apparatus 2 can extract character information from the result of binarization processing with an optimum threshold. For example, even in the case of still image frames with the same color temperature range, the thresholds to be set for the binarization process may differ depending on the lighting conditions at the time of shooting. Also, it is possible to extract optimum character information.

更に、本実施形態によれば、動画処理装置２において、近似判定部１４は、前後に連続する静止画フレームについてＲＧＢ値及び輝度のヒストグラムの変化値を比較し、比較値が所定の近似閾値以上であれば、当該前後に連続する静止画フレームを近似と判定するように構成される。 Further, according to the present embodiment, in the moving image processing apparatus 2, the approximation determination unit 14 compares the change values of the RGB value and the luminance histogram for the consecutive still image frames, and the comparison value is equal to or greater than a predetermined approximation threshold. If so, it is configured to determine that still image frames consecutive in the front and rear are approximate.
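A histogram-based approximation determination of this kind can be sketched as follows. The specification compares RGB values and luminance histograms; for brevity this sketch uses a single 8-bit channel and histogram intersection as the comparison value, both of which are assumptions, as are the bin count and the default threshold of 0.9.

```python
def histogram(values, bins=8, max_value=256):
    # Normalized histogram of 8-bit values using equal-width bins.
    counts = [0] * bins
    for v in values:
        counts[v * bins // max_value] += 1
    return [c / len(values) for c in counts]

def similarity(frame_a, frame_b, bins=8):
    # Histogram intersection: 1.0 for identical distributions, 0.0 for disjoint ones.
    ha, hb = histogram(frame_a, bins), histogram(frame_b, bins)
    return sum(min(x, y) for x, y in zip(ha, hb))

def is_approximate(frame_a, frame_b, threshold=0.9):
    # Frames are judged approximate when the comparison value is at or above the threshold.
    return similarity(frame_a, frame_b) >= threshold

frame_a = [10, 20, 200, 210]
frame_b = [12, 22, 205, 215]     # slightly different pixels, same distribution
frame_c = [100, 110, 120, 130]   # clearly different content
```

Because only the binned distributions are compared, small pixel-level noise between consecutive frames does not prevent them from being judged approximate, and the comparison stays cheap. A full version would repeat the same comparison for each of the R, G, and B channels and for luminance.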

このような構成により、近似判定部１４は、近似処理の正確さを維持すると共に、近似処理に係る負担を大幅に軽減することが可能である。 With such a configuration, the approximation determination unit 14 can maintain the accuracy of the approximation process and can significantly reduce the burden on the approximation process.

また、本実施形態によれば、動画処理装置２において、近似判定部１４は、前後に連続する静止画フレームにおいて、四分木空間分割を行い各領域における特異点の変位量の二階微分値にて加速度に換算し、比較することで前後に連続する静止画フレームが近似するか否かを判定するように構成される。 Further, according to the present embodiment, in the moving image processing apparatus 2, the approximation determination unit 14 is configured to perform quadtree space division on consecutive still image frames, convert the displacement of the singular points in each region into an acceleration using its second-order differential value, and compare the results to determine whether the consecutive still image frames approximate each other.

このような構成により、近似判定部１４は、前後に連続する静止画フレームについてより正確に近似を判定することができる。 With such a configuration, the approximation determination unit 14 can determine the approximation more accurately for the still image frames consecutive in the front and rear.

更に、本実施形態によれば、動画処理装置２において、近似判定部１４は、静止画フレームの四分木空間分割を行う際に各特異点の分布及び方向ベクトルを算出し、文字認識部１７は、近似判定部１４で算出された各特異点の分布及び方向ベクトルを、各特異点の分布及び方向ベクトルからなる所定のトレーニングデータと比較することで文字認識処理を行うように構成される。 Further, according to the present embodiment, in the moving image processing apparatus 2, the approximate determination unit 14 calculates the distribution and direction vector of each singular point when performing the quadtree space division of the still image frame, and the character recognition unit 17. Is configured to perform character recognition processing by comparing the distribution and direction vector of each singular point calculated by the approximation determination unit 14 with predetermined training data composed of the distribution and direction vector of each singular point.

このような構成により、近似判定部１４における近似判定の正確性を維持すると共に、文字認識部１７における処理負担を軽減することができる。 With such a configuration, it is possible to maintain the accuracy of the approximation determination in the approximation determination unit 14 and reduce the processing burden on the character recognition unit 17.

また、本実施形態によれば、動画処理装置２において、文字認識部１７は、二値画像フレームの被写体像の特異点の分布及び方向ベクトルを算出すると共に、特異点の分布及び方向ベクトルからなる所定のトレーニングデータと比較することで文字認識処理を行うように構成されている。 Further, according to the present embodiment, in the moving image processing apparatus 2, the character recognition unit 17 calculates the distribution and direction vector of singular points of the subject image of the binary image frame, and includes the distribution and direction vectors of singular points. Character recognition processing is performed by comparing with predetermined training data.

このような構成により、文字認識部１７は、分布及び方向ベクトルからなる特異点を算出するため、文字認識処理で使用するデータ量を削減して処理負担を軽減することができ、また、多数用意する必要があるトレーニングデータについてもデータ量を軽減して、トレーニングデータを記憶するサーバコストを低減することが可能である。 With such a configuration, because the character recognition unit 17 calculates singular points consisting of a distribution and direction vectors, the amount of data used in the character recognition process can be reduced and the processing load lightened. The amount of training data, which must be prepared in large quantities, can also be reduced, lowering the server cost of storing the training data.

また、本実施形態によれば、動画処理装置２において、文字認識部１７は、二値画像フレームに対して四分木空間分割を行うことで被写体像の各特異点を算出すると共に、当該特異点の分布及び方向ベクトルを算出するように構成されている。 Further, according to the present embodiment, in the moving image processing apparatus 2, the character recognition unit 17 is configured to calculate each singular point of the subject image by performing quadtree space division on the binary image frame, and to calculate the distribution and direction vectors of those singular points.

このような構成により、文字認識部１７は、分布及び方向ベクトルからなる特異点を四分木空間分割によって算出するため、文字認識処理での処理負担をより軽減することができる。 With such a configuration, the character recognizing unit 17 calculates a singular point composed of a distribution and a direction vector by quadtree space division, so that the processing burden in the character recognition processing can be further reduced.

また、本実施形態によれば、動画処理システム１は、上記したような動画処理装置２と、動画処理装置２のメタデータ生成部１８によって生成された文字情報毎のメタデータを格納する文字情報ＤＢ（データベース）５と、を備えて構成されている。 Further, according to the present embodiment, the moving image processing system 1 includes character information that stores metadata for each character information generated by the moving image processing device 2 and the metadata generation unit 18 of the moving image processing device 2 as described above. DB (database) 5.

このような構成により、動画処理システム１において、視聴者は、興味のあるキーワードを用いて検索することにより、文字情報ＤＢ５からそのキーワードに対応する文字情報のメタデータを取得すると共に、このメタデータに記録された動画情報を読み出して動画データを再生することが可能となる。 With such a configuration, in the moving image processing system 1, the viewer searches using the keyword of interest to acquire the text information metadata corresponding to the keyword from the text information DB 5, and this metadata. It is possible to read the moving image information recorded in the video and reproduce the moving image data.

本実施形態では、文字認識部１７は、文字認識のためのトレーニングデータとして、各フォントの各文字について特異点の分布と方向ベクトルを予め登録しておく構成を説明したが、この構成に限定されない。例えば、他の実施形態では、トレーニングデータを登録するトレーニングデータ用データベースを別途設けて、動画処理装置２が文字認識処理時にこのトレーニングデータ用データベースにアクセスしてトレーニングデータを取得するように構成されてもよい。 In the present embodiment, a configuration has been described in which the character recognition unit 17 registers in advance, as training data for character recognition, the distribution of singular points and the direction vectors for each character of each font, but the invention is not limited to this configuration. For example, in another embodiment, a training data database in which training data is registered may be provided separately, and the moving image processing apparatus 2 may be configured to access this training data database and acquire the training data during the character recognition process.

The character recognition unit 17 or the training data database may also be configured to register, in addition to the training data for each character of each font, training data for decorated characters. Decorated characters are, for example, characters used in presentations and the like, and include characters whose outline is displayed in a color different from that of the default font, characters rendered in italic or bold, hollow (outlined) characters, and characters with a shadow.

As another embodiment, the moving image processing apparatus 2 may be configured to recognize the face image of a specific person (in particular, a celebrity) in a still image frame and to generate information on that person as metadata. In this case, the moving image processing apparatus 2 registers in advance, as training data, the distribution and direction vectors (here, 3D direction vectors) of the feature points of the face image of the specific person, and registers the person information of that person in association with the training data. The moving image processing apparatus 2 then performs face recognition processing based on the distribution and vector directions of the feature points in the still image frame (binary image frame) and determines whether the extracted face image matches the training data. When the extracted face image matches the training data, the person information associated with that training data is generated as metadata linked to the moving image data and the still image frame, and is registered in a person information database (not shown).

Similarly, the moving image processing apparatus 2 may be configured to recognize the image of a landmark (in particular, a famous building) in a still image frame and to generate the landmark information as metadata.

As yet another embodiment, the moving image processing apparatus 2 may be configured to recognize the image of a specific landscape (in particular, a landscape such as the sea that can be inferred from its colors) in a still image frame and to generate the landscape information as metadata. In this case, the moving image processing apparatus 2 registers in advance, as training data, a histogram representing the color distribution of the specific landscape, and registers the landscape information of that landscape in association with the training data. The moving image processing apparatus 2 then calculates a histogram of the color distribution in the still image frame (processing target frame) and determines whether that histogram matches the training data. When the calculated histogram matches the training data, the landscape information associated with that training data is generated as metadata linked to the moving image data and the still image frame, and is registered in a landscape information database (not shown).
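The histogram comparison described above can be sketched as follows. The description does not state how the histograms are binned or compared, so everything specific here is an assumption made for illustration: the coarse RGB binning (`bins=4`), the use of histogram intersection as the similarity measure, the threshold of 0.8, and the function names `color_histogram`, `intersection` and `match_landscape`.

```python
def color_histogram(pixels, bins=4):
    """Return a normalized coarse RGB histogram for a list of (r, g, b) tuples (0-255)."""
    step = 256 // bins
    hist = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        hist[(r // step) * bins * bins + (g // step) * bins + (b // step)] += 1
    total = len(pixels)
    return [h / total for h in hist] if total else hist


def intersection(h1, h2):
    """Histogram intersection: 1.0 for identical normalized histograms, 0.0 for disjoint ones."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def match_landscape(frame_pixels, training, threshold=0.8):
    """Return the label of the best-matching training histogram, or None if below threshold."""
    hist = color_histogram(frame_pixels)
    best_label, best_score = None, 0.0
    for label, reference in training.items():
        score = intersection(hist, reference)
        if score > best_score:
            best_label, best_score = label, score
    return best_label if best_score >= threshold else None


# Hypothetical training data: a "sea" landscape dominated by blue pixels.
training_data = {"sea": color_histogram([(0, 0, 255)] * 9 + [(255, 255, 255)])}

sea_frame = [(10, 20, 250)] * 10   # mostly blue frame
red_frame = [(250, 10, 10)] * 10   # unrelated frame

sea_result = match_landscape(sea_frame, training_data)
red_result = match_landscape(red_frame, training_data)
```

When a label is returned, the landscape information registered with that training entry would be written into the metadata for the frame; a result of `None` means no metadata is generated for that frame in this sketch.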

In the present embodiment, a configuration has been described in which the moving image processing apparatus 2 has the character recognition processing function; however, the invention is not limited to this configuration. For example, in another embodiment, the moving image processing system 1 may be configured such that the moving image processing apparatus 2 or another server provides a program or application having the same character recognition processing function as the moving image processing apparatus 2.

1 Moving image processing system
2 Moving image processing apparatus
3 Network
4 Moving image database (DB)
5 Character information database (DB)
6 Search engine
7 Viewer terminal
10 Control unit
11 Storage unit
12 Communication unit
13 Frame cutout unit
14 Approximate determination unit
15 Sharpening unit
16 Binarization unit
17 Character recognition unit
18 Metadata generation unit

Claims

A moving image processing apparatus comprising:
a frame cutout unit that cuts out a plurality of still image frames from moving image data at predetermined frame intervals;
an approximate determination unit that sequentially performs, for the plurality of still image frames, an approximation determination on each pair of consecutive still image frames and, when the frames are determined to be approximate, sets the preceding still image frame as a processing target frame and excludes the succeeding still image frame from the processing target frames;
a sharpening unit that applies a sharpening process to the processing target frame to generate an edge-enhanced frame in which edges are emphasized;
a binarization unit that binarizes the edge-enhanced frame to generate a binary image frame;
a character recognition unit that performs character recognition processing on the binary image frame to acquire character information; and
a metadata generation unit that generates, for each piece of character information, metadata recording, together with the character information, at least moving image information on the moving image data from which the character information was acquired and still image information on the still image frame from which the character information was acquired,
wherein the binarization unit calculates a color temperature range of the edge-enhanced frame, acquires a plurality of thresholds, each based on one of a plurality of color temperatures taken at predetermined threshold intervals within the color temperature range, and applies a binarization process using each of the plurality of thresholds to the edge-enhanced frame to generate a plurality of the binary image frames,
the character recognition unit performs the character recognition processing on each of the plurality of binary image frames, compares the character recognition results, each including the character information, obtained for the respective binary image frames, and acquires the character information only from the binary image frame for which the optimum character recognition result was obtained, and
the metadata generation unit generates the metadata based on the character information acquired only from the binary image frame, among the plurality of binary image frames, for which the optimum character recognition result was obtained.

The moving image processing apparatus according to claim 1, wherein the approximate determination unit compares change values of the histograms of RGB values and luminance between consecutive still image frames and determines that the frames are approximate when the comparison value is equal to or greater than a predetermined approximation threshold.

The moving image processing apparatus according to claim 1, wherein the approximate determination unit performs quadtree space division on consecutive still image frames, converts the displacement of the singular points in each region into an acceleration, namely the second-order differential value of that displacement, and compares the accelerations to determine whether the consecutive still image frames are approximate.

The moving image processing apparatus according to claim 3, wherein the approximate determination unit calculates the distribution and direction vector of each singular point when performing the quadtree space division of the still image frames, and
the character recognition unit performs the character recognition processing by comparing the distribution and direction vector of each singular point calculated by the approximate determination unit with predetermined training data comprising distributions and direction vectors of singular points.

The moving image processing apparatus according to claim 1 or claim 2, wherein the character recognition unit calculates the distribution and direction vector of each singular point of a subject image in the binary image frame and performs the character recognition processing by comparing them with predetermined training data comprising distributions and direction vectors of singular points.

The moving image processing apparatus according to claim 5, wherein the character recognition unit calculates each singular point of the subject image by performing quadtree space division on the binary image frame, and calculates the distribution and direction vectors of the singular points.

A moving image processing system comprising:
the moving image processing apparatus according to any one of claims 1 to 6; and
a character information database that stores the metadata generated for each piece of character information by the metadata generation unit of the moving image processing apparatus.