JP3023461B2

JP3023461B2 - Database device for non-coded information

Info

Publication number: JP3023461B2
Application number: JP5147401A
Authority: JP
Inventors: 好秀中尾
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 1993-06-18
Filing date: 1993-06-18
Publication date: 2000-03-21
Anticipated expiration: 2015-03-21
Also published as: JPH0721202A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to a technique for building databases of non-coded information, such as drawings, pictures, and moving images, that is not represented as character codes, and for enabling highly accurate retrieval of that information. Fields of application include: database systems for drawings, pictures, moving images, and the like that can be searched with arbitrary words; multimedia systems; electronic filing systems that can be searched by words; systems for automatically selecting, recording, and retrieving television images; systems for classifying and retrieving recorded video; and interactive movies.

[0002]

[Prior Art] Conventionally, when non-coded information such as drawings and images was put into a database, each item was simply assigned a distinct number and a separate cross-reference table was prepared, and the database was stored and searched on that basis. Alternatively, a person looked at the non-coded information, classified it by attaching the keywords and search information best suited to the meaning (content) it expressed, and the database was stored and searched on that basis. In an image filing system, for example, a person entered a title and search information after the image had been input. In either case, human judgment was indispensable.

[0003]

[Problems to Be Solved by the Invention] It is extremely difficult to extract the features of non-coded information such as drawings, pictures, and moving images in any simple way. Consequently, such information has conventionally been classified and organized either by numbering it mechanically and preparing a separate cross-reference table, or by having a person look at it and attach a suitable title and search information. Most of this work was manual. With mechanical numbering, searching required consulting the cross-reference table, which made retrieval very laborious. Attaching titles and search information was itself laborious. Building the database therefore took a great deal of effort, and because the search information was assigned by people, subjectivity crept in, making it difficult to build an objective and accurate database.

[0004] The present invention was devised in view of these circumstances. Because non-coded information such as drawings, pictures, and moving images is not character-coded, extracting features from the information itself is inherently very difficult. The invention therefore focuses on the linguistic information of the characters that are added within or around such non-coded information in order to explain it. The object of the invention is to use that linguistic information to build databases of non-coded information automatically and to enable highly accurate retrieval. In other words, one object is to automate, and thereby save labor in, the building of databases of non-coded information, which conventionally required an enormous amount of manual work. A further object is to make it easy to search a database of non-coded information that has not been classified or given keywords, which was conventionally difficult.

[0005]

[Means for Solving the Problems] The principal feature of the database apparatus for non-coded information according to the present invention is that keywords extracted from the character information contained in the non-coded information are weighted, and keywords whose weight is at or above a certain level are attached to the non-coded information part to form the database. Specifically, the apparatus is characterized by comprising: reading means for reading and digitizing text that contains non-coded information; image processing means for separating the character data part from the non-coded information part; character recognition processing means for encoding the characters within and around the non-coded information part; language processing means for extracting a plurality of keyword candidates from the encoded character information; scoring processing means for giving each keyword candidate a higher score the closer its corresponding character information is positioned to the original non-coded information part and the more frequently it appears; score determination processing means for determining whether the score has reached a reference score; and addition processing means for attaching to the non-coded information part, as keywords, the keyword candidates whose scores the score determination processing means has determined to have reached the reference score.

[0006]

[Operation] According to the present invention, scores are given to the keywords extracted from the encoded character information, and keywords at or above the reference score are attached to the separated non-coded information part to form the database. The non-coded information part is therefore closely associated with the keywords used to classify and retrieve it.

[0007]

[Embodiments] Embodiments of the present invention are described in detail below with reference to the drawings.

[0008] First Embodiment
The first embodiment concerns the creation of a database of figures contained in a document. FIG. 1 is a block diagram showing the configuration of the database-building scheme for non-coded information according to the first embodiment.

[0009] A document 1 containing a figure, which is non-coded information, is read as image data by a scanner 2 serving as the reading means, so that the text information and the figure information are digitized. Image processing means then applies the required image processing 3 to this data, separating it into a text data part 4 and a figure data part 5. The figure data part 5 corresponds to the non-coded information. Character recognition processing means then performs character recognition processing 6 on the text data part 4 and the figure data part 5, encoding the characters in each. The result of this encoding is character information 7. Language processing means applies language processing 8 to the character information 7 to extract a plurality of candidates for keywords 9. These candidates for keywords 9 correspond to the original document 1.

[0010] Next, scoring processing means performs scoring processing 10 on the extracted candidates for keywords 9, as follows. The highest score is given to keyword candidates extracted from text that lies near an edge of the figure, is set apart from other text, and runs parallel to that edge. The next highest score is given to keyword candidates extracted from characters inside the figure. The next highest score after that is given to keyword candidates extracted from text surrounding the figure, with higher scores for text positioned closer to the figure. In addition, keyword candidates that appear more frequently are given higher scores. In this way, each of the various candidates for keywords 9 is given its own score.
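The scoring rules in paragraph [0010] can be sketched as follows. This is only an illustration under stated assumptions, not the patent's implementation: the region labels, the base weights (3, 2, 1), the distance falloff, and the choice to add up weights per occurrence are all hypothetical and were picked only so that the ordering described in the text holds. The label `edge_text` stands for text near an edge of the figure, set apart from other text and parallel to that edge.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical base weights; the description fixes only their ordering.
BASE = {"edge_text": 3.0, "in_figure": 2.0, "surrounding": 1.0}


@dataclass
class Occurrence:
    word: str
    region: str            # "edge_text", "in_figure", or "surrounding"
    distance: float = 0.0  # distance from the figure (used for "surrounding")


def score_candidates(occurrences):
    """Return {word: score}.

    Each occurrence adds its positional weight, so a word that appears
    more often accumulates a higher score.
    """
    scores = defaultdict(float)
    for occ in occurrences:
        weight = BASE[occ.region]
        if occ.region == "surrounding":
            weight /= 1.0 + occ.distance  # closer text scores higher
        scores[occ.word] += weight
    return dict(scores)
```

Summing per occurrence is one simple way to combine the positional and frequency criteria; the description does not say how the two are combined.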

[0011] Next, score determination processing means performs score determination processing 11 to judge whether the score given to each candidate for keywords 9 has reached the reference score. Addition processing means then performs addition processing 12, attaching each candidate whose score is at or above the reference score, as a keyword 9, to the corresponding figure data part 5. This creates the figure database 13.
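The determination and addition steps of paragraph [0011] amount to a threshold filter followed by a store. The sketch below assumes a plain dictionary as the database and a hypothetical function name; neither comes from the patent.

```python
def attach_keywords(figure_id, scores, reference_score, database):
    """Attach to figure_id every candidate whose score reaches reference_score.

    `database` is a plain dict standing in for the figure database:
    {figure_id: {keyword: score}}.
    """
    kept = {word: s for word, s in scores.items() if s >= reference_score}
    database[figure_id] = kept
    return kept
```

Keeping the score alongside each attached keyword is a design choice made here so that a later search can rank by score.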

[0012] The figure database 13 created in this way is classified according to the keywords 9. To search the database 13, the searcher enters words related to the desired figure, and the figure data parts 5 to which a keyword 9 matching or close to the entered words has been attached are read out. In this search, keywords 9 are searched in descending order of their assigned scores.
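A search of the kind described in paragraph [0012] might look like the sketch below. It assumes the database is a dictionary of the form `{figure_id: {keyword: score}}` and uses string similarity from Python's standard `difflib` module as a stand-in for "a keyword close to the entered words"; the patent does not specify how closeness is judged, and the `cutoff` value is arbitrary.

```python
import difflib


def search_figures(query_words, database, cutoff=0.8):
    """Return figure ids ranked by the best score of any matching keyword."""
    hits = {}
    for figure_id, keywords in database.items():
        for word in query_words:
            matches = difflib.get_close_matches(
                word, list(keywords), n=3, cutoff=cutoff
            )
            for match in matches:
                hits[figure_id] = max(hits.get(figure_id, 0.0), keywords[match])
    # Highest-scored keyword matches come first.
    return sorted(hits, key=hits.get, reverse=True)
```

For example, with `{"fig1": {"pump": 3.0}, "fig2": {"pumps": 1.5}}`, the query `["pump"]` matches both figures and ranks `fig1` ahead of `fig2`.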

[0013] Second Embodiment
The second embodiment concerns the creation of a database of moving images. FIG. 2 is a block diagram showing the configuration of the retrieval scheme for non-coded information according to the second embodiment.

[0014] Division processing 22 is applied to a moving image 21 with audio, a prominent example of non-coded information, dividing the moving image 21 at arbitrary fixed time intervals or scene by scene to obtain divided moving images 23. Speech recognition processing 24 is then applied to the audio accompanying each divided moving image 23 to recognize the conversation and other spoken content in it. The speech recognition processing 24 yields audio information 25, and language processing 26 is applied to the audio information 25 to extract a plurality of keywords 27. These keywords 27 correspond to the divided moving image 23.
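The fixed-interval variant of the division step can be sketched as below. The function name and the use of seconds are assumptions made for illustration; scene-based division would need a scene-change detector and is not shown.

```python
def split_fixed(duration, interval):
    """Return (start, end) pairs covering [0, duration) in steps of `interval`.

    Both arguments are in seconds. The last segment is shortened so that
    it ends exactly at `duration`.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    segments = []
    start = 0.0
    while start < duration:
        segments.append((start, min(start + interval, duration)))
        start += interval
    return segments
```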

[0015] Next, scoring processing 28 is applied to the extracted keywords 27. The scoring processing 28 gives higher scores to keywords that appear more frequently. Each keyword 27 thus receives its own score.

[0016] Next, score determination processing 29 judges whether the score given to each keyword 27 has reached the reference score. Addition processing 30 then attaches each keyword 27 whose score is at or above the reference score to the corresponding divided moving image 23, or to the portion where the audio accompanying that divided moving image 23 is recorded. This creates the moving image database 31.
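For the second embodiment, scoring by frequency and keeping only keywords that reach the reference score can be sketched as follows. It assumes that speech recognition and language processing have already produced a keyword list for each segment; the function name and the use of a raw occurrence count as the score are illustrative choices, not details from the patent.

```python
from collections import Counter


def index_segments(segment_keywords, reference_score):
    """Build {segment_id: {keyword: count}} for a set of video segments.

    `segment_keywords` maps a segment id to the list of keywords extracted
    from that segment's recognized speech. Only keywords whose count
    reaches `reference_score` are kept.
    """
    database = {}
    for segment_id, words in segment_keywords.items():
        counts = Counter(words)
        database[segment_id] = {
            word: count for word, count in counts.items() if count >= reference_score
        }
    return database
```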

[0017] The moving image database 31 created in this way is classified according to the keywords 27. To search the database 31, the searcher enters words related to the desired moving image, and the divided moving images 23 to which a keyword 27 matching or close to the entered words has been attached are read out. In this search, keywords 27 are searched in descending order of their assigned scores.

[0018] The first and second embodiments described above offer the following advantages.

[0019] Fully automating database creation saves a great deal of labor.

[0020] Because keywords are attached automatically, there is no room for human subjectivity, and a uniform, highly accurate database with no variation in keywords can be built.

[0021] Because keywords are attached automatically, the process is far faster than manual work.

[0022] Objective keywords can be attached even to information that a person could not understand or would find hard to judge.

[0023] Using the large amount of linguistic information that precedes and follows the item allows a wide range of keywords to be attached, enabling classification and retrieval without omissions.

[0024] Third Embodiment
The third embodiment concerns searching a multimedia database. Specifically, it is a retrieval scheme that is effective for databases that, like a CD-ROM, have already been produced and can be read but not written. It is a system for retrieving specific drawings, images, or portions of moving images from a multimedia database recorded on a CD-ROM or similar medium. FIG. 3 is a block diagram showing the configuration of the retrieval scheme for non-coded information according to the third embodiment.

[0025] A searcher 41 enters words or sentences related to the desired information into the system as natural language 42. The system applies language processing 43 to the entered natural language 42 and automatically extracts a plurality of suitable keywords 44 from it. Using the keywords 44, the system first searches the encoded character information 45a in a multimedia database 45, such as a CD-ROM, and extracts character data whose meaning is the same as, similar to, or close to that of the keywords 44.

[0026] Next, the system searches for image information 45b located near the extracted character data, nearest first. Alternatively, it cuts out and extracts a fixed-length scene from the moving image located close in time to the extracted character data. The length of the scene to be cut out can be set freely at the direction of the searcher 41 and can also be changed freely after extraction. In extracting drawings, images, and moving images, matching processing 46 is performed between each extracted item and the keywords 44. If the match is at or above a certain degree, image search processing 47 narrows down the items to be extracted, and image extraction processing 48 finally provides the extracted items to the searcher 41.
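The first stage of the third embodiment, finding matching text and then the images nearest to it, can be sketched as below. The data layout (lists of `(position, value)` pairs sharing one position axis) and the function name are assumptions, and a plain substring test stands in for the "same, similar, or close meaning" comparison, which the description leaves unspecified.

```python
def nearest_images(text_items, image_items, keywords, limit=3):
    """Return image ids ordered by distance to the nearest matching text.

    text_items:  list of (position, text)
    image_items: list of (position, image_id)
    A text item matches if it contains any keyword as a substring.
    """
    anchors = [pos for pos, text in text_items if any(k in text for k in keywords)]
    if not anchors:
        return []
    ranked = sorted(
        image_items,
        key=lambda item: min(abs(item[0] - anchor) for anchor in anchors),
    )
    return [image_id for _, image_id in ranked[:limit]]
```

For moving images the same idea applies with timestamps in place of positions.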

[0027] If the information obtained by the above processing is unsatisfactory, or if more detailed information is wanted, the searcher 41 instructs the system to proceed to the next step. Through image recognition processing 49, the system analyzes the image information 45b in the database 45 and uses character recognition to convert characters in image form into character codes. Through speech recognition processing 50, it analyzes the audio information 45c in the database 45 and uses speech recognition to convert speech into character codes. The database 45 is then searched in the same way as before on the basis of at least one of these newly generated sets of character codes. Image information 45b or scenes of moving images located close in position or time to the relevant character data are extracted and provided to the searcher 41 through the same procedure as described above.

[0028] With the retrieval scheme of the third embodiment, a searcher can quickly and accurately retrieve any required non-coded information not only from a CD-ROM but from any database that has not been classified or given keywords.

[0029] Fourth Embodiment
The fourth embodiment concerns the creation and retrieval of interactive movies. FIG. 4 is a conceptual diagram of an interactive movie database, and FIG. 5 shows a keyword-image cross-reference table. A database 51 has an image data recording unit 52 and a keyword-image cross-reference table 53. The image data recording unit 52 holds a plurality of image data items 52a, 52b, ... 52i, ..., each assigned an image data number. The keyword-image cross-reference table 53 holds not only the correspondence between keywords and image data numbers but also keyword links, which associate each keyword with the keyword to be extracted next in sequence.
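One possible in-memory shape for the keyword-image cross-reference table, including keyword links, is sketched below. The actual layout of FIG. 5 is not reproduced in this text, so the field names, the `TableEntry` type, and the rule of following the first link at each step are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TableEntry:
    image_numbers: list                                 # image data numbers for this keyword
    next_keywords: list = field(default_factory=list)   # keyword links


def follow_links(table, start, max_steps=10):
    """Walk keyword links from `start` and collect image data numbers in order.

    At each step the first keyword link is followed. The walk stops at a
    keyword with no link, at an unknown keyword, when a keyword repeats,
    or after `max_steps` keywords.
    """
    sequence, keyword, seen = [], start, set()
    while keyword in table and keyword not in seen and len(seen) < max_steps:
        seen.add(keyword)
        entry = table[keyword]
        sequence.extend(entry.image_numbers)
        keyword = entry.next_keywords[0] if entry.next_keywords else None
    return sequence
```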

[0030] This scheme closely resembles the second embodiment. First, the audio information recorded together with the moving image information is analyzed to extract spoken portions such as narration and conversation, which are converted into character codes by speech recognition processing. Language processing is then applied to the resulting character code string to extract keywords. Next, image processing is applied to the moving image information to cut out the moving images in which character information appears on screen. For each piece of character information, the single image in which that character information is recorded most clearly is extracted automatically. Character recognition processing is then performed on that image to recognize the on-screen characters and convert them into character codes. Language processing is applied to this character code string in the same way to extract keywords. Keywords based on the audio information and keywords based on the image information have thus both been extracted.

[0031] An identification signal for retrieval is recorded at each location in the moving image information where a keyword was extracted from speech or characters. A keyword-image cross-reference table 53, consisting of pairs of keywords and identification signals (image data numbers), is then added to the database 51 before (or after) the moving image information.

[0032] An interactive movie is created by combining one or more pieces of moving image information prepared in this way. The user enters, in natural language, the story or scene he or she wants to see at that moment. The system then performs language processing to extract keywords and selects from the keyword-image cross-reference table 53 the keywords whose meaning is the same as, similar to, or close to those keywords. As a result, the moving image information closest to what the user wants is extracted, and playback starts from the beginning of that moving image information or from the required location.

[0033] In this way the user can interactively extract moving image information with the content he or she wants to see. Furthermore, by entering several keywords in sequence, the user can join several moving images together to make a movie with any desired story.
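Joining clips from a sequence of entered keywords, as described in paragraph [0033], reduces to concatenating table lookups. The sketch below assumes a simplified table that maps each keyword directly to a list of image data numbers and uses exact keyword matching; both are illustrative simplifications.

```python
def assemble_movie(table, keywords):
    """Concatenate the clips for each keyword in the order entered.

    `table` maps keyword -> list of image data numbers. Keywords that are
    not in the table are skipped.
    """
    playlist = []
    for keyword in keywords:
        playlist.extend(table.get(keyword, []))
    return playlist
```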

[0034] Furthermore, establishing various associations, called keyword links, among the keywords extracted from the moving image information makes it possible to create complex interactive movies.

[0035]

[Effects of the Invention] According to the present invention, databases of non-coded information are built automatically, so a great deal of labor is saved compared with conventional methods, which relied almost entirely on manual work, and highly accurate retrieval also becomes possible.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of the database-building scheme for non-coded information according to the first embodiment of the present invention.

FIG. 2 is a block diagram showing the configuration of the database-building scheme for non-coded information according to the second embodiment of the present invention.

FIG. 3 is a block diagram showing the configuration of the retrieval scheme for non-coded information according to the third embodiment of the present invention.

FIG. 4 is a conceptual diagram of the interactive movie database according to the fourth embodiment of the present invention.

FIG. 5 is a diagram showing the keyword-image cross-reference table in the fourth embodiment.

[Explanation of symbols]

1: document containing a figure; 2: scanner; 3: image processing; 4: text data part; 5: figure data part; 6: character recognition processing; 7: character information; 8: language processing; 9: keyword; 10: scoring processing; 11: score determination processing; 12: addition processing; 13: figure database; 21: moving image; 22: division processing; 23: divided moving image; 24: speech recognition processing; 25: audio information; 26: language processing; 27: keyword; 28: scoring processing; 29: score determination processing; 30: addition processing; 31: moving image database; 41: searcher; 42: natural language; 43: language processing; 44: keyword; 45: multimedia database; 45a: character information; 45b: image information; 45c: audio information; 46: matching processing; 47: image search processing; 48: image extraction processing; 49: image recognition processing; 50: speech recognition processing; 51: interactive movie database; 52: image data recording unit; 53: keyword-image cross-reference table

Claims

(57) [Claims]

1. A database apparatus for non-coded information, characterized by comprising: reading means for reading and digitizing text that contains non-coded information; image processing means for separating the character data part from the non-coded information part; character recognition processing means for encoding the characters within and around the non-coded information part; language processing means for extracting a plurality of keyword candidates from the encoded character information; scoring processing means for giving each keyword candidate a higher score the closer its corresponding character information is positioned to the original non-coded information part and the more frequently it appears; score determination processing means for determining whether the score has reached a reference score; and addition processing means for attaching to the non-coded information part, as keywords, the keyword candidates whose scores the score determination processing means has determined to have reached the reference score.