JPH0478965A

JPH0478965A - Automatic multi-media document data generation system

Info

Publication number: JPH0478965A
Application number: JP19256690A
Authority: JP
Inventors: Akihiko Sato; 昭彦佐藤; Rie Kashiwa; 柏　理恵
Original assignee: TOHOKU NIPPON DENKI SOFTWARE KK; NEC Corp; NEC Software Tohoku Ltd
Current assignee: TOHOKU NIPPON DENKI SOFTWARE KK; NEC Corp; NEC Solution Innovators Ltd
Priority date: 1990-07-20
Filing date: 1990-07-20
Publication date: 1992-03-12

Abstract

PURPOSE: To automatically generate multimedia document data from an input document by providing an input part, a document information analyzing part, a multimedia document data generating part and an output means. CONSTITUTION: When document information is input from a first input means 11, an area dividing means 13 in a document information analyzing part 4 recognizes the arrangement of text and charts on the document and divides the document into areas. Next, when the media type of the data in each divided area is input through a second input means 12, a format analyzing means 14 analyzes the print format of the document for the designated media, and a data analyzing means 15 recognizes the content of the data according to the encoding rules of each medium. Then, in a multimedia document data generating part 5, a document structure configuring means 16 organizes the recognized document information according to the structure system of the target document, and a document format data generating means 17 expresses it in the format of the target multimedia document. In this way, multimedia document data can be generated automatically.

Description

DETAILED DESCRIPTION OF THE INVENTION

[Field of Industrial Application]

The present invention relates to an automatic multimedia document data creation system, and more particularly to a system that automatically creates target structured multimedia document data from a printed page.

[Conventional technology]

Conventionally, when data was read from a printed document containing multiple media, the user had to cut out the area for each medium. In addition, when document data was created, the data content was read automatically, but document information such as format and position had to be specified by some other means, and the input printed document could not be constructed as document data.

[Problem to be solved by the invention]

In the conventional document creation method described above, when multimedia document data is input, an area must be cut out for each medium, and the user must also specify document information such as the position and format of each cut-out area, so the processing is cumbersome. In addition, creating new data for a structured multimedia document requires knowledge of the target document structure, which makes the data difficult to create.

In view of the above, an object of the present invention is to provide multimedia document data with less burden on the user, by automatically reading the information of a document expressed in multiple media and expressing the content of that information in the expression format of a structured multimedia document.

[Means to solve the problem]

The automatic multimedia document data creation system of the present invention comprises: a first input means for reading the information of a document; an area dividing means for recognizing the arrangement of text and charts on the document input by the first input means and dividing the page into areas; a second input means for inputting the media type of the data expressed in each area divided by the area dividing means; a format analysis means for analyzing the print format of the document for the media specified by the second input means; a data analysis means for recognizing the content of the data, for the media determined by the second input means, according to the encoding rules of each medium; a document structure configuring means for constructing the document information recognized by the data analysis means according to the structure system of the target document; and a document format data generating means for expressing the document constructed by the document structure configuring means in the expression format of the target multimedia document.

[Example]

Next, the present invention will be described with reference to the drawings.

FIG. 1 is a block diagram showing the configuration of an embodiment of the present invention.

The automatic multimedia document data creation system of this embodiment is connected to a first input device 1, a second input device 2, and an output device 7, and includes an input section 3, a document information analysis section 4, a multimedia document data generation section 5, and an output means 6.

The input section 3 includes a first input means 11 and a second input means 12. The document information analysis section 4 includes an area dividing means 13, a format analysis means 14, and a data analysis means 15. The multimedia document data generation section 5 includes a document structure configuring means 16 and a document format data generating means 17.

The first input device 1 inputs the document that serves as the basis for generating the multimedia document. The input document generally contains information expressed in text and image media. The first input device 1 includes, for example, an OCR (optical character reader).

The second input device 2 is used to input, for each divided area, the type of media (text or image) that represents the document information. The second input device 2 is, for example, a CRT device.

The output device 7 outputs the generated multimedia document data. The output device 7 is, for example, a magnetic disk device.

The first input means 11 inputs document data and information from the first input device 1.

The document information analysis section 4 divides the document information received from the first input means 11 into areas and analyzes each area.

In the document information analysis section 4, the area dividing means 13 divides the document information received from the first input means 11, splitting the page so that the content of each area consists of a single medium. It also recognizes the positional relationship of the divided areas.

The second input means 12 inputs, via the second input device 2, the media type of each area divided by the area dividing means 13.

The format analysis means 14 analyzes the format information in a manner appropriate to the media type input by the second input means 12.

The data analysis means 15 reads the information in each area divided by the area dividing means 13 as data of the media type specified by the second input means 12.

The multimedia document data generation section 5 assembles the structure of the multimedia document from the information analyzed by the document information analysis section 4 and generates data in the target format. In the multimedia document data generation section 5, the document structure configuring means 16 determines the structure and content information of the document according to the system of the target document, based on the results of the analysis by the document information analysis section 4. The document format data generating means 17 generates data for the document structure determined by the document structure configuring means 16 in a format that conforms to the rules of the target document.

The output means 6 outputs the data generated by the multimedia document data generation section 5 to the output device 7.

Next, the operation of the automatic multimedia document data creation system of this embodiment, configured as described above, will be explained.

When the automatic multimedia document data creation system is started, the first input means 11 first inputs the document information from the first input device 1 and passes it to the document information analysis section 4. In the document information analysis section 4, the area dividing means 13 examines the blank portions of the page, regular groupings of data, and similar cues, divides the document into several areas in which media are not mixed, and recognizes the positional relationship of the areas.
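The description does not say how blank portions are turned into area boundaries. As one possible illustration only, the sketch below uses a simple recursive cut along fully blank rows and columns of a binary page grid (1 = ink, 0 = blank). The grid representation and the names `split_regions` and `_runs` are assumptions made for this example and do not come from the patent.

```python
from typing import List, Optional, Tuple

# (top, left, bottom, right); bottom and right are exclusive
Box = Tuple[int, int, int, int]


def _runs(flags: List[bool]) -> List[Tuple[int, int]]:
    """Return (start, end) index ranges of consecutive True values."""
    runs, start = [], None
    for i, flag in enumerate(flags):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(flags)))
    return runs


def split_regions(page: List[List[int]], box: Optional[Box] = None) -> List[Box]:
    """Recursively cut a binary page along blank rows, then blank columns."""
    if box is None:
        box = (0, 0, len(page), len(page[0]))
    top, left, bottom, right = box
    # Which rows / columns inside the box contain any ink?
    row_ink = [any(page[r][c] for c in range(left, right)) for r in range(top, bottom)]
    col_ink = [any(page[r][c] for r in range(top, bottom)) for c in range(left, right)]
    row_runs, col_runs = _runs(row_ink), _runs(col_ink)
    if not row_runs:
        return []  # the box is entirely blank
    if len(row_runs) > 1:  # a blank horizontal band separates stacked parts
        out: List[Box] = []
        for start, end in row_runs:
            out += split_regions(page, (top + start, left, top + end, right))
        return out
    if len(col_runs) > 1:  # a blank vertical band separates side-by-side parts
        out = []
        for start, end in col_runs:
            out += split_regions(page, (top, left + start, bottom, left + end))
        return out
    # No blank band left: report the inked bounding box.
    (r0, r1), (c0, c1) = row_runs[0], col_runs[0]
    return [(top + r0, left + c0, top + r1, left + c1)]


# Toy page laid out like the example: two blocks on top, one wide block below.
page = [
    [1, 1, 0, 1, 1],
    [1, 1, 0, 1, 1],
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],
]
regions = split_regions(page)
```

The returned boxes keep their page coordinates, so the positional relationship of the areas can be recovered from them directly. A real page would also need noise tolerance and a minimum gap width, which this sketch omits.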

FIG. 2 is an explanatory diagram showing an example of an input document in an embodiment of the present invention. When a document like that of FIG. 2 is input, the upper left of the document frame is text consisting of character strings, the upper right is a picture image, and the lower part is a table consisting mainly of characters. The area dividing means 13 divides the page into areas as shown in FIG. 3 and stores the positional relationship of the areas.

Next, the second input means 12 receives from the second input device 2 the information on which medium is most appropriate for processing the data of each area determined by the area dividing means 13, and fixes the medium of each area. For the data of FIG. 2, area 31 is determined to be text, area 32 image, and area 33 text.
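The media assignment step can be pictured as a small lookup that attaches an operator-supplied type to every area. The following is a minimal sketch under that assumption; `MediaType` and `assign_media` are hypothetical names, and the two media types are the ones the description mentions (text and image).

```python
from enum import Enum
from typing import Dict, Iterable


class MediaType(Enum):
    TEXT = "text"
    IMAGE = "image"


def assign_media(region_ids: Iterable[int], choices: Dict[int, str]) -> Dict[int, MediaType]:
    """Attach an operator-chosen media type to every area; fail if one is missing."""
    assigned = {}
    for region_id in region_ids:
        if region_id not in choices:
            raise ValueError(f"no media type given for area {region_id}")
        # MediaType(...) raises ValueError for a name that is not text or image.
        assigned[region_id] = MediaType(choices[region_id])
    return assigned


# Choices from the example in the text: area 31 text, area 32 image, area 33 text.
media = assign_media([31, 32, 33], {31: "text", 32: "image", 33: "text"})
```

Rejecting a missing or unknown type at this point keeps the later format and data analysis steps from running on an area whose medium was never decided.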

Next, the format analysis means 14 obtains from the input document the format information needed to express the data of each area determined by the area dividing means 13 in the media specified by the second input means 12. For text, this information includes the typeface, character spacing, shading, and the like; for images, it includes the pixel spacing, density, and the like.

The data analysis means 15 then reads the data content (content code) of each area determined by the area dividing means 13, as expressed in the media specified by the second input means 12. In FIG. 2, the data content of area 21 is the passage beginning "What is an office document ...".

The information analyzed by the document information analysis section 4 is then passed to the multimedia document data generation section 5. There, the document structure configuring means 16 first organizes the information obtained by the document information analysis section 4 as information that follows the system of the target document. For example, the document of FIG. 2 is recognized with a structure like that of FIG. 4. That is, the document layout root 41 represents the layout within one page, the simple page set 42 represents each page, the page 43 represents the page order, and the frame 44 represents the sections 45A to 45C of the table and the (picture) image.
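The structure of FIG. 4 can be pictured as a small tree. The sketch below is illustrative only: the element names and reference numerals come from the description, but the strict parent-child nesting shown here (root, page set, page, frame, sections) is an assumption, as are the names `Node`, `build_structure`, and `outline`.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    kind: str   # e.g. "layout_root", "page", "frame", "section"
    ref: str    # reference numeral used in the description
    children: List["Node"] = field(default_factory=list)


def build_structure(section_refs: List[str]) -> Node:
    """Nest the elements named in the text; the nesting order is an assumption."""
    sections = [Node("section", ref) for ref in section_refs]
    frame = Node("frame", "44", sections)
    page = Node("page", "43", [frame])
    page_set = Node("simple_page_set", "42", [page])
    return Node("layout_root", "41", [page_set])


def outline(node: Node, depth: int = 0) -> List[str]:
    """Flatten the tree into indented lines, parent before children."""
    lines = ["  " * depth + f"{node.kind} {node.ref}"]
    for child in node.children:
        lines += outline(child, depth + 1)
    return lines


tree = build_structure(["45A", "45B", "45C"])
print("\n".join(outline(tree)))
```

Keeping the structure as a plain tree separates it from any particular output encoding, which matches the split between the document structure configuring means 16 and the document format data generating means 17.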

Next, the document format data generating means 17 expresses the information obtained by the document structure configuring means 16, which has meaning as a multimedia document, in a data expression format that follows the rules of the target document system.

Finally, the output means 6 outputs the multimedia document data generated by the multimedia document data generation section 5 to the output device 7.
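The overall flow described above can be summarized as a chain of stages. The following sketch wires the stages together as injected callables; it is a schematic of the data flow only, and every function name, signature, and stand-in implementation here is hypothetical rather than taken from the patent.

```python
from typing import Any, Callable, Dict, List


def run_pipeline(
    read_document: Callable[[], Any],              # first input means 11
    divide: Callable[[Any], List[Any]],            # area dividing means 13
    ask_media: Callable[[Any], str],               # second input means 12
    analyze_format: Callable[[Any, str], Dict],    # format analysis means 14
    analyze_data: Callable[[Any, str], Any],       # data analysis means 15
    build_structure: Callable[[List[Dict]], Any],  # document structure configuring means 16
    encode: Callable[[Any], bytes],                # document format data generating means 17
    write: Callable[[bytes], None],                # output means 6
) -> None:
    document = read_document()
    analyzed = []
    for region in divide(document):
        media = ask_media(region)
        analyzed.append({
            "region": region,
            "media": media,
            "format": analyze_format(region, media),
            "content": analyze_data(region, media),
        })
    write(encode(build_structure(analyzed)))


# Toy stand-ins so the flow can be run end to end.
out: List[bytes] = []
run_pipeline(
    read_document=lambda: ["What is an office document?", "<pixels>"],
    divide=lambda doc: doc,
    ask_media=lambda region: "image" if region.startswith("<") else "text",
    analyze_format=lambda region, media: {"length": len(region)},
    analyze_data=lambda region, media: region,
    build_structure=lambda items: [(i["media"], i["content"]) for i in items],
    encode=lambda structure: repr(structure).encode("utf-8"),
    write=out.append,
)
```

Passing each stage in as a callable mirrors the block diagram, where each means is a separate unit, and lets any one stage be replaced without touching the others.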

[Effect of the invention]

As explained above, by providing an input section, a document information analysis section, a multimedia document data generation section, and an output means, the present invention makes it possible to create multimedia document data automatically from an input document. That is, the user does not need to cut out the information of the source document for each medium or keep track of the positional relationship of the information groups of each medium, so the processing becomes easy.

In addition, because the document data is generated with almost no manual intervention, the user no longer needs knowledge of the document system, and anyone can create the target multimedia document data.

Flow diagram.

1: first input device; 2: second input device; 3: input section; 4: document information analysis section; 5: multimedia document data generation section; 6: output means; 7: output device; 11: first input means; 12: second input means; 13: area dividing means; 14: format analysis means; 15: data analysis means; 16: document structure configuring means; 17: document format data generating means.

Claims

[Scope of Claims] An automatic multimedia document data creation system comprising: a first input means for reading the information of a document; an area dividing means for recognizing the arrangement of text and charts on the document input by the first input means and dividing the page into areas; a second input means for inputting the media type of the data expressed in each area divided by the area dividing means; a format analysis means for analyzing the print format of the document for the media specified by the second input means; a data analysis means for recognizing the content of the data, for the media determined by the second input means, according to the encoding rules of each medium; a document structure configuring means for constructing the document information recognized by the data analysis means according to the structure system of the target document; and a document format data generating means for expressing the document constructed by the document structure configuring means in the expression format of the target multimedia document.