JPS59135576A

JPS59135576A - Registering and retrieving device of document information

Info

Publication number: JPS59135576A
Application number: JP58008941A
Authority: JP
Inventors: Masahiko Hase; 雅彦長谷; Hajime Suzuki; 元鈴木; Rikuo Takano; 陸男高野
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1983-01-21
Filing date: 1983-01-21
Publication date: 1984-08-03

Abstract

PURPOSE: To shorten the time needed to produce a summary edition, using a simple method, by automatically extracting the information in the areas of a printed document image that are marked with a fluorescent pen or the like. CONSTITUTION: A document 11 is scanned by an image information input part 14 and fed to a region discriminating part 15. The part 15 segments a fixed region centered on each area of the document 11 that is marked with a fluorescent pen. This segmented region is stored in a memory 18 together with the entire document image. From the region stored in the memory 18, a character string region is segmented by a segmenting part 16 and stored again in the memory 18. The character strings from the part 16 are compiled by a summary producing part 17 and stored in the memory 18 as one page of a summary edition. A summary edition of the printed document is thus obtained automatically.

Description

DETAILED DESCRIPTION OF THE INVENTION The present invention relates to a document information registration and retrieval device that stores document information, such as printed documents, and retrieves it.

<Prior Art> Conventional devices for storing document information such as printed documents generally store the image information of the entire document as-is on a magnetic disk or the like, and take the management information for the document, such as its title, as input from a separate device. In addition, if a person has marked up a printed document, for example by adding an underline 12 or a fluorescent-pen mark 13 to a document 11 as shown in FIG. 1, these marks are stored on the magnetic disk or the like as image information together with the printed text. The drawback is that the document image is cluttered by the marks when it is later read out.

Furthermore, existing devices manage printed document images sequentially. The drawback is that, during retrieval, whether a given document image is needed cannot be determined without outputting each document in its entirety, one after another.

<Summary of the Invention> To eliminate these drawbacks, the present invention discriminates and extracts the marked regions of a document on the basis of differences in marks, color information, and the like, creates summary (abstract) information from the information in those marked regions, and stores this summary information paired with the corresponding document information. Consequently, the summary information alone can be read out and displayed to judge quickly whether the document is needed, so retrieval is fast. Moreover, the summary information is created automatically and requires no manual work.

<Embodiment> FIG. 2 shows one embodiment of this invention. A document 11, such as a text document, is scanned by an image information input unit 14 and input as image information. The image information is input to a discrimination processing unit 15, which discriminates regions. The discrimination processing unit 15, a cut-out processing unit 16, a summary creation unit 17, a memory 18 that temporarily holds document images and the program that performs storage processing, a central processing unit (CPU) 19, a storage unit 21 such as a magnetic disk or optical disk in which document images are actually stored, a document image display device 22, and an image output device 23 are all connected to a common bus 24.

The image input unit 14 is an input device such as a television camera or a facsimile. From the image signal it supplies, the discrimination processing unit 15 cuts out a fixed region centered on each part of the printed document 11 that is marked with a fluorescent pen. For example, if "1 Purpose" and "LED printer" are marked as shown in FIG. 3, the discrimination processing unit 15 cuts out regions 25 and 26, each containing a mark and its vicinity. For the actual discrimination, a color-information sensor could be used in the case of a facsimile, and a color filter or the like in the case of a color camera.
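The patent leaves the color discrimination to a sensor or filter and gives no algorithm. As a minimal software sketch of the same idea, assuming an RGB raster and a yellow fluorescent pen, each pixel can be classified by hue, saturation, and brightness. The function names and all threshold values below are illustrative assumptions, not taken from the patent.

```python
import colorsys

def is_highlighter(rgb, hue_center=1 / 6, hue_tol=0.06, min_sat=0.5, min_val=0.6):
    """Return True if an (r, g, b) pixel (0-255 each) looks like a bright, saturated mark.

    hue_center=1/6 corresponds to yellow; every threshold here is an assumed value.
    """
    r, g, b = (c / 255.0 for c in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    return abs(h - hue_center) <= hue_tol and s >= min_sat and v >= min_val

def mark_mask(image):
    """Map a 2-D list of RGB pixels to a 2-D list of booleans (True = marked)."""
    return [[is_highlighter(px) for px in row] for row in image]

# Tiny stand-in for a scanned page: white paper, black ink, yellow mark.
WHITE, BLACK, YELLOW = (255, 255, 255), (0, 0, 0), (255, 240, 60)
page = [
    [WHITE, BLACK, WHITE, WHITE],
    [WHITE, YELLOW, YELLOW, WHITE],
    [WHITE, WHITE, BLACK, WHITE],
]
mask = mark_mask(page)
```

White and black pixels fail the saturation test, so only the two yellow pixels are flagged. A different pen color would need a different `hue_center`.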

Specifically, while the entire document 11 is scanned with a color camera or the like, the position coordinates (x, y) of the portions marked with the fluorescent pen are sampled, and that region is extracted first. A fixed region surrounding the area drawn with the fluorescent pen is then extracted as region 25 or 26 and stored in the memory 18 together with the entire document image. The extraction of regions 25 and 26 can be realized in software. That is, when scanning the document 11 with a color camera produces an output for a marked portion, the X and Y coordinates of the marked area are detected from the scanning signal at that moment. If the X component spans x1 to x2 and the Y component spans y1 to y2, the area is widened by Δx to the left and right in the X direction and by Δy upward and downward in the Y direction. The region bounded by x1 − Δx ≤ x ≤ x2 + Δx and y1 − Δy ≤ y ≤ y2 + Δy is taken as the region cut out by the discrimination processing unit 15, for example region 26, and the image information contained in region 26 is extracted from the overall image information as summary information.
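The widening step can be sketched in a few lines. The sketch below assumes the marked pixels are already available as a 2-D boolean mask; the function names are hypothetical, and clamping the widened region to the page edges is an added assumption that the patent does not discuss.

```python
def mark_extent(mask):
    """Return (x1, x2, y1, y2), the inclusive extent of True cells in a 2-D mask, or None."""
    xs = [x for row in mask for x, hit in enumerate(row) if hit]
    ys = [y for y, row in enumerate(mask) if any(row)]
    if not xs:
        return None
    return min(xs), max(xs), min(ys), max(ys)

def expand_region(extent, dx, dy, width, height):
    """Widen the extent by dx left/right and dy up/down, clamped to the page (assumption)."""
    x1, x2, y1, y2 = extent
    return (max(0, x1 - dx), min(width - 1, x2 + dx),
            max(0, y1 - dy), min(height - 1, y2 + dy))

# A 6-row by 8-column page with a mark covering columns 3-5 of rows 2-3.
mask = [[False] * 8 for _ in range(6)]
for y in (2, 3):
    for x in (3, 4, 5):
        mask[y][x] = True

extent = mark_extent(mask)
region = expand_region(extent, dx=2, dy=1, width=8, height=6)
```

Here `extent` is `(3, 5, 2, 3)` and `region` is `(1, 7, 1, 4)`, that is, x1 − Δx to x2 + Δx and y1 − Δy to y2 + Δy with Δx = 2 and Δy = 1.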

The discriminated regions 25 and 26 of the document image stored in the memory 18 are processed by the cut-out processing unit 16, which cuts out the character string region and deletes partial characters. Specifically, marginal distributions of regions 25 and 26 in the memory 18 are taken in the vertical and horizontal directions, and the character string region is cut out and partial characters are deleted on the basis of those distributions. FIG. 4 shows a concrete example. In region 26, for instance, the maximum image level among the points x1 − Δx to x2 + Δx is found for each horizontal scanning line, that is, for each value of y from y1 − Δy to y2 + Δy. As shown in FIG. 4A, the resulting distribution is zero in the portions corresponding to the inter-line gaps 27 of the document. The span between the points y1 and y2 at the centers of these zero regions is taken out and everything outside it is discarded, giving the cut-out image shown in FIG. 4B. Similarly, for region 26, the maximum level among the points y1 − Δy to y2 + Δy is found for each value of x from x1 − Δx to x2 + Δx, giving the distribution shown in FIG. 4C.

From this distribution, the portions corresponding to the spaces 28 are detected, and the span from x1 to x2 is taken out. As a result, the partial characters are removed, as shown in FIG. 4D.
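The marginal-distribution procedure can be sketched as follows, with a region represented as a 2-D list of image levels in which 0 is background. The function names are hypothetical. Which zero runs to cut at is also an assumption: this sketch uses the first and last zero runs of each profile, which presumes that Δx and Δy are smaller than one character or line pitch, so that only fragments lie outside them.

```python
def max_profile(region, axis):
    """Maximum level of each row (axis='rows') or each column (axis='cols')."""
    if axis == "rows":
        return [max(row) for row in region]
    return [max(col) for col in zip(*region)]

def zero_runs(profile):
    """Return inclusive (start, end) index pairs of consecutive zero entries."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value == 0 and start is None:
            start = i
        elif value != 0 and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(profile) - 1))
    return runs

def trim_bounds(profile):
    """Cut at the centers of the first and last zero runs (assumed choice).

    With fewer than two zero runs the profile is left untrimmed.
    """
    centers = [(s + e) // 2 for s, e in zero_runs(profile)]
    if len(centers) < 2:
        return 0, len(profile) - 1
    return centers[0], centers[-1]

def cut_out(region):
    """Trim line fragments above and below, then character fragments left and right."""
    top, bottom = trim_bounds(max_profile(region, "rows"))
    rows = region[top:bottom + 1]
    left, right = trim_bounds(max_profile(rows, "cols"))
    return [row[left:right + 1] for row in rows]

# Rows 0 and 6 are fragments of neighboring lines; columns 0 and 8 are
# fragments of neighboring characters; the marked text sits in the middle.
region = [
    [9, 0, 9, 9, 0, 9, 9, 0, 9],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [9, 0, 9, 9, 0, 9, 9, 0, 9],
    [9, 0, 9, 0, 0, 0, 9, 0, 9],
    [9, 0, 9, 9, 0, 9, 9, 0, 9],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [9, 0, 9, 9, 0, 9, 9, 0, 9],
]
trimmed = cut_out(region)
```

The row profile is zero at rows 1 and 5, so rows 0 and 6 are discarded; the column profile of what remains is zero at columns 1, 4, and 7, so columns 0 and 8 are discarded while the gap between the two kept characters (column 4) is left alone.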

The fluorescent-pen mark region 29 of the document image produced by the above processing is input to the memory 18 again, at which point region 26 (25) is deleted.

The regions 29 created by the cut-out processing unit 16 are next compiled by the summary creation unit 17 to complete a one-page summary edition. The one-page summary edition is stored in the memory 18. At this stage, character recognition may also be performed so that the summary edition is held in coded form. The summary-edition document image information accumulated in the memory 18 and the corresponding original image (the binary information of the whole document) are stored in the storage unit 21 via the common bus 24. They are stored so that the summary information is read out first. FIG. 5 outlines the flow of the above processing.
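The patent does not specify how the regions 29 are arranged on the summary page. One simple possibility, shown here purely as an assumed layout, is to stack the cut-out strips from top to bottom, left-aligned, with a blank gap between them. The function name and its parameters are hypothetical.

```python
def assemble_summary_page(strips, page_width, gap=1, background=0):
    """Stack cut-out strips vertically into one page image (assumed layout).

    Each strip is a 2-D list of image levels. Rows are padded on the right
    with background so that every row of the page has page_width entries.
    """
    page = []
    for strip in strips:
        if page:
            # Blank separator rows between consecutive strips.
            page.extend([[background] * page_width for _ in range(gap)])
        for row in strip:
            if len(row) > page_width:
                raise ValueError("strip is wider than the summary page")
            page.append(list(row) + [background] * (page_width - len(row)))
    return page

strips = [
    [[1, 1, 1]],            # first cut-out region, one row high
    [[2, 2], [2, 2]],       # second cut-out region, two rows high
]
summary_page = assemble_summary_page(strips, page_width=4)
```

If character recognition were applied as the text suggests, the page would hold character codes instead of pixels, and the same stacking would reduce to joining lines of text.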

When stored document images are retrieved, only the summary edition of each registered document image is first displayed on the display device 22 to determine whether the document is needed. If it is, the original image that follows the summary edition (together with the summary edition, if desired) is output as a hard copy on the output device 23.
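The registration and retrieval flow can be sketched as a store of (summary, original) pairs in which browsing touches only the summaries. The class and method names are hypothetical, an in-memory list stands in for the storage unit 21, and plain strings stand in for the images.

```python
from dataclasses import dataclass

@dataclass
class StoredDocument:
    summary: object    # one-page summary edition, read out first
    original: object   # full document image, read only when needed

class DocumentStore:
    """In-memory stand-in for the storage unit (illustrative only)."""

    def __init__(self):
        self._records = []

    def register(self, summary, original):
        """Store a summary paired with its original and return the document id."""
        self._records.append(StoredDocument(summary, original))
        return len(self._records) - 1

    def summaries(self):
        """Yield (doc_id, summary) pairs without reading any original image."""
        for doc_id, record in enumerate(self._records):
            yield doc_id, record.summary

    def original(self, doc_id):
        """Return the full image of a document judged to be needed."""
        return self._records[doc_id].original

store = DocumentStore()
store.register("1 Purpose / LED printer", "<full image A>")
store.register("Budget overview", "<full image B>")

# The user looks through the summaries and picks the documents needed.
wanted = [doc_id for doc_id, summary in store.summaries() if "LED" in summary]
hard_copy = store.original(wanted[0])
```

The point of the design is that deciding relevance costs one summary page per document, and the full image is fetched only for documents that pass that check.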

<Effects> As explained above, the document information registration and retrieval device of the present invention can automatically extract the information in the portions of a printed document image that have been marked with a fluorescent pen or the like, and automatically create a summary edition (abstract) of the printed document from that information.

This has the advantage of shortening and simplifying the creation of a summary edition. In addition, when the document information is retrieved, using the summary-edition information shortens retrieval time and simplifies operation.

Furthermore, by changing the discrimination processing unit, portions of a document image can be extracted on the basis of various kinds of marks.

[Brief Description of the Drawings]

FIG. 1 shows an example of marks added to an existing document image with a fluorescent pen; FIG. 2 is a block diagram showing an embodiment of the present invention; FIG. 3 shows the regions extracted by the region discrimination processing unit; FIG. 4 illustrates the processing in the cut-out processing unit; and FIG. 5 is a flowchart showing the order of the overall processing.

11: document; 14: image information input unit; 15: discrimination processing unit; 16: cut-out processing unit; 17: summary (abstract) creation unit; 18: memory for storing document images and programs; 19: central processing unit (CPU); 21: document image storage unit; 22: document image display device; 23: document image output device; 24: common bus; 26: region cut out by region discrimination processing; 29: region cut out by cut-out processing.

Patent applicant: Nippon Telegraph and Telephone Public Corporation
Agent: Takashi Kusano

Claims

[Claims]

(1) A document information registration and retrieval device comprising: an input unit that reads document information as an electrical signal; a discrimination processing unit that discriminates marked regions in the document from the read information; a cut-out processing unit that cuts out character string regions from the discriminated regions; a summary creation unit that creates summary information for the document from the cut-out information; a storage processing unit that stores the read document information and the created summary information; a storage unit that stores a plurality of sets of such document information and summary information; a document image information display unit that reads out and displays the information in the storage unit; and a document image output unit that reads out the information in the storage unit and outputs it as a hard copy.