JPS59135576A - Registering and retrieving device of document information - Google Patents

Registering and retrieving device of document information

Info

Publication number
JPS59135576A
Authority
JP
Japan
Prior art keywords
document
information
area
stored
memory
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP58008941A
Other languages
Japanese (ja)
Inventor
Masahiko Hase
雅彦 長谷
Hajime Suzuki
元 鈴木
Rikuo Takano
陸男 高野
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nippon Telegraph and Telephone Corp
Original Assignee
Nippon Telegraph and Telephone Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nippon Telegraph and Telephone Corp
Priority to JP58008941A
Publication of JPS59135576A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/34: Browsing; Visualisation therefor
    • G06F16/345: Summarisation for human users

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Character Input (AREA)

Abstract

PURPOSE: To shorten, by a simple method, the time needed to produce a summary edition, by automatically extracting the information in the areas of a printed document picture that are marked with a fluorescent pen or the like. CONSTITUTION: A document 11 is scanned at a picture information input part 14 and fed into a region discriminating part 15. The part 15 segments a fixed region centered on each area of the document 11 that is marked with a fluorescent pen. This segmented region is stored in a memory 18 together with the entire document picture. For the region stored in the memory 18, a character string region is segmented by a segmenting part 16 and stored again in the memory 18. The character strings sent from the part 16 are assembled at a summary producing part 17 and stored in the memory 18 as one page of a summary edition. A summary edition of the printed document is thus obtained automatically.

Description

DETAILED DESCRIPTION OF THE INVENTION The present invention relates to a document information registration and retrieval device that stores document information, such as printed documents, and retrieves it.

<Prior Art> Conventional devices for storing document information such as printed documents generally store the image information of the entire document as-is on a magnetic disk or the like, while the management information for the document, such as its title, is entered from a separate device. In addition, if a person has marked part of a printed document, for example with an underline 12 or a mark 13 such as a fluorescent-pen mark on a document 11 as shown in FIG. 1, these marks are stored on the magnetic disk or the like as image information together with the printed text. The drawback is that the document image is cluttered by the marks when it is later read out.

Furthermore, existing devices manage printed document images sequentially. The drawback is that, during a search, whether a given document image is needed cannot be determined without outputting each entire document in turn.

<Summary of the Invention> To eliminate these drawbacks, the present invention discriminates and extracts the marked areas of a document based on differences in marks, color information, and the like, creates summary (abstract) information from the information in those marked areas, and stores this summary information paired with the corresponding document information. Consequently, only the summary information needs to be read out and displayed, whether a document is needed can be judged quickly, and retrieval is fast. Moreover, the summary information is created automatically and requires no manual work.

<Embodiment> FIG. 2 shows an embodiment of this invention. A document 11, such as a written text, is scanned by an image information input unit 14 and input as image information. This image information is input to a discrimination processing unit 15, which discriminates areas. The discrimination processing unit 15, a segmentation processing unit 16, a summary creation unit 17, a memory 18 that temporarily holds document images and the programs that perform storage processing, a central processing unit (CPU) 19, a storage unit 21 such as a magnetic disk or optical disk on which the document images are actually stored, a document image display device 22, and an image output device 23 are connected to a common bus 24.

The image input unit 14 is an input device such as a television camera or a facsimile. From the image signal it supplies, the discrimination processing unit 15 cuts out a fixed area centered on each part of the printed document 11 that is marked with a fluorescent pen. For example, if "1 Purpose" and "LED printer" are marked as shown in FIG. 3, the discrimination processing unit 15 cuts out areas 25 and 26, each containing a mark and its surroundings. For the actual discrimination, a color-information sensor could be used with a facsimile, and a color filter or the like with a color camera.
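
The color-based discrimination mentioned above relies on sensors or filters in the original device. As a rough software analogue only, the sketch below flags pixels whose color is bright and strongly saturated, which is typical of highlighter ink. The function names `is_highlighter` and `mark_mask`, the RGB input format, and both thresholds are assumptions made for illustration and do not come from the patent.

```python
import colorsys
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # 8-bit (R, G, B); an assumed input format


def is_highlighter(pixel: Pixel, min_saturation: float = 0.5,
                   min_value: float = 0.6) -> bool:
    """Guess whether a pixel is highlighter ink: bright and saturated.

    Both thresholds are hypothetical and would need tuning per scanner.
    """
    r, g, b = (channel / 255 for channel in pixel)
    _, saturation, value = colorsys.rgb_to_hsv(r, g, b)
    return saturation >= min_saturation and value >= min_value


def mark_mask(image: List[List[Pixel]]) -> List[List[bool]]:
    """Boolean mask with True wherever a pixel looks like a mark."""
    return [[is_highlighter(pixel) for pixel in row] for row in image]


WHITE, BLACK, YELLOW = (255, 255, 255), (0, 0, 0), (255, 255, 0)
scan = [
    [WHITE, BLACK, WHITE],
    [YELLOW, YELLOW, WHITE],
]
print(mark_mask(scan))  # [[False, False, False], [True, True, False]]
```

Printed black text and white paper both have zero saturation, so only the colored mark survives the test.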

Specifically, while the entire document 11 is scanned with a color camera or the like, the position coordinates (x, y) of the parts marked with the fluorescent pen are sampled, and that area is extracted first. A fixed area surrounding the pen-marked area is then extracted as areas 25 and 26, which are stored in the memory 18 together with the entire document image. The extraction of areas 25 and 26 can be implemented in software. That is, when scanning the document 11 with a color camera yields an output for a marked part, the X and Y coordinates of the marked area are detected from the scanning signal at that moment. If the X component spans x1 to x2 and the Y component spans y1 to y2, the area is widened by Δx to the left and right and by Δy above and below. The area bounded by x1 − Δx to x2 + Δx and y1 − Δy to y2 + Δy is taken as the area separated out by the discrimination processing unit 15, for example area 26, and the image information contained in area 26 is extracted from the whole image as summary information.
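
The widening step can be sketched in a few lines. The sketch assumes the marked pixels are already available as a boolean mask and that the page carries a single mark; separating several marks (areas 25 and 26) is left out. The name `padded_mark_box` and the clamping to the page edges are illustrative assumptions, not details given in the patent.

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive


def padded_mark_box(mask: List[List[bool]], dx: int, dy: int) -> Optional[Box]:
    """Return the bounding box of marked pixels, widened by dx and dy.

    mask[y][x] is True where the scan reported highlighter color.
    """
    ys = [y for y, row in enumerate(mask) if any(row)]
    if not ys:
        return None  # no mark on this page
    xs = [x for row in mask for x, hit in enumerate(row) if hit]
    height, width = len(mask), len(mask[0])
    x1, x2 = min(xs), max(xs)
    y1, y2 = min(ys), max(ys)
    # Widen by the margins; clamping to the page is an assumption.
    return (max(0, x1 - dx), max(0, y1 - dy),
            min(width - 1, x2 + dx), min(height - 1, y2 + dy))


# Tiny 8-wide, 6-high page with a mark covering x = 3..5 on row y = 2.
page_mask = [[False] * 8 for _ in range(6)]
for x in range(3, 6):
    page_mask[2][x] = True

print(padded_mark_box(page_mask, dx=2, dy=1))  # (1, 1, 7, 3)
```

Here x1 = 3, x2 = 5 and y1 = y2 = 2, so with Δx = 2 and Δy = 1 the area runs from x = 1 to 7 and from y = 1 to 3.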

For the discriminated areas 25 and 26 of the document image stored in the memory 18, the segmentation processing unit 16 cuts out the character-string area and deletes characters that are only partly included. Specifically, marginal distributions (projection profiles) of areas 25 and 26 in the memory 18 are taken in the vertical and horizontal directions, and the character-string area is cut out and the partial characters are deleted on the basis of those distributions. FIG. 4 shows a concrete example. In area 26, for each horizontal scanning line, that is, for each value of y from y1 − Δy to y2 + Δy, the maximum image level among the points from x1 − Δx to x2 + Δx is found. As shown in FIG. 4A, the portions corresponding to the inter-line gaps 27 of the document are then zero. The part between the points Y1 and Y2 at the centers of these zero regions is kept and the part outside them is deleted, giving the segmented image shown in FIG. 4B. Similarly, for each value of x from x1 − Δx to x2 + Δx in area 26, the maximum among the points from y1 − Δy to y2 + Δy is found, giving the distribution shown in FIG. 4C.

From this distribution the portions corresponding to the spaces 28 are detected, and the part within x1 to x2 is kept. As a result, the partially included characters are removed, as shown in FIG. 4D.
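
The trimming by marginal distributions can be sketched as follows. This is a simplified reading of the procedure. It assumes a region given as rows of integer image levels with 0 as background, and it assumes that a zero gap exists on both sides of the wanted text in each direction. It also takes both profiles over the full region. All function names are hypothetical.

```python
from typing import List, Tuple


def max_profile(region: List[List[int]], axis: str) -> List[int]:
    """Maximum image level per row (axis='rows') or per column (axis='cols')."""
    if axis == "rows":
        return [max(row) for row in region]
    return [max(col) for col in zip(*region)]


def zero_runs(profile: List[int]) -> List[Tuple[int, int]]:
    """Inclusive (start, end) index pairs of consecutive zero entries."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value == 0 and start is None:
            start = i
        elif value != 0 and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(profile) - 1))
    return runs


def trim_bounds(profile: List[int]) -> Tuple[int, int]:
    """Keep the span between the centers of the first and last zero runs."""
    runs = zero_runs(profile)
    if len(runs) < 2:
        return 0, len(profile) - 1  # no gap on both sides: keep everything
    first, last = runs[0], runs[-1]
    return (first[0] + first[1]) // 2, (last[0] + last[1]) // 2


def trim_region(region: List[List[int]]) -> List[List[int]]:
    """Drop partial lines and partial characters at the region's edges."""
    top, bottom = trim_bounds(max_profile(region, "rows"))
    left, right = trim_bounds(max_profile(region, "cols"))
    return [row[left:right + 1] for row in region[top:bottom + 1]]


region = [
    [1, 0, 1, 1, 1, 0, 1],  # fragment of the line above
    [0, 0, 0, 0, 0, 0, 0],  # inter-line gap
    [1, 0, 1, 0, 1, 0, 1],  # marked line
    [1, 0, 1, 1, 1, 0, 1],  # marked line
    [0, 0, 0, 0, 0, 0, 0],  # inter-line gap
    [1, 0, 1, 1, 1, 0, 1],  # fragment of the line below
]
for row in trim_region(region):
    print(row)
```

In this toy region the row profile is zero at rows 1 and 4 and the column profile is zero at columns 1 and 5, so the result keeps rows 1 to 4 and columns 1 to 5 and discards the fragments outside them.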
Take out the inside of 2. As a result, a half-finished portion as shown in the fourth diagram is removed.

The fluorescent-pen mark area 29 in the document image produced by the above processing is written back to the memory 18, at which point area 26 (25) is deleted.

The areas 29 produced by the segmentation processing unit 16 are then assembled by the summary creation unit 17 into a one-page summary version, which is stored in the memory 18. At this stage, character recognition could also be performed so that the summary version is held in coded form. The summary-version document image held in the memory 18 and the corresponding original image (the binary information of the whole document) are stored in the storage unit 21 via the common bus 24, arranged so that the summary information is read out first. FIG. 5 outlines the flow of the above processing.
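
As an illustration of assembling the summary page and pairing it with the original, the sketch below stacks trimmed regions from top to bottom and keeps the summary ahead of the original in the stored record. The `DocumentRecord` type, the `build_summary_page` function, and the fixed page width are assumptions. The patent does not specify a page layout or a storage format.

```python
from dataclasses import dataclass
from typing import List

Image = List[List[int]]  # rows of pixel levels; 0 is background


@dataclass
class DocumentRecord:
    summary: Image   # listed first so a reader meets the summary first
    original: Image


def build_summary_page(regions: List[Image], width: int) -> Image:
    """Stack the trimmed regions top to bottom on a page of fixed width."""
    page: Image = []
    for region in regions:
        for row in region:
            # Crop rows that are too wide; pad short rows with background.
            page.append(row[:width] + [0] * (width - len(row)))
    return page


regions = [[[1, 1]], [[1, 0, 1], [1, 1, 1]]]
original = [[0] * 4 for _ in range(6)]
record = DocumentRecord(summary=build_summary_page(regions, width=4),
                        original=original)
print(record.summary)  # [[1, 1, 0, 0], [1, 0, 1, 0], [1, 1, 1, 0]]
```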

To search the stored document images, only the summary versions of the registered document images are first shown on the display device 22, and the user judges whether each document is needed. If it is, the original image that follows the summary version (together with the summary version, if desired) is output as a hard copy on the output device 23.
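
The summary-first search loop can be sketched as below. Strings stand in for images, a callback stands in for the person looking at the display device, and the name `retrieve` is hypothetical. The point of the sketch is only that the original is touched after its summary has been accepted.

```python
from typing import Callable, Iterable, Iterator, Tuple

Record = Tuple[str, str]  # (summary, original); strings stand in for images


def retrieve(records: Iterable[Record],
             is_needed: Callable[[str], bool]) -> Iterator[str]:
    """Yield originals whose summary the user accepts.

    Only the summary is inspected; the original is used on acceptance.
    """
    for summary, original in records:
        if is_needed(summary):
            yield original


store = [
    ("1 Purpose / LED printer", "<full scan of document A>"),
    ("Budget / Schedule", "<full scan of document B>"),
]
# The lambda stands in for a person judging the displayed summary.
wanted = list(retrieve(store, lambda summary: "LED" in summary))
print(wanted)  # ['<full scan of document A>']
```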

<Effects> As described above, the document information registration and retrieval device of this invention can automatically extract the information in the parts of a printed document image that are marked with a fluorescent pen or the like, and can automatically create a summary version (abstract) of the printed document from that information.

This has the advantage of shortening and simplifying the creation of a summary version. In addition, when the document information is searched, using the summary-version information shortens the search time and simplifies the operation.

Moreover, by changing the discrimination processing unit, parts of a document image can be extracted on the basis of various kinds of marks.

[Brief Description of the Drawings]

FIG. 1 shows an example of marks added to an existing document image with a fluorescent pen. FIG. 2 is a block diagram of an embodiment of this invention. FIG. 3 shows the areas extracted by the area discrimination processing unit. FIG. 4 illustrates the processing in the segmentation processing unit. FIG. 5 is a flowchart showing the overall processing sequence.

11: document; 14: image information input unit; 15: discrimination processing unit; 16: segmentation processing unit; 17: summary (abstract) creation unit; 18: memory for storing document images and programs; 19: central processing unit (CPU); 21: document image storage unit; 22: document image display device; 23: document image output device; 24: common bus; 26: area cut out by the area discrimination processing; 29: area cut out by the segmentation processing.

Patent applicant: Nippon Telegraph and Telephone Public Corporation. Agent: Takashi Kusano.

Claims (1)

[Claims]

(1) A document information registration and retrieval device comprising: an input unit that reads the information of a document as an electrical signal; a discrimination processing unit that discriminates, from the read information, the marked areas in the document; a segmentation processing unit that cuts out a character-string area from each discriminated area; a summary creation unit that creates summary information for the document from the cut-out information; a storage processing unit that stores the read document information and the created summary information; a storage unit for storing a plurality of such pairs of document information and summary information; a document image information display unit that reads out and displays the information in the storage unit; and a document image output unit that reads out the information in the storage unit and outputs it as a hard copy.
JP58008941A 1983-01-21 1983-01-21 Registering and retrieving device of document information Pending JPS59135576A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP58008941A JPS59135576A (en) 1983-01-21 1983-01-21 Registering and retrieving device of document information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP58008941A JPS59135576A (en) 1983-01-21 1983-01-21 Registering and retrieving device of document information

Publications (1)

Publication Number Publication Date
JPS59135576A true JPS59135576A (en) 1984-08-03

Family

ID=11706691

Family Applications (1)

Application Number Title Priority Date Filing Date
JP58008941A Pending JPS59135576A (en) 1983-01-21 1983-01-21 Registering and retrieving device of document information

Country Status (1)

Country Link
JP (1) JPS59135576A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS61267177A (en) * 1985-05-22 1986-11-26 Hitachi Ltd Retrieving system for document picture information
JPH01134671A (en) * 1987-11-20 1989-05-26 Canon Inc Document processor
EP0544432A2 (en) * 1991-11-19 1993-06-02 Xerox Corporation Method and apparatus for document processing
JPH06325096A (en) * 1993-05-13 1994-11-25 Ricoh Co Ltd Image forming and storing device
JP2003228572A (en) * 2002-12-12 2003-08-15 Ricoh Co Ltd Image processor, and method of preparing index information
JP2018163418A (en) * 2017-03-24 2018-10-18 富士ゼロックス株式会社 Search information generation device, image processing device, and search information generation program

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS55121572A (en) * 1979-03-13 1980-09-18 Toshiba Corp Document filing unit

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS61267177A (en) * 1985-05-22 1986-11-26 Hitachi Ltd Retrieving system for document picture information
JPH0750483B2 (en) * 1985-05-22 1995-05-31 株式会社日立製作所 How to store additional information about document images
JPH01134671A (en) * 1987-11-20 1989-05-26 Canon Inc Document processor
EP0544432A2 (en) * 1991-11-19 1993-06-02 Xerox Corporation Method and apparatus for document processing
JPH05242142A (en) * 1991-11-19 1993-09-21 Xerox Corp Method for summarizing document without decoding document picture
EP0544432A3 (en) * 1991-11-19 1993-12-22 Xerox Corp Method and apparatus for document processing
US5491760A (en) * 1991-11-19 1996-02-13 Xerox Corporation Method and apparatus for summarizing a document without document image decoding
JPH06325096A (en) * 1993-05-13 1994-11-25 Ricoh Co Ltd Image forming and storing device
JP2003228572A (en) * 2002-12-12 2003-08-15 Ricoh Co Ltd Image processor, and method of preparing index information
JP2018163418A (en) * 2017-03-24 2018-10-18 富士ゼロックス株式会社 Search information generation device, image processing device, and search information generation program

Similar Documents

Publication Publication Date Title
JP3504054B2 (en) Document processing apparatus and document processing method
EP0555027B1 (en) Information processing apparatus and method utilising useful additional information packet
US20020102022A1 (en) Detecting and utilizing add-on information from a scanned document image
CN100397864C (en) Image processing system and image processing method
JP4533273B2 (en) Image processing apparatus, image processing method, and program
JP2816241B2 (en) Image information retrieval device
JP2005149096A (en) Image processing system and image processing method
JPS5947641A (en) Producer of visiting card data base
EP0899666B1 (en) Image processing apparatus displaying a catalog of different types of data in different manner
JPS59135576A (en) Registering and retrieving device of document information
CN101872344A (en) Control method for image scanning
JP2005208977A (en) Document filing device and method
JP4480109B2 (en) Image management apparatus and image management method
JPS6255772A (en) Picture processor
JP2001101213A (en) Information processor, document managing device, information processing sysetm, information managing method and storage medium
JPS62243067A (en) Image file device
JPS59103177A (en) Business card reader
JPS61153756A (en) Document processing system
JPS6331825B2 (en)
JPH06162109A (en) Electronic filing system
JP2000222577A (en) Method and device for ruled line processing, and recording medium
JPH01278170A (en) Image filing device
WO2019161808A1 (en) Body-worn law enforcement camera employing 4g/5g network and two-dimensional barcode scanning and identification, and method
JPS6151272A (en) Picture processing system
JP2005157447A (en) Image processing system and method