JPH0296467A

JPH0296467A - Document accumulating device

Info

Publication number: JPH0296467A
Application number: JP63248383A
Authority: JP
Inventors: Hisao Hayashi; 久雄林
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1988-09-30
Filing date: 1988-09-30
Publication date: 1990-04-09

Abstract

PURPOSE: To reduce the quantity of accumulated information and use the storage part effectively by providing a means that extracts symbol-expressible information areas from document information and obtains the corresponding symbols, and a means that obtains the inexpressible areas as picture information, and by synthesizing and accumulating the obtained symbols and picture information. CONSTITUTION: When a document written on an original is accumulated, the document is first converted by an original reader part 1 into picture information 101 expressed as a bit image. Next, characters are recognized by a character recognizing part 2. Document information whose characters were successfully recognized is converted into a character code 104; document information whose characters could not be recognized is output as picture information 102 and compressed by an encoder part 3 into encoded picture information 103. The document information thus separated into the character code 104 and the encoded picture information 103 is synthesized by an information synthesizing part 4 and accumulated in an accumulating part 5 as synthetic information 105. When the accumulated information is fetched, an information separating part 6 separates accumulated information 106 into a character code 109 and encoded picture information 107, and the encoded picture information 107 is decoded by a decoder part 7 and output as picture information 108.

Description

Detailed Description of the Invention

Field of Industrial Application

The present invention relates to a document storage device, and more particularly to a document storage device that stores documents written on originals.

Prior Art

Conventionally, document storage devices of this type have been structured to store all document information as image information.

Problems to be Solved by the Invention

Because the document storage device described above processes all document information as image information, it has the drawback that the amount of information to be stored is large and a large-capacity storage unit is required.

The present invention has been made to overcome the above drawback inherent in the conventional technology. Accordingly, an object of the present invention is to provide a novel document storage device that makes efficient use of the storage device possible by reducing the amount of information to be stored.

Means for Solving the Problems

To achieve the above object, the document storage device according to the present invention comprises: a first means for extracting, from document information, an information area that can be expressed symbolically and obtaining the symbol corresponding to it; a second means for obtaining, as image information, an information area of the document information that cannot be expressed symbolically; and a means for combining and storing the symbols and image information obtained by the first and second means.

Embodiment

Next, a preferred embodiment of the present invention will be described in detail with reference to the drawings.

FIG. 1 is a block diagram showing one embodiment of the present invention.

Referring to FIG. 1, when a document written on an original is to be stored, the document is first converted by the original reading unit 1 into image information 101 expressed as a bit image.

Next, characters are recognized by character recognition processing. Document information whose characters were successfully recognized is converted into a character code 104. Document information whose characters could not be recognized is output unchanged as image information 102 and compressed by the encoding unit 3 to obtain encoded image information 103. The document information thus separated into the character code 104 and the encoded image information 103 is combined by the information synthesis unit 4 and stored in the storage unit 5 as synthesized information 105.
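The split-and-combine step described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the patent's implementation: `synthesize`, `toy_recognize`, `KNOWN_GLYPHS`, and the `("code" | "image", payload)` tuples are hypothetical names, UTF-8 is an arbitrary choice of character code, and `zlib` merely stands in for the encoding unit 3, whose compression scheme the text does not specify.

```python
import zlib
from typing import Callable, List, Optional, Tuple

Region = bytes             # hypothetical: raw bit image of one information area
Entry = Tuple[str, bytes]  # ("code", character code) or ("image", compressed bit image)


def synthesize(regions: List[Region],
               recognize: Callable[[Region], Optional[str]]) -> List[Entry]:
    """Turn each region into a character code or an encoded image, in document order."""
    combined: List[Entry] = []
    for region in regions:
        char = recognize(region)
        if char is not None:
            # Recognition succeeded: keep only the character code (104).
            combined.append(("code", char.encode("utf-8")))
        else:
            # Recognition failed: keep the bit image (102), compressed (103).
            combined.append(("image", zlib.compress(region)))
    return combined


# Toy recognizer (assumption): an exact-match lookup of known glyph bitmaps.
KNOWN_GLYPHS = {b"\x18\x24\x42\x7e\x42\x42": "A"}


def toy_recognize(region: Region) -> Optional[str]:
    return KNOWN_GLYPHS.get(region)


regions = [b"\x18\x24\x42\x7e\x42\x42", b"\xff" * 64]
stored = synthesize(regions, toy_recognize)
```

Keeping the entries in document order means the original layout can be rebuilt on retrieval simply by walking the list.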

When a stored document is retrieved, the information separation unit 6 separates the stored information 106 into a character code 109 and encoded image information 107. The encoded image information 107 is decoded into a bit image by the decoding unit 7 and output as image information 108. The character code 109 and the image information 108 thus obtained are output by the display control unit 8 to the CRT 9 and the printer 10 as a CRT signal 110 and a print signal 111, respectively.

FIG. 2 is a flowchart showing an example of the processing outline of the character recognition unit 2 in FIG. 1.

Referring to FIG. 2, when processing starts, a process S2 of segmenting characters from the image information of the document is performed first.

If no character can be segmented, the process proceeds to S7, and the information area currently being processed is taken in unchanged as image information. If a character can be segmented, the process proceeds to S4, where the feature points of the character are calculated, and the dictionary search process S5 is performed on the basis of those feature points to determine whether a matching character exists in the dictionary. If there is no matching character, the information area currently being processed is taken in unchanged as image information in S7; if there is a matching character, the character code of that character is obtained (S8).
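The flow of FIG. 2 can be sketched roughly as below. Everything concrete here is an assumption made for illustration: the patent does not say how characters are segmented or which feature points are used, so `segment_character` and `feature_points` are deliberately trivial stand-ins, and the dictionary is a plain Python `dict` keyed by feature tuples.

```python
from typing import Dict, Optional, Tuple, Union

Bitmap = Tuple[Tuple[int, ...], ...]  # rows of 0/1 pixels
Features = Tuple[int, ...]
Result = Tuple[str, Union[str, Bitmap]]


def segment_character(area: Bitmap) -> Optional[Bitmap]:
    """S2/S3 stand-in: a glyph can be cut out only if the area contains any ink."""
    return area if any(any(row) for row in area) else None


def feature_points(glyph: Bitmap) -> Features:
    """S4 stand-in: use the ink count of each row as the feature vector."""
    return tuple(sum(row) for row in glyph)


def recognize_area(area: Bitmap, dictionary: Dict[Features, str]) -> Result:
    glyph = segment_character(area)
    if glyph is None:
        return ("image", area)                    # S7: keep the area as image information
    code = dictionary.get(feature_points(glyph))  # S5/S6: dictionary search and decision
    if code is None:
        return ("image", area)                    # S7: no matching character
    return ("code", code)                         # S8: character code determined


# Usage with a one-entry toy dictionary.
dictionary = {(1, 3, 1): "+"}
plus = ((0, 1, 0), (1, 1, 1), (0, 1, 0))
blank = ((0, 0, 0), (0, 0, 0), (0, 0, 0))
smudge = ((1, 1, 1), (1, 1, 1), (1, 1, 1))
```

Both failure paths (no glyph segmented, no dictionary match) fall through to the same image fallback, mirroring the two arrows into S7 that the text describes.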

FIG. 3 is a diagram showing an example of the processing result of the information synthesis unit 4 in FIG. 1.

In FIG. 3, so that the character code D3 and the encoded image information D6 can be distinguished when the document is retrieved, a character code header D1 and a character code attribute D2 are added to the former, and an image information header D4 and an image information attribute D5 are added to the latter.
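One way such tagged records could be laid out and separated again is sketched below. The byte-level format is not given in the source, so the choices here are assumptions: a one-byte header value (`CODE_HEADER` for D1, `IMAGE_HEADER` for D4) followed by a four-byte attribute that simply holds the payload length. `pack` and `unpack` are hypothetical helper names.

```python
import struct
from typing import List, Tuple

CODE_HEADER = 0x01   # stands in for the character code header D1 (assumed value)
IMAGE_HEADER = 0x02  # stands in for the image information header D4 (assumed value)

# Header byte followed by a 4-byte big-endian attribute (assumed: the payload length).
_PREFIX = struct.Struct(">BI")

Record = Tuple[int, bytes]


def pack(records: List[Record]) -> bytes:
    """Concatenate records as header + attribute + payload."""
    return b"".join(_PREFIX.pack(header, len(payload)) + payload
                    for header, payload in records)


def unpack(blob: bytes) -> List[Record]:
    """Walk the blob and split it back into (header, payload) records."""
    records: List[Record] = []
    offset = 0
    while offset < len(blob):
        header, length = _PREFIX.unpack_from(blob, offset)
        offset += _PREFIX.size
        records.append((header, blob[offset:offset + length]))
        offset += length
    return records


records = [(CODE_HEADER, b"AB"), (IMAGE_HEADER, b"\x00\xff\x00"), (CODE_HEADER, b"C")]
blob = pack(records)
```

Because every record announces its own type and length, a reader playing the role of the information separation unit can route character codes and encoded images to different consumers without inspecting the payloads.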

Effects of the Invention

As described above, the present invention has a first means for extracting, from document information, an information area that can be expressed symbolically and obtaining the symbol corresponding to it, a second means for obtaining, as image information, an information area that cannot be expressed symbolically, and a means for combining and storing the symbols and image information obtained by the first and second means. As a result, the amount of information to be stored is reduced and the storage unit can be used efficiently.
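The size of the saving depends on how much of the page is recognized. The back-of-the-envelope calculation below is purely illustrative: the glyph size (16 x 16 pixels, i.e. 32 bytes uncompressed), the 2-byte character code, and the page statistics are assumed numbers that do not come from the patent, and the sketch ignores both image compression and header overhead.

```python
def stored_bytes(total_chars: int, recognized: int,
                 glyph_bytes: int, code_bytes: int) -> int:
    """Bytes needed when recognized characters are stored as codes and the rest as bitmaps."""
    if not 0 <= recognized <= total_chars:
        raise ValueError("recognized must be between 0 and total_chars")
    return recognized * code_bytes + (total_chars - recognized) * glyph_bytes


# Assumed figures: 1000 characters per page, 32-byte glyph bitmaps, 2-byte codes.
all_image = stored_bytes(1000, 0, glyph_bytes=32, code_bytes=2)
mostly_codes = stored_bytes(1000, 900, glyph_bytes=32, code_bytes=2)
```

Under these assumptions, storing everything as bitmaps takes 32000 bytes, while recognizing 900 of the 1000 characters brings the total down to 5000 bytes.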

[Brief explanation of drawings]

FIG. 1 is a block diagram showing one embodiment of the present invention, FIG. 2 is a flowchart showing an example of character recognition processing, and FIG. 3 is a diagram showing an example of the synthesis of character codes and image information.

1: original reading unit; 2: character recognition unit; 3: encoding unit; 4: information synthesis unit; 5: storage unit; 6: information separation unit; 7: decoding unit; 8: display control unit; 9: CRT; 10: printer.

101: document information; 102: image information; 103: encoded image information; 104: character code; 105: synthesized information; 106: stored information; 107: encoded image information; 108: image information; 109: character code; 110: CRT signal; 111: print signal.

S1: start; S2: character segmentation processing; S3: character segmentation decision; S4: character feature point calculation processing; S5: dictionary search processing; S6: matching character decision; S7: image information capture processing; S8: character code determination processing; S9: end.

D1: character code header; D2: character code attribute; D3: character code; D4: image information header; D5: image information attribute; D6: encoded image information.

Patent applicant: NEC Corporation
Agent: Patent attorney Yutaro Kumagai

Claims

[Claims]

A document storage device for storing document information, characterized by comprising: a first means for extracting, from the document information, an information area that can be expressed symbolically and obtaining the symbol corresponding to it; a second means for obtaining, as image information, an information area of the document information that cannot be expressed symbolically; and a third means for combining and storing the symbols and image information obtained by the first and second means.