JPH0683928A

JPH0683928A - Constitution system for compound information

Info

Publication number: JPH0683928A
Application number: JP23752492A
Authority: JP
Inventors: Hideaki Ozawa (小澤英昭); Toru Nakagawa (中川透)
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1992-09-07
Filing date: 1992-09-07
Publication date: 1994-03-25
Anticipated expiration: 2016-07-30
Also published as: JP3193472B2

Abstract

PURPOSE: To make it possible to generate composite information at low cost by combining information expressed as character strings with information expressed as images. CONSTITUTION: A mount shape identifying device 4 uses image information input from an image data input device 1 to determine the region of the layout image on the mount image used when the newspaper was produced. A text data structuring device 5 extracts position information from the character-string information input from a text data input device 2. It then converts that information into the coordinate representation used on the mount image, which structures the information. A stored data generating device 6 generates the composite information by combining the character-string information with the image information. As a result, composite information can be generated at low cost, with no manual work to associate the two kinds of information or to cut regions out of the image.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention applies to systems that process composite or multimedia information, that is, information with several forms of representation such as text data and image data. It relates to a method of constructing composite information that encompasses the whole from several pieces of information, each expressed in its own form of representation.

[0002]

[Prior Art] Multimedia databases are systems that handle composite information made up of various kinds of information, such as character strings, speech, and images. In such systems it has conventionally been common for a person to enter retrieval keywords directly, choose the required images, and so on. One example is a system that retrieves newspaper and magazine articles by keyword and delivers the selected articles by facsimile. There, a person marked the position of each article in the image input from a scanner and cut it out. The person then attached information such as keywords and dates to build the composite information.

[0003] Information digitized in a single medium is also available. As image data, newspaper page images are transmitted from the newspaper company to the printing plant. As text data, the CTS (Computer Typed Setting) output produced during page production is offered through information retrieval systems that handle all information, including body text, headlines, and dates, as character strings.

[0004]

[Problems to Be Solved by the Invention] Conventional means of generating composite information have treated information carried by different media, such as image information and information expressed as character strings, as entirely separate. People therefore had to link image information to character-string keywords by hand. They also had to cut out the image information related to the character-string information. As a result, constructing composite information was expensive.

[0005] The present invention was made to solve the above problem. Its object is to provide a method of constructing composite information at low cost by combining information expressed as character strings with information expressed as images.

[0006]

[Means for Solving the Problems] To achieve the above object, the invention of claim 1 receives three kinds of information as input:

(a) information expressed using at least one character string;
(b) image information of a mount image that contains at least one layout image, where each layout image is made up of at least one element image expressing, as an image, the same content as the character-string information; and
(c) position information that indicates, according to a predetermined rule, where on the layout image each item of character-string information is located.

The invention provides three means. The first determines the region of each layout image on the mount image. The second converts the input position information into position information that follows the rule of the layout image. The third converts that position information into position information that follows the rule of the mount image, using the region of the mount image that the layout image occupies. In this way the image information and the character-string information are structured so that they can be handled in association with each other.

[0007] Claim 2 covers the following case. The input element images, and the layout image composed of one or more element images, are image information whose positions follow a regular pattern of columns and lines. The position information input with the character-string information is expressed in terms of the columns and lines of the layout image. To associate positions on the layout image with that position information, the invention of claim 2 equips the means for determining the layout-image region with two further means. One identifies the height of one column from the distribution of black dots on the layout image. The other identifies the line spacing from the distribution of black dots within each column. The column-and-line position of each character-string item is then matched to a position on the layout image. Following the position rule of the mount image, this associates the character-string information with the corresponding element images.

[0008] Claim 3 covers the case where the image contains both a region expressing the main body of the information and a region of additional information. In the invention of claim 3, the means for determining the layout-image region has two further means. One determines the boundary line between the two regions on the layout image information. The other uses the position of that boundary line to cut the image information of the main-body region out of the mount image information. The column and line structure is then extracted from the main-body image alone, and the positions of the columns and lines in the whole layout image information are determined from it.

[0009] The invention of claim 4 provides three further means. The first detects the outer frame of the layout image when the input layout image is tilted. The second identifies, according to a predetermined criterion, where the boundary line lies that separates the layout image into main information and additional information. The third calculates the center point of the region expressing the main information, using the outer-frame information and the position of that boundary line. The image information that is input is the mount image after rotation about the center point of the main-information region.

[0010]

[Operation] In the invention of claim 1, the image information consists of three levels:

- the mount image, which is the input image itself;
- the layout image, which is the region of the mount image that expresses information; and
- the element images, each representing an individual item of information within the layout image.

The region of the layout image on the mount image is determined first. The position of the element image belonging to each item of character-string information is input as a coordinate expression on the layout image. That expression is then converted to the coordinate rule of the mount image. Information expressed as character strings and information expressed as images can thereby be combined into composite information. People no longer need to link image information to character-string keywords by hand or to cut out the related image information, and the overall cost of constructing the information falls.
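The layout-to-mount conversion described above reduces to a coordinate translation once the layout region is known. The sketch below assumes the tilt has already been corrected, so the layout image is an unscaled, axis-aligned rectangle inside the mount image. The `Region` class and the function names are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Axis-aligned rectangle in pixels; (left, top) is its upper-left corner."""
    left: int
    top: int
    width: int
    height: int


def layout_to_mount(x: int, y: int, layout: Region) -> tuple:
    """Convert a point in layout-image coordinates into mount-image coordinates.

    Assumes the layout image is an unrotated, unscaled sub-rectangle of the
    mount image, so the conversion is a pure offset by its upper-left corner.
    """
    return (layout.left + x, layout.top + y)


def element_to_mount(element: Region, layout: Region) -> Region:
    """Map an element-image box given in layout coordinates onto the mount image."""
    left, top = layout_to_mount(element.left, element.top, layout)
    return Region(left, top, element.width, element.height)


# Hypothetical numbers: the layout region was found at (120, 80) on the mount image.
layout = Region(120, 80, 3000, 4000)
article = Region(500, 1000, 600, 300)
print(element_to_mount(article, layout))  # Region(left=620, top=1080, width=600, height=300)
```

A real system would also need the column-and-line to pixel conversion described in claim 2. This sketch covers only the final layout-to-mount step.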

[0011] For example, a page image can be obtained together with its date and page number. Sources include hard copy such as newspaper or magazine pages, and the digital facsimile images that a newspaper company distributes to its printing plants. Character-string information carrying the same content as part of the image is then input, for example the body text of a newspaper article. With it comes position information: the date, the page number, and the vertical and horizontal proportions that locate the text within the image. The position of each item in the image is converted into a position representation that corresponds to the input character strings. Combining the information generated from the character strings with the information generated from the image turns the two into information that can be used as pairs.
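One way to picture this pairing is to match text-side records to page images on (date, page), then scale a position given as vertical and horizontal proportions to pixels on the matched page. The record fields (`date`, `page`, `ratio_box`) and the function names are hypothetical illustrations. They do not represent the patent's actual data format.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


@dataclass
class TextArticle:
    """Hypothetical text-side record, e.g. derived from CTS output."""
    date: str
    page: int
    headline: str
    # (left, top, right, bottom) as fractions of the page width and height
    ratio_box: Tuple[float, float, float, float]


@dataclass
class PageImage:
    """Hypothetical image-side record for one scanned or faxed page."""
    date: str
    page: int
    width: int   # pixels
    height: int  # pixels


def ratio_to_pixels(ratio_box, width: int, height: int) -> Box:
    """Scale a proportional box to pixel coordinates on a page of the given size."""
    l, t, r, b = ratio_box
    return (round(l * width), round(t * height), round(r * width), round(b * height))


def pair_articles(articles: List[TextArticle], pages: List[PageImage]):
    """Join articles to page images on (date, page); unmatched articles are skipped."""
    by_key = {(p.date, p.page): p for p in pages}
    pairs = []
    for a in articles:
        page = by_key.get((a.date, a.page))
        if page is not None:
            box = ratio_to_pixels(a.ratio_box, page.width, page.height)
            pairs.append((a.headline, page, box))
    return pairs
```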

[0012] Newspaper pages, magazine pages, books, and similar image information have traditionally arranged their content regularly in columns and lines. Claim 2 applies when the image is structured by columns and lines and the input position information is expressed in the same terms. Candidate column heights and line widths are extracted from the blank spaces between lines and between columns. The average column height and line width are calculated from those candidates. From these values it becomes easy to identify the shape of the mount used to produce the image.
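Here is a minimal sketch of this estimate. It assumes a binary image stored as a list of rows, with 1 marking a black dot. Each vertical strip is profiled row by row. The lengths of runs whose black-dot count exceeds a threshold become candidate column heights, and the candidates close to the most frequent value are averaged. The strip width, threshold, and tolerance are illustrative parameters. The averaging-around-the-mode rule is an assumption, loosely modelled on the histogram used in the embodiment.

```python
from collections import Counter
from statistics import mean
from typing import List

Image = List[List[int]]  # rows of 0/1 pixels, 1 = black dot


def run_lengths(profile: List[int], threshold: int) -> List[int]:
    """Lengths of consecutive entries whose count exceeds threshold."""
    runs, start = [], None
    for i, count in enumerate(profile):
        if count > threshold and start is None:
            start = i                      # a band of text begins
        elif count <= threshold and start is not None:
            runs.append(i - start)         # the band ends at a blank gap
            start = None
    if start is not None:                  # band touching the bottom edge
        runs.append(len(profile) - start)
    return runs


def column_height_candidates(image: Image, strip_width: int, threshold: int) -> List[int]:
    """Split the page into vertical strips and collect band heights from each strip."""
    width = len(image[0])
    candidates = []
    for x0 in range(0, width, strip_width):
        profile = [sum(row[x0:x0 + strip_width]) for row in image]
        candidates.extend(run_lengths(profile, threshold))
    return candidates


def estimate_pitch(candidates: List[int], tolerance: int) -> float:
    """Average the candidates lying within tolerance of the most frequent value."""
    mode, _ = Counter(candidates).most_common(1)[0]
    return mean(c for c in candidates if abs(c - mode) <= tolerance)
```

Discarding candidates far from the mode keeps a few unusually tall or short bands, such as headlines or photographs, from skewing the average.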

[0013] A newspaper page, for example, is supplied as one image that combines two kinds of image information. One is the main body of information, such as the articles. The other is an additional part, such as the advertisements below the articles. The articles are placed at positions expressed by the columns and lines of the mount, but the under-article advertisements follow an entirely different layout. The advertisements can therefore cause errors when the column width and line width of the mount are identified. The invention of claim 3 first finds the boundary line between the article part and the under-article advertisements, which is drawn across the full width of the region in a newspaper image or similar document. It then cuts the main-body region, such as the article part, out of the image information. The columns and lines of the mount are estimated from the main-body image alone, so the column width and line width are identified accurately.
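The boundary search can be sketched with a row projection. This version makes three assumptions. The layout image has already been cropped inside the outer frame, so the frame rules are gone. Scanning from the top, the first row whose black-dot count reaches a large fraction of the width is the article/advertisement boundary. The fraction `full_ratio = 0.95` is a hypothetical value. The function names are illustrative.

```python
from typing import List, Optional

Image = List[List[int]]  # rows of 0/1 pixels, 1 = black dot


def find_boundary_row(image: Image, full_ratio: float = 0.95) -> Optional[int]:
    """Return the first row (scanning from the top) whose black-dot count reaches
    full_ratio of the image width, i.e. a rule drawn across almost the full width."""
    width = len(image[0])
    for y, row in enumerate(image):
        if sum(row) >= full_ratio * width:
            return y
    return None


def crop_article_body(image: Image, full_ratio: float = 0.95) -> Image:
    """Keep only the rows above the article/advertisement boundary, if one is found."""
    y = find_boundary_row(image, full_ratio)
    return image if y is None else image[:y]
```

Ordinary text rows rarely come close to full-width ink coverage, which is why a high ratio can separate the boundary rule from body text.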

[0014] When an image scanner, CCD camera, or similar device is used to input information as an image, the image may be tilted. The invention of claim 4 proceeds as follows:

1. Identify the outer frame surrounding the information in the image.
2. Calculate the tilt of the whole image from the tilt of the frame's vertical and horizontal rules.
3. Identify the boundary line between, for example, the article region and the under-article advertisement region of a newspaper.
4. Calculate the center of the article region from the coordinates of that boundary line.
5. Rotate the image by the tilt about the center of the article region.

The result is a corrected image with little distortion in the article part.
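The centre and rotation steps can be sketched as follows. The sketch assumes image coordinates with y growing downward. It takes the article region to be bounded by the left, top, and right frame rules and by the boundary line at the bottom. The function names are hypothetical. The displacement a rotation causes grows with distance from the pivot, so pivoting at the article centre keeps the worst-case shift inside the article region smaller than pivoting at a page corner would.

```python
import math
from typing import Tuple


def article_center(left: float, top: float, right: float, boundary: float) -> Tuple[float, float]:
    """Centre of the article region bounded by the outer frame (left, top, right)
    and by the article/advertisement boundary line (bottom)."""
    return ((left + right) / 2, (top + boundary) / 2)


def rotate_about(x: float, y: float, cx: float, cy: float, theta_deg: float) -> Tuple[float, float]:
    """Rotate (x, y) by theta_deg about (cx, cy) in image coordinates (y grows downward)."""
    t = math.radians(theta_deg)
    dx, dy = x - cx, y - cy
    return (cx + dx * math.cos(t) - dy * math.sin(t),
            cy + dx * math.sin(t) + dy * math.cos(t))


# Hypothetical frame and boundary positions, correcting a 0.4° clockwise tilt.
cx, cy = article_center(100, 50, 2900, 2050)
print((cx, cy))  # (1500.0, 1050.0)
print(rotate_about(2900, 2050, cx, cy, -0.4))
```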

[0015] As described above, the present invention determines the position of the layout image on the mount image accurately. It can also derive coordinate positions on the layout image from the input position information. The two can therefore be handled in association with each other even when the rule for specifying the input positions differs from the position rule of the image information.

[0016]

[Embodiments] Embodiments of the present invention are described in detail below with reference to the drawings.

[0017] [Embodiment 1] FIG. 1 is a system configuration diagram of the composite-information construction method according to the first embodiment of the present invention. The invention is effective for composite information retrieval systems in general. Such systems combine information that is expressed mainly in characters but handled as images with information expressed as character strings. Newspapers are used as the example here. A newspaper consists mainly of character-string information, such as headlines and body text, yet in everyday use it is printed on paper and handled as image information.

[0018] This embodiment consists of seven devices:

1. at least one image data input device 1, which inputs information expressed as images;
2. at least one text data input device 2, which inputs information expressed as text;
3. an image data preprocessing device 3, which shapes the image data input by device 1, for example by correcting its tilt;
4. a mount shape identifying device 4, which identifies, from the shaped image, the positions of the lines and columns of the mount used to produce the newspaper;
5. a text data structuring device 5, which extracts information such as the date, page, and column/line position from the character strings input by device 2 and structures it;
6. a stored data generating device 6, which associates the structured article information with the page image and stores both in a database; and
7. a composite information storage device 7, which stores the generated composite information.

[0019] For the newspaper image in this embodiment, FIG. 7(b) shows the three image levels named in the claims. The mount image G₃ is the image exactly as input by the image data input device 1. The layout image G₂ is the image of all the articles, excluding the newspaper name, the date, and similar items. The element image G₁ is the image of one individual article.

[0020] FIG. 2 is a system configuration diagram of the image data preprocessing device 3 in the above configuration. The input image data may be slightly tilted. As FIG. 2 shows, the image data preprocessing device 3 in this embodiment is therefore a subsystem of two mechanisms. The tilt correction mechanism 11 determines the magnitude of the tilt from the outer frame of the newspaper image and corrects it. The nombre removal mechanism 12 removes the regions outside the outer frame, called the nombre (ノンブル), which hold the newspaper name, date, and similar items, and extracts the layout image.

[0021] The tilt correction mechanism 11 in this embodiment consists of five components:

- an image buffer 13, which stores the input image data;
- a black-dot counting module 14, which counts the black dots on each line of the image in the image buffer 13;
- a counting buffer 15, which stores the black-dot counts for all lines;
- a tilt calculation module 16, which calculates the tilt of the image from the data in the counting buffer 15; and
- an image correction module 17, which rotates the image data in the image buffer 13 by the calculated tilt.

[0022] The tilt correction mechanism 11 in this embodiment follows the procedure in the flowchart of FIG. 3. First, the image data input by the image data input device 1 is stored in the image buffer 13. In step 31, a pointer is set so that data is read from the image buffer 13 one horizontal line at a time, from the bottom of the newspaper image toward the top. In step 32, the black-dot counting module 14 works on at least two points on each line. Within a predetermined range centered on each point, it counts the dots judged to be black and stores the counts in turn in the counting buffer 15. Once the counts for all lines are in the counting buffer 15, the tilt calculation module 16 sets a pointer in step 33. The pointer reads the counts for one candidate point from the bottom line of the image upward. In step 34, each count is compared with a predetermined line-candidate black-dot count. The number of the first line whose count exceeds it is stored as a candidate in the start-line candidate buffer of the tilt calculation module 16.

[0023] In step 35, the counts in the counting buffer 15 continue to be compared with the line-candidate black-dot count. The first line whose count falls below it is stored in the end-line candidate buffer of the tilt calculation module 16. In step 36, the difference between the end-line candidate and the start-line candidate is calculated and compared with a predetermined reference frame-line width. If the difference is within the reference width, then in step 37 the average (midpoint) of the two candidates is stored in the candidate line buffer for that candidate point in the tilt calculation module 16. If the difference is outside the reference width, the process returns to step 34 to look for the next start-line candidate. In step 38, if some candidate points have not yet been processed, the process returns to step 33 and moves the pointer to another candidate point.
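Steps 31 to 38 can be sketched for a single candidate point as follows. The sketch assumes a binary image stored as a list of rows, with row 0 at the top of the page. The window half-width, the line-candidate threshold, and the accepted width range are illustrative parameters. The function names are hypothetical. Following the text, the returned line is the average of the start-line and end-line numbers, not the exact centre of the rule. That offset is the same at every candidate point, so it does not affect the tilt computed from them.

```python
from typing import List, Optional

Image = List[List[int]]  # rows of 0/1 pixels, 1 = black dot; row 0 is the top


def window_counts_bottom_up(image: Image, x: int, half: int) -> List[int]:
    """Steps 31-32: black-dot count per row in a window centred on column x,
    ordered from the bottom row of the page upward."""
    lo, hi = max(0, x - half), min(len(image[0]), x + half + 1)
    return [sum(row[lo:hi]) for row in reversed(image)]


def find_frame_line(image: Image, x: int, half: int, line_threshold: int,
                    min_width: int, max_width: int) -> Optional[float]:
    """Steps 33-37: return the row number (top origin) of the lowest run of rows
    whose counts exceed line_threshold and whose thickness is in range."""
    counts = window_counts_bottom_up(image, x, half)
    i, n = 0, len(counts)
    while i < n:
        if counts[i] > line_threshold:                 # step 34: start-line candidate
            start = i
            while i < n and counts[i] > line_threshold:
                i += 1
            end = i                                    # step 35: first row below threshold
            if min_width <= end - start <= max_width:  # step 36: width check
                mid = (start + end) / 2                # step 37: average of the candidates
                return (n - 1) - mid                   # back to a top-origin row number
            # width out of range: keep scanning upward (back to step 34)
        else:
            i += 1
    return None
```

Step 38 would call `find_frame_line` once per candidate point, for example at x = 1000 and x = 2000.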

[0024] Once candidate lines have been found for all candidate points, step 39 calculates the tilt angle from the spacing between the candidate points and the spacing between their candidate lines. For example, take a newspaper image 3000 dots wide per line and two candidate points at dots 1000 and 2000. If their candidate lines are lines 3910 and 3917, the line difference is 7 and the points are 1000 dots apart. The tilt is therefore $\tan^{-1}(7/1000) \approx 0.4^\circ$.
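Step 39 can be sketched as a slope computation over (candidate point, candidate line) pairs. With exactly two points it reduces to $\tan^{-1}(\Delta y / \Delta x)$ as in the worked example. Using a least-squares slope when there are more than two points is an assumption; the patent describes only the two-point case. The function name is hypothetical.

```python
import math
from typing import List, Tuple


def tilt_angle_deg(candidates: List[Tuple[float, float]]) -> float:
    """Tilt angle in degrees from (candidate point x, candidate line y) pairs.

    With two points this equals atan(dy / dx); with more points it uses the
    least-squares slope (an assumption beyond the patent's two-point example).
    """
    n = len(candidates)
    mx = sum(x for x, _ in candidates) / n
    my = sum(y for _, y in candidates) / n
    sxy = sum((x - mx) * (y - my) for x, y in candidates)
    sxx = sum((x - mx) ** 2 for x, _ in candidates)
    return math.degrees(math.atan(sxy / sxx))


# Worked example from the text: points at dots 1000 and 2000, lines 3910 and 3917.
print(round(tilt_angle_deg([(1000, 3910), (2000, 3917)]), 2))  # 0.4
```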

[0025] In step 40, the image correction module 17 rotates the data in the image buffer 13 by the calculated tilt. Suppose the candidate line is tilted 0.4° clockwise. If the rotation were taken about the lower-left corner, it would lie in the first quadrant, and a rotation of 0.4° would suffice. Image data, however, generally uses the upper-left corner as the origin, with y increasing downward. With the upper-left corner as the origin, the mount image must instead be rotated by −0.4°. Let x and y be the horizontal and vertical coordinates of the image, and U and V the computed horizontal and vertical coordinates. The new coordinates are then:

$$U = x\cos(-0.4^\circ) - y\sin(-0.4^\circ), \qquad V = x\sin(-0.4^\circ) + y\cos(-0.4^\circ)$$
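The coordinate formula above could be applied to a whole binary image as follows. The sketch uses inverse mapping with nearest-neighbour sampling: each output pixel (U, V) looks up the input pixel that the forward formula would send to it, so the result has no holes. That resampling choice is an assumption; the patent does not specify one. The function names are hypothetical.

```python
import math
from typing import List, Tuple

Image = List[List[int]]  # rows of 0/1 pixels; origin at the upper-left corner


def rotate_point(x: float, y: float, theta_deg: float) -> Tuple[float, float]:
    """U = x cos θ - y sin θ, V = x sin θ + y cos θ, about the upper-left origin."""
    t = math.radians(theta_deg)
    return (x * math.cos(t) - y * math.sin(t), x * math.sin(t) + y * math.cos(t))


def rotate_image(image: Image, theta_deg: float) -> Image:
    """Rotate a binary image about the upper-left corner by theta_deg.

    Each output pixel (u, v) samples the input at the point rotated by -theta,
    rounded to the nearest pixel; samples falling outside the input stay white (0).
    """
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for v in range(h):
        for u in range(w):
            x, y = rotate_point(u, v, -theta_deg)
            xi, yi = round(x), round(y)
            if 0 <= xi < w and 0 <= yi < h:
                out[v][u] = image[yi][xi]
    return out


# A point 1000 dots from the origin along x moves about 7 lines up for θ = -0.4°.
u, v = rotate_point(1000, 0, -0.4)
print(round(u), round(v, 1))  # 1000 -7.0
```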

[0026] The tilt-corrected image data is then sent to the nombre removal mechanism 12, which separates it into the article part and the region outside the frame. The nombre removal mechanism 12 consists of five components:

- an image buffer 18, which stores the tilt-corrected image;
- a frame-line measuring module 19, which counts the dots judged to be black on each vertical and horizontal line of the image in order to estimate the frame lines;
- a frame-line buffer 20, which stores the black-dot count for each line;
- an end point determination module 21, which determines the four end points of the article part from the counts in the frame-line buffer 20; and
- an image extraction module 22, which uses the end-point information to cut only the article part out of the image in the image buffer 18.

[0027] The nombre removal mechanism 12 in this embodiment follows the procedure in the flowcharts of FIGS. 4(a) and 4(b). First, as FIG. 4(a) shows, the frame-line measuring module 19 works on the tilt-corrected image data stored in the image buffer 18. In step 41, pointers are set at the top and bottom edges of the newspaper image, and data is read one horizontal line at a time from the top and from the bottom. In step 42, the dots judged to be black are counted, and the counts are stored in the corresponding entries of the "upper line buffer" and "lower line buffer" of the frame-line buffer 20. In step 43, pointers are set at the left and right edges of the image, and data is read one vertical line at a time. In step 44, the black dots are counted, and the counts are stored in the corresponding entries of the "left line buffer" and "right line buffer" of the frame-line buffer 20.

[0028] In step 45, the end point determination module 21 sets a pointer to the left line buffer in the frame-line buffer 20. In step 46, it calculates the left edge with the frame-line determination process of FIG. 4(b), described below, and stores the result in the left-point buffer of the image extraction module 22. The same operations give the right point from the "right line buffer" in steps 47-48, the upper point from the "upper line buffer" in steps 49-50, and the lower point from the "lower line buffer" in steps 51-52.

[0029] In step 53, the image extraction module 22 uses these four points (top, bottom, left, and right) to extract only the image of the article part, which is the layout image. It sends that image to the mount shape identifying device 4.

[0030] In the frame line determination processing of the frame line measurement module 19, as shown in the flowchart of FIG. 4(b), it is specified at the start whether the vertical or the horizontal frame line threshold is to be used and which end point (top, bottom, left, or right) is to be found. In step 54, the black dot counts are taken one after another from the line buffer of the frame line buffer 20, and in step 55 each count is compared with the predetermined frame line threshold; when the threshold is exceeded for the first time, the frame line flag is set to 1 in step 56. Next, in step 57, it is determined whether the count is smaller than the frame line threshold while the frame line flag is 1. If true (Yes), the current line is taken as the edge line of the article part in step 58. If false (No), the process returns to step 54 and the next values are taken and processed in order.
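
The end-point search of [0028] to [0030] can be sketched as follows. This is a minimal Python illustration, not the patent's implementation: it assumes the page is a binary 2-D list, that the per-line black dot counts are plain row and column sums, that the top and bottom (horizontal) frame lines are tested with the horizontal threshold and the left and right (vertical) ones with the vertical threshold, and that a count exactly equal to a threshold is ignored, since the text only specifies "exceeds" and "smaller than". All function names are hypothetical.

```python
from typing import List, Optional, Sequence

# Hypothetical representation: image[row][col] is 1 for a black dot, 0 for white.
Image = List[List[int]]


def find_edge(counts: Sequence[int], threshold: int) -> Optional[int]:
    """Steps 54-58: index of the first line whose black dot count drops below
    the threshold after an earlier line (the frame line) has exceeded it."""
    frame_flag = 0
    for i, n in enumerate(counts):
        if n > threshold:
            frame_flag = 1          # steps 55-56: on the frame line
        elif n < threshold and frame_flag == 1:
            return i                # steps 57-58: first line inside the frame
    return None                     # no frame line in this direction


def _edge(counts: Sequence[int], threshold: int, from_end: bool = False) -> int:
    """Apply find_edge from either end and return an index into `counts`."""
    seq = list(counts)[::-1] if from_end else list(counts)
    i = find_edge(seq, threshold)
    if i is None:
        raise ValueError("frame line not found")
    return len(seq) - 1 - i if from_end else i


def extract_article(image: Image, v_threshold: int, h_threshold: int) -> Image:
    """Steps 45-53: find the four end points and crop the article part."""
    row_counts = [sum(row) for row in image]           # horizontal scan lines
    col_counts = [sum(col) for col in zip(*image)]     # vertical scan lines
    top = _edge(row_counts, h_threshold)
    bottom = _edge(row_counts, h_threshold, from_end=True)
    left = _edge(col_counts, v_threshold)
    right = _edge(col_counts, v_threshold, from_end=True)
    return [row[left:right + 1] for row in image[top:bottom + 1]]
```

Scanning from the outside inward means the first dense run met is the outer frame, so the first sparse line after it is the inner edge of the article area.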

[0031] FIG. 5 shows the system configuration of the mount shape identifying device 4 in this embodiment. The mount shape identifying device 4 consists of: an image buffer 61 that stores the image data reduced to the article part only; an image dividing module 62 that divides the image data in the horizontal direction in order to estimate the width of the mount's columns (the horizontal bands into which a newspaper page is divided, whose width is measured vertically); a column width estimation module 63 that determines the column width by counting the black dots on each scan line of the divided images in the vertical direction; a column image dividing module 64 that cuts the image data into strips of the column width determined by module 63; a row width estimation module 65 that scans each column image horizontally to determine the row width; and an image data construction module 66 that combines the determined column width, row width, and image data to generate the data for the composite information storage device 7 shown in FIG. 1.

[0032] The mount shape identifying device 4 in this embodiment operates according to the procedure shown in the flowchart of FIG. 6. First, in step 71, the image dividing module 62 takes the data in the image buffer 61, which holds the image data of the article part only, divides the newspaper image vertically into tall strips of a predetermined width, and sets the black dot flag to −1. Each strip is sent to the column width estimation module 63, which in step 72 counts the black dots on each scan line and compares the count with a predetermined threshold. If the count is larger than the threshold and the black dot flag is −1, the current line number is stored in the black dot flag in step 73. If the count is smaller than the threshold and the black dot flag holds a line number, then in step 74 the difference between the current line number and the line number in the black dot flag is taken as a column width, and 1 is added to the slot for that width in the column width buffer of module 63. Then, in step 75, the black dot flag is reset to −1. When column width estimation has been completed for all blocks, step 76 takes the width of the slot with the largest value (the most frequent column width) and computes the weighted average of the widths lying within a predetermined column width error range of it; the result is the column width candidate value. The height of the layout image is then divided by this candidate value. In information structured by columns and rows, such as a newspaper image, the number of columns is always an integer, and because blank bands and ruled lines lie between the columns, the candidate value is smaller than the actual column width. The fractional part of the quotient is therefore truncated, and the integer part is taken as the number of columns. Next, the height of the layout image is divided by the number of columns and rounded to the nearest integer, which is taken as the column width. For example, for newspaper image data input at 400 dpi with a given column width error range of 10 dots, if the most frequent column width is 490 dots and the weighted average is 491 dots, the column width candidate value is 491 dots. If the height of the layout image is, say, 7763 dots, the number of columns is identified as 7763 / 491 ≈ 15.8, truncated to 15, and the identified column width is 7763 / 15 ≈ 517.5, given as 517 dots.
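
Steps 71 to 76 and the arithmetic that follows can be sketched in Python as below. The sketch assumes each strip has already been reduced to a list of per-scan-line black dot counts, and reads "weighted average" as a frequency-weighted mean of the widths within the error range of the most frequent width; the function names are hypothetical. Because it applies the stated round-to-nearest rule, the example height of 7763 dots yields 518 dots (7763 / 15 ≈ 517.5) rather than the 517 quoted in the example.

```python
from collections import Counter
from typing import Iterable, Iterator, Sequence, Tuple


def run_lengths(counts: Sequence[int], threshold: int) -> Iterator[int]:
    """Steps 72-75: lengths of runs of scan lines whose count exceeds the threshold."""
    flag = -1                                  # black dot flag; -1 = not in a run
    for i, n in enumerate(counts):
        if n > threshold and flag == -1:
            flag = i                           # step 73: remember where the run starts
        elif n < threshold and flag != -1:
            yield i - flag                     # step 74: one column width sample
            flag = -1                          # step 75


def weighted_mode_average(hist: Counter, error_range: int) -> float:
    """Step 76: frequency-weighted mean of widths within error_range of the mode."""
    mode, _ = hist.most_common(1)[0]
    near = {w: f for w, f in hist.items() if abs(w - mode) <= error_range}
    return sum(w * f for w, f in near.items()) / sum(near.values())


def identify_columns(strips: Iterable[Sequence[int]], threshold: int,
                     height: int, error_range: int) -> Tuple[float, int, int]:
    """Return (candidate width, number of columns, column width) in dots."""
    hist: Counter = Counter()
    for counts in strips:
        hist.update(run_lengths(counts, threshold))
    candidate = weighted_mode_average(hist, error_range)
    n_columns = int(height // candidate)          # truncate: candidate < true width
    column_width = int(height / n_columns + 0.5)  # round half up
    return candidate, n_columns, column_width
```

Truncating first and only then dividing the full height by an integer column count lets the known integrality of the layout absorb the systematic underestimate caused by the gaps between columns.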

[0033] Next, based on the column width calculated by the column width estimation module 63, the column image dividing module 64 divides the image data in the image buffer 61 into horizontally long images according to the identified column width in step 77. Then, in step 78, the row width estimation module 65 sets the black dot flag to 0 and the candidate line to 0. In step 79, it takes the scan lines one at a time in the vertical direction, counts the black dots on each, and compares the count with a predetermined threshold; if the count is larger than the threshold, 1 is stored in the black dot flag in step 80. If the count is smaller than the threshold and the black dot flag is 1, then in step 81 the difference between the current line number and the line number held in the candidate line is taken as a row width, and 1 is added to the slot for that width in the row width buffer of module 65. In step 82, the black dot flag is reset to 0 and the current line number is stored in the candidate line. When row width estimation has been completed for all blocks, step 83 takes the width of the slot with the largest value (the most frequent row width) and determines the row width as the frequency-weighted average of the row widths lying within a predetermined row width error range of it. For example, for newspaper image data input at 400 dpi with a given row width error range of 5 dots, if the most frequent row width is 62 dots and the weighted average is 62 dots, the identified row width is 62 dots.
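
Unlike the column step, steps 78 to 82 measure the distance from the end of one black run to the end of the next, so each sample is a row pitch (text plus gap). A minimal Python sketch under the same assumptions as for the column width (per-scan-line counts already available, frequency-weighted mean near the mode); the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Iterator, Sequence


def row_pitches(counts: Sequence[int], threshold: int) -> Iterator[int]:
    """Steps 78-82: distance from the end of one black run to the end of the next."""
    black_flag = 0
    candidate = 0                                   # step 78
    for i, n in enumerate(counts):                  # step 79
        if n > threshold:
            black_flag = 1                          # step 80
        elif n < threshold and black_flag == 1:
            yield i - candidate                     # step 81
            black_flag, candidate = 0, i            # step 82


def estimate_row_width(strips: Iterable[Sequence[int]], threshold: int,
                       error_range: int) -> float:
    """Step 83: frequency-weighted mean of the widths near the most frequent one."""
    hist: Counter = Counter()
    for counts in strips:
        hist.update(row_pitches(counts, threshold))
    mode, _ = hist.most_common(1)[0]
    near = {w: f for w, f in hist.items() if abs(w - mode) <= error_range}
    return sum(w * f for w, f in near.items()) / sum(near.values())
```

Because the candidate line starts at 0, the first sample in each strip includes the leading margin; with a strip of 50 black plus 12 white scan lines repeated five times, the samples are 50, 62, 62, 62, 62, and the error-range filter drops the 50 so the estimate is 62 dots.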

[0034] Finally, in step 84, the image data construction module 66 sends the determined column width and row width, together with the image data, as structured data to the stored data generation device 6.

[0035] As shown in FIG. 1, the text data structuring device 5 in this embodiment consists of a pattern matcher 8 that extracts information such as the date and page from information described in text by pattern matching, a structured slot 9 that temporarily stores structured information such as the date, and a pattern matching database 10 that stores the knowledge used by the pattern matcher 8. Text data such as that shown in FIGS. 7(a) and 7(b) is sent to the text data structuring device 5 from the text data input device 2, which serves as the text data input unit.

[0036] The text data structuring processing of the text data structuring device 5 in this embodiment follows the procedure in the flowchart of FIG. 8. In step 85, the pattern matcher 8, for example, takes the input character string information one line at a time, and in step 86 splits each line into a pattern part, such as "*date*", running up to the first blank, and the text part of attribute data that follows it. Next, in step 87, the pattern matching database 10 is searched with the pattern to obtain the rule for converting the attribute data. In step 88, following the retrieved rule, the string "920618", for example, is converted into the numeric values (92, 6, 18) and stored in the structured slot 9. When the data for one article has been structured, it is sent to the stored data generation device 6 in step 89.
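
Steps 85 to 88 amount to a dispatch table keyed by the pattern part of each line. The Python sketch below is illustrative only: a dict stands in for the pattern matching database 10 and another for the structured slot 9, only the "*date*" rule comes from the text (following the two-character rule of [0037], "920618" becomes (92, 6, 18)), and the "*page*" entry and all function names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def date_rule(text: str) -> Tuple[int, ...]:
    """Rule for "*date*": two-character fields, e.g. "920618" -> (92, 6, 18)."""
    return tuple(int(text[i:i + 2]) for i in range(0, len(text), 2))


# Stand-in for pattern matching database 10: pattern -> conversion rule.
# Only "*date*" comes from the text; "*page*" is a hypothetical extra entry.
RULES: Dict[str, Callable[[str], object]] = {
    "*date*": date_rule,
    "*page*": int,
}


def structure_article(lines: List[str]) -> Dict[str, object]:
    """Steps 85-88: fill a structured slot (here a dict) from one article's lines."""
    slot: Dict[str, object] = {}
    for line in lines:                        # step 85: one line at a time
        parts = line.split(maxsplit=1)        # step 86: pattern part | text part
        if len(parts) != 2:
            continue
        pattern, text = parts
        rule = RULES.get(pattern)             # step 87: look up the rule
        if rule is not None:
            slot[pattern] = rule(text.strip())  # step 88: convert and store
    return slot
```

`str.split()` with no separator treats any Unicode whitespace, including the full-width space common in Japanese text, as the "first blank" of step 86.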

[0037] As an example of a rule stored in the pattern matching database 10, for "*date*" the string "920618" is cut into two-character pieces. From the piece "92", the character "9" is taken and looked up in a table mapping the characters "0", "1", "2", ..., "8", "9" to numeric values, giving the value 9; this is multiplied by 10, and the value 2 obtained from the character "2" is added to the resulting 90 to give 92. The other data shown in FIG. 7 can be handled by easy analogy, so their description is omitted.
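
The table-lookup conversion of [0037] can be written without any built-in integer parsing, as a minimal Python sketch. Adding the full-width digits "０" to "９" to the table is an assumption for Japanese text data, not something the text states; the names are hypothetical.

```python
from typing import Dict, Tuple

# Character-to-value table as described in [0037]. Including the full-width
# digits alongside the ASCII ones is an assumption for Japanese text data.
DIGIT_TABLE: Dict[str, int] = {c: v for v, c in enumerate("0123456789")}
DIGIT_TABLE.update({c: v for v, c in enumerate("０１２３４５６７８９")})


def two_digit_value(piece: str) -> int:
    """Convert a two-character piece, e.g. "92" -> 9 * 10 + 2 = 92."""
    return DIGIT_TABLE[piece[0]] * 10 + DIGIT_TABLE[piece[1]]


def convert_date(text: str) -> Tuple[int, ...]:
    """Cut the string into two-character pieces and convert each one."""
    if len(text) % 2:
        raise ValueError("expected an even number of digits")
    return tuple(two_digit_value(text[i:i + 2]) for i in range(0, len(text), 2))
```

For example, `convert_date("920618")` gives `(92, 6, 18)`.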

[0038] The stored data generation device 6 uses the column and row positions contained in the text data structured by the text data structuring device 5, together with the column width and row width obtained by the mount shape identifying device 4, to convert the position of each article into coordinate points on the image data, and stores both the text data and the image data in the composite information storage device 7.
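
The text does not spell out the conversion formula used by device 6, so the following Python sketch is illustrative only and rests on explicit assumptions: columns are numbered from 1 at the top of the layout image, rows are numbered from 1 at its right edge (as for vertically set Japanese text), every column and row has the identified uniform width, and the top and right bounds of the layout image on the mount image are known from the end-point detection. The `Layout` type and `article_box` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Layout:
    top: int            # y of the layout image's top edge on the mount image (dots)
    right: int          # x of the layout image's right edge on the mount image (dots)
    column_width: int   # identified column width (dots)
    row_width: int      # identified row width (dots)


def article_box(layout: Layout, first_col: int, last_col: int,
                first_row: int, last_row: int) -> Tuple[int, int, int, int]:
    """Map a column/row range (1-based, inclusive) to a box (x0, y0, x1, y1)
    on the mount image, assuming columns run top to bottom and rows run
    right to left."""
    y0 = layout.top + (first_col - 1) * layout.column_width
    y1 = layout.top + last_col * layout.column_width
    x1 = layout.right - (first_row - 1) * layout.row_width
    x0 = layout.right - last_row * layout.row_width
    return x0, y0, x1, y1
```

With hypothetical values `Layout(top=200, right=5900, column_width=517, row_width=62)`, an article in columns 1 to 2 and rows 1 to 10 maps to the box (5280, 200, 5900, 1234).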

[0039] The image data input device 1 in this embodiment may be an image scanner, or one of the digital image transmission devices based on digital facsimile that newspaper companies use to distribute pages to their printing plants.

[0040] The image information in this embodiment consists of image data divided into page units, as shown in FIG. 7, and carries at least the newspaper name, date, and page.

[0041] The text data input device 2 in this embodiment is, for example, a retrieval system for an article text database, or a reader for newspaper article text data obtainable on a computer data exchange medium such as magnetic tape. Other possible text information input devices include CTS (Computer Typesetting), a computer-based typesetting system used at newspaper companies and elsewhere that internally holds the character string and position information of the body text, and DTP (Desktop Publishing System).

[0042] The article text data in this embodiment is separated into individual articles, as shown in FIG. 7(c), and contains at least the date, newspaper name, page number, the position of the article on the page expressed in columns and rows, and the body text of the article.

[0043] [Embodiment 2] FIG. 9 is a system configuration diagram of the mount shape identifying device in the second embodiment of the present invention. The article part of a newspaper page can be divided into two areas, the main article body and the advertisements below the articles, and the layout of the below-article advertisements almost always differs completely from the column-and-row structure of the mount. This embodiment therefore shows an example in which adding a means for separating the below-article advertisements to the mount shape identifying device raises the accuracy of identifying the shape of the layout image on the mount image. The configuration other than the mount shape identifying device is the same as in FIG. 1. In general, the main article body and the below-article advertisements are produced separately in advance and finally combined as image information, so a horizontal ruled line marking the boundary between them is always drawn across the full width of the page, that is, of the layout image (boundary line L in FIG. 7). Therefore, when the black dots on each horizontal scan line of the newspaper image are counted, apart from the outer frame, only the horizontal ruled line at the boundary between the main article body and the below-article advertisements has a black dot count equal to the width of the image in dots.

[0044] As shown in FIG. 9, a mount shape identifying device 4 capable of separating the below-article advertisements is obtained by adding a longest horizontal ruled line extraction module 91 and an image dividing module 92 to the mount shape identifying device 4 of the first embodiment.

[0045] The below-article advertisement separation in the mount shape identifying device 4 of this embodiment follows the procedure in the flowchart of FIG. 10. First, as shown in FIG. 10, for the image data from which the page numbers (nombres) have been removed, obtained from the image data preprocessing device 3 of FIG. 1, the longest horizontal ruled line extraction module 91 sets the white line flag and the black line flag to −1 in step 101. In step 102, the data is taken one scan line at a time from the bottom edge of the newspaper image up to the top edge, and in step 103 the number of data points judged to be black dots is counted. Next, in step 104, if the white line flag is −1 and the black dot count is 0, the line number is stored in the white line flag in step 105. In step 106, if the white line flag is not −1 and the black dot count equals the width of the image, the line number is stored in the black line flag in step 107. In step 108, if line numbers are stored in both the black line flag and the white line flag (that is, neither is −1) and the black dot count is 0, that line number is sent to the image dividing module 92. In step 109, if the black dot count is 1 or more but less than the image width, the process returns to step 101 and both the white line flag and the black line flag are reset to −1.
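
Steps 101 to 109 form a small state machine that looks, from the bottom up, for a blank band, then a full-width ruled line, then another blank band. A minimal Python sketch, assuming the per-scan-line black dot counts are already available; the function name is hypothetical.

```python
from typing import Optional, Sequence


def find_ad_boundary(row_counts: Sequence[int], width: int) -> Optional[int]:
    """Steps 101-109: scan black dot counts from the bottom scan line upward.

    Returns the index of the blank line just above a full-width ruled line that
    itself sits above a blank line (the line number step 108 sends on), or None.
    """
    white = black = -1                                   # step 101
    for line in range(len(row_counts) - 1, -1, -1):      # step 102: bottom to top
        n = row_counts[line]                             # step 103
        if white != -1 and black != -1 and n == 0:       # step 108
            return line
        if white == -1 and n == 0:                       # steps 104-105
            white = line
        elif white != -1 and n == width:                 # steps 106-107
            black = line
        elif 0 < n < width:                              # step 109: reset
            white = black = -1
    return None
```

A full-width line seen before any blank band (for example the bottom of the outer frame) leaves both flags unchanged, so it cannot be mistaken for the boundary.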

[0046] Next, the image dividing module 92 uses the line number of the article part obtained by the longest horizontal ruled line extraction module 91 to cut out the image of the article part only, and stores it in the image buffer 61. The subsequent processing is almost the same as in the first embodiment.

[0047] In another embodiment of the longest horizontal ruled line extraction module 91, when a newspaper printed on paper is read with an image scanner as the image data input device 1 of FIG. 1, the black dot count at the boundary between the main article body and the below-article advertisements may be 1 or more but less than the width, because the input image is tilted or noise occurs. Two thresholds, an advertisement separation black threshold and an advertisement separation white threshold, are therefore provided. A line whose black dot count is larger than the black threshold is treated as equal to the maximum horizontal line width in FIG. 10 (that is, as a full-width ruled line), and a line whose count is smaller than the white threshold is regarded as a blank band. As a result, the main article body can be separated from the below-article advertisements even for data containing errors due to noise or tilt.
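
The two-threshold variant changes only how each scan line is classified. A minimal Python sketch, assuming per-scan-line counts are available; the function name and the threshold values used in the note are hypothetical.

```python
from typing import Optional, Sequence


def find_ad_boundary_tolerant(row_counts: Sequence[int], black_threshold: int,
                              white_threshold: int) -> Optional[int]:
    """Variant of the FIG. 10 scan using two thresholds.

    A count above black_threshold is treated as a full-width ruled line and a
    count below white_threshold as a blank band; anything else resets the search.
    """
    white = black = -1
    for line in range(len(row_counts) - 1, -1, -1):    # bottom to top
        n = row_counts[line]
        is_blank = n < white_threshold
        is_rule = n > black_threshold
        if white != -1 and black != -1 and is_blank:
            return line                                 # blank line above the rule
        if white == -1 and is_blank:
            white = line
        elif white != -1 and is_rule:
            black = line
        elif not is_blank and not is_rule:
            white = black = -1                          # text or noise: start over
    return None
```

Setting `white_threshold = 1` and `black_threshold = width - 1` reproduces the exact tests of steps 104 to 109; with a hypothetical width of 100 dots, thresholds such as 5 and 90 accept a slightly broken rule of 93 dots and a blank band with a stray dot or two.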

[0048] [Embodiment 3] Next, a third embodiment of the present invention is described. Regarding the image correction module 17 of FIG. 2: a newspaper image input with an image scanner at a quality equivalent to digital facsimile is about 10,000 dots high, so if the image is rotated about its upper left corner as the origin, then even for a tilt of only about 0.1° the area near the origin hardly changes, but the area near the bottom edge moves 20 dots or more horizontally, and the image distortion becomes large.

[0049] Therefore, the image correction module of this embodiment places the center of rotation at the center of the newspaper image. For example, if the input image of a newspaper is 9200 dots high and 5820 dots wide, the rotation is calculated with the point at 4600 dots vertically and 2910 dots horizontally as the origin. If x is the horizontal coordinate and y the vertical coordinate of the image, and U and V are the calculated horizontal and vertical coordinates, the new coordinate point is computed as

$$U = (x - 2910)\cos(-0.1^\circ) - (y - 4600)\sin(-0.1^\circ) + 2910$$

$$V = (x - 2910)\sin(-0.1^\circ) + (y - 4600)\cos(-0.1^\circ) + 4600$$

As a result, the distortion is split between the two halves of the image and amounts to at most about 10 dots.
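
The U, V formula is a rotation about an arbitrary center, which the Python sketch below implements for single points; resampling a whole image is not shown. The function names are hypothetical, and the shift estimate ignores the very small dx·(cos θ − 1) term.

```python
import math
from typing import Tuple


def rotate_about(x: float, y: float, cx: float, cy: float,
                 degrees: float) -> Tuple[float, float]:
    """Rotate point (x, y) by `degrees` about (cx, cy), as in the U, V formula."""
    t = math.radians(degrees)
    dx, dy = x - cx, y - cy
    u = dx * math.cos(t) - dy * math.sin(t) + cx
    v = dx * math.sin(t) + dy * math.cos(t) + cy
    return u, v


def approx_max_horizontal_shift(height: float, cy: float, degrees: float) -> float:
    """Largest horizontal displacement over the image height, approximated as
    (farthest row distance from cy) * |sin(theta)|."""
    return max(cy, height - cy) * abs(math.sin(math.radians(degrees)))
```

For the 9200 × 5820-dot example, `rotate_about(2910, 0, 2910, 4600, -0.1)` moves the top-center point about 8 dots horizontally, and `approx_max_horizontal_shift(9200, 4600, -0.1)` is about 8.03 dots, in line with the "at most about 10 dots" stated above.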

[0050] Furthermore, regarding the image data preprocessing device of this embodiment: because a newspaper image includes below-article advertisements, when only the main article body is needed, detecting the main article body and rotating about its center makes the average distortion within the main article body smaller still. For example, the below-article advertisements on the front page of a certain newspaper occupy 20% of the page height. In this case, as shown in the system configuration diagram of the image data preprocessing device in FIG. 11, this can be achieved by incorporating the longest horizontal ruled line extraction module 91 into the image data preprocessing device 3. As shown in FIG. 11, one embodiment of the image data preprocessing device 3 consists of two sets of the tilt correction mechanism 11 and the nombre (page number) removal mechanism 12, together with the longest horizontal ruled line extraction module 91.

[0051] In the processing of the image data preprocessing device 3 in this embodiment, the longest horizontal ruled line extraction module 91 determines the region of the article part in the image information obtained by the tilt correction process shown in FIG. 3, and the image correction module 17 in one of the tilt correction mechanisms 11 rotates the image about the center coordinates of the main article body. As a result, the distortion of the main article body becomes smaller still, and points move at most about 8 dots horizontally.
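
Choosing the rotation center from the outer frame and the boundary line, as in claim 4, can be sketched in Python as follows. It assumes the main article body is bounded by the left, right, and top frame lines and by the boundary row found by module 91; the function names are hypothetical, and the shift estimate ignores the dx·(cos θ − 1) term.

```python
import math
from typing import Tuple


def body_center(frame_left: float, frame_right: float, frame_top: float,
                boundary_row: float) -> Tuple[float, float]:
    """Center of the main article body: between the left and right frame lines,
    and between the top frame line and the article/advertisement boundary."""
    return (frame_left + frame_right) / 2, (frame_top + boundary_row) / 2


def approx_body_shift(top: float, bottom: float, cy: float, degrees: float) -> float:
    """Approximate largest horizontal displacement of rows top..bottom when
    rotating about row cy."""
    return max(cy - top, bottom - cy) * abs(math.sin(math.radians(degrees)))
```

For instance, for a hypothetical 5820 × 9200-dot page whose boundary lies at row 7360 (advertisements filling the bottom 20%), the body center is (2910, 3680), and rotating about it gives a smaller worst-case shift within the body than rotating about the page center row 4600.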

[0052]

[Effects of the Invention] Conventionally, information conveyed through a laid-out image, such as a newspaper or magazine page, and information with the same content conveyed as character codes used by a computer, or as character strings printed from them on paper, have been handled separately. According to the present invention, by adding information on position within the image information to the information conveyed by characters, the two kinds of information can be handled as a single piece of information, for example on a computer.

[0053] In particular, the present invention provides means for determining the position of each element image within the layout image in the mount image, even when the coordinate points on the mount image vary with the image data input device used. Composite information can therefore be created from the information in the image and the information expressed as character strings merely by inputting position information according to the coordinates on the layout image, which greatly increases the efficiency of creating composite information.

[0054] Further, according to the invention of claim 3, for information divided into a main article body and advertisements, such as newspaper articles, means are provided for determining the region of the main body of the information, such as the main article body. This makes it possible to determine the positions of the element images in the image of the main body accurately and to reduce the distortion of the layout image and the element images.

[0055] Furthermore, according to the invention of claim 4, when an image input device that may input the image at a tilt, such as an image scanner, is used, composite information with little distortion can be obtained even if the image is input at a tilt.

[Brief description of drawings]

FIG. 1 is a system configuration diagram showing the composite information construction method according to the first embodiment of the present invention.

FIG. 2 is a system configuration diagram of the image data preprocessing device in the first embodiment.

FIG. 3 is a flowchart of the tilt correction process in the first embodiment.

FIGS. 4(a) and 4(b) are flowcharts of the processing of the nombre (page number) removal mechanism in the first embodiment.

FIG. 5 is a system configuration diagram of the mount shape identifying device in the first embodiment.

FIG. 6 is a flowchart of the processing of the mount shape identifying device in the first embodiment.

FIGS. 7(a), 7(b), and 7(c) are diagrams showing examples of data in the first embodiment.

FIG. 8 is a flowchart of the text data structuring processing in the first embodiment.

FIG. 9 is a system configuration diagram of the mount shape identifying device according to the second embodiment of the present invention.

FIG. 10 is a flowchart of the below-article advertisement separation processing in the mount shape identifying device in the second embodiment.

FIG. 11 is a system configuration diagram of the image data preprocessing device in the third embodiment of the present invention.

[Explanation of symbols]

1 ... Image data input device, 2 ... Text data input device, 3 ... Image data preprocessing device, 4 ... Mount shape identifying device, 5 ... Text data structuring device, 6 ... Stored data generation device, 7 ... Composite information storage device, 8 ... Pattern matcher, 9 ... Structured slot, 10 ... Pattern matching database, 11 ... Tilt correction mechanism, 12 ... Nombre (page number) removal mechanism

Claims

[Claims]

1. A composite information construction method, characterized by comprising: means for determining, when three types of information are input, namely (i) information expressed by at least one character string, (ii) image information of a mount image having at least one layout image, the layout image being image information composed of one or more element images that express by image the same content as the information expressed by the character string, and (iii) position information indicating, according to a predetermined rule, at which position on the layout image each piece of information expressed by a character string is located, the area of each layout image on the mount image; means for converting the input position information into position information according to the rule on the layout image; and means for converting the position information on the layout image into position information according to the rule on the mount image, using the information on the area where the layout image exists on the mount image; whereby the image information and the information expressed by the character string are structured so that they can be handled in association with each other.

2. The composite information construction method according to claim 1, wherein, when the input layout image composed of one or more element images is image information whose positions are regular by columns or rows, and the position information input together with the information expressed by the character string is expressed by the rule of columns or rows on the layout image, then, in order to correlate positions on the layout image with the position information input together with the information expressed by the character string, the means for determining the area of the layout image comprises, in particular, means for identifying the column height from the distribution of black dots on the layout image and means for identifying the row spacing from the distribution of black dots within the columns; and the position information expressed by the columns and rows of the information expressed as the character string is associated with positions on the layout image, whereby the information expressed by the character string and the element images are associated with each other according to the rules of position on the mount image.

3. The composite information construction method according to claim 1, wherein, when the layout image contains an area expressing the main body of information and an additional information area, the means for determining the area of the layout image comprises, in particular, means for determining the boundary line on the layout image information between the area expressing the main body of the information and the additional information area, and means for cutting out, from the mount image information, the image information of the area expressing the main body of the information using the position of the determined boundary line; and the structure of columns and rows is extracted from the image information of the main body of the information alone to determine the positions of the columns and rows in the entire layout image information.

4. The composite information construction method according to claim 1, 2, or 3, comprising: means for detecting the outer frame of the input layout image when the layout image is tilted; means for identifying, according to a predetermined standard stored in advance, the position of the boundary line in the layout image separating the main information from the additional information; and means for calculating the center point of the area of the image expressing the main information from the outer frame information and the position of the boundary line between the main information and the additional information; wherein the image information is input after the mount image is rotated with the center point of the image of the main information as the center of rotation.