JP2018132953A

JP2018132953A - Image processing method, and computer program

Info

Publication number: JP2018132953A
Application number: JP2017026282A
Authority: JP
Inventors: 栄竹内; Sakae Takeuchi; 克犬嶋; Masaru INUJIMA
Original assignee: SOFNEC CO Ltd
Current assignee: SOFNEC CO Ltd
Priority date: 2017-02-15
Filing date: 2017-02-15
Publication date: 2018-08-23
Anticipated expiration: 2037-02-15
Also published as: JP6294524B1; WO2018151043A1

Abstract

PROBLEM TO BE SOLVED: To provide an image processing method and a program capable of creating binary images suitable for extracting meaningful information such as characters or signs from images containing various colors, particularly photographed images.

SOLUTION: The method includes: step S2 of identifying, in the color image to be processed, a character area as distinct from the background area; step S3 of classifying each pixel in the background area into L (L >= 2) groups; step S4 of classifying each pixel in the character area into N + L groups, namely the L background groups plus N (N >= 2) groups specific to the character area; and step S5 of treating the background groups as a single group, dividing the resulting N + 1 groups into two sets, and creating a binary image in which the pixels of one set are shown in one solid color and the pixels of the other set in another solid color.

SELECTED DRAWING: Figure 2

Description

The present invention relates to an image processing method capable of creating binary images suitable for extracting meaningful information such as characters and signs from images containing various colors, particularly photographed images.

In moving images such as television video, characters are often overlaid on the picture, and a function for extracting only the characters is sometimes required. Most recent images are color images, but conventional character extraction for color images has borrowed the techniques of monochrome binary image processing. That is, the color image is first binarized by some method into a monochrome binary image, and character data is then extracted from that binary image. Because a color image contains many colors, however, this approach has the following problem: even though the color of a character differs from its background color in the color image, binarization may convert both the character and the background to black (or white), so that the character is lost. Patent Document 1 proposes an "image recognition method" that can accurately recognize the image of each color in a color document containing various colors.

Patent Document 1: Japanese Patent Laid-Open No. 2004-21765

According to the invention described in Patent Document 1, the color image is not converted into a binary image. Instead, the color image data is separated by color into a plurality of image data sets, and recognition is performed on each of them. Therefore, for example, a color document can represent different characters in different colors and thereby make use of the fact that it is in color. In addition, as long as the color of a character differs from its background color in the color document, the method prevents both from being converted to black and the character from being lost, prevents layout recognition from failing, and allows a smooth transition to character recognition.

However, although the images targeted there are in color, they are strictly documents, and photographed footage of landscapes, people, and the like is not considered. In photographed footage the pixel values vary continuously, so determining the number of colors before clustering, as in paragraphs [0031] to [0032] of Patent Document 1, is not meaningful.
An object of the present invention is to create binary images from which characters can be reliably extracted from natural images containing various colors, such as moving images broadcast on television.

The present invention is an image processing method for binarizing a color image, comprising: a step of identifying, in the target color image, a character area as distinct from the background area; a step of classifying each pixel in the background area into L (L >= 2) groups; a step of classifying each pixel in the character area into N + L groups, namely the L background groups plus N (N >= 2) groups specific to the character area; and a step of treating the background groups as a single group, dividing the resulting N + 1 groups into two sets, and creating a binary image in which the pixels of one set are displayed in one color and the pixels of the other set in another color.

According to the image processing method of the present invention, the background area and the character area are each grouped, and a plurality of binary images are then created. Even if no single binary image allows the character data to be extracted completely, combining the information obtained from the plurality of binary images allows character data to be extracted with high accuracy.

In the present invention, the input color image is preferably converted, pixel by pixel, into coordinates in the L*a*b* color space, and the processing from the character area identification step onward is performed on the converted image.
Because the pixels are converted into L*a*b* values, a color representation that reflects the characteristics of human vision better than RGB values do, color similarity can be evaluated in a way that feels natural to a human observer.

In the present invention, the background area is preferably grouped by the K-means method with L colors, and the character area is preferably grouped by the K-means method starting with M + L colors, where the step of deleting, from the M (M > N) groups specific to the character area, the group containing the fewest pixels is repeated until M reaches the final number of groups N. Characters are often drawn in colors that rarely appear in the background. Applying the K-means method separately to the background area and the character area therefore allows the character portions to be grouped appropriately. Furthermore, when the character area is grouped, any pixel in the character area whose color is close to a representative pixel value obtained by grouping the background area is treated as belonging to the background. As a result, pixels that lie within an area identified as a character area but actually belong to the background are correctly classified as background, which improves the accuracy of the binarization.

In the present invention, the initial M colors used for the grouping specific to the character area are preferably the eight colors R, G, B, C (cyan), M (magenta), Y (yellow), white, and black. (Here the "M" in the color list denotes magenta, not the group count M.) Characters are mostly drawn in pure colors such as black or blue, so it is desirable for the K-means processing of the character area to start from these colors.

Because a plurality of binary images are created from the input color image, character data can be extracted with high accuracy by combining them.

FIG. 1 is a functional block diagram showing the configuration of an image processing apparatus according to an embodiment of the present invention.
FIG. 2 is a flowchart outlining the processing according to the embodiment.
FIG. 3 is an explanatory diagram illustrating an original image and the image after character area identification according to the embodiment.
FIG. 4 is a flowchart explaining the background area grouping process according to the embodiment.
FIG. 5 is a flowchart explaining the character area grouping process according to the embodiment.
FIG. 6 is a diagram explaining the number of ways of dividing the groups into two sets in order to create binary images according to the embodiment.
FIG. 7 is a diagram illustrating binary images that are output results according to the embodiment.

An image processing apparatus according to an embodiment of the present invention will be described with reference to the drawings.
The description is organized under the following headings.
<< 1. Functional Block Configuration of the Image Processing Apparatus >>
<< 2. Preprocessing by the Image Processing Apparatus (Character Area Identification) >>
<< 3. Main Processing 1 by the Image Processing Apparatus (Background Area Grouping) >>
<< 4. Main Processing 2 by the Image Processing Apparatus (Character Area Grouping) >>
<< 5. Main Processing 3 by the Image Processing Apparatus (Binary Image Creation for the Processing Target Image) >>

<< 1. Functional Block Configuration of the Image Processing Apparatus >>
The functional configuration of the image processing apparatus 1 will be described with reference to FIG. 1.
The image processing apparatus 1 is realized by a computer such as a personal computer or a smartphone together with a computer program installed on that computer (corresponding to the computer program according to claims 5 to 8).
The image processing apparatus 1 includes a processing unit 2, a storage unit 3, and a communication interface unit 4. It is also provided, as appropriate, with an input operation unit such as a mouse and keyboard used by the operator, an output unit such as a display and printer, a camera, and the like, but these are not shown.

The storage unit 3 stores the input image (hereinafter, the "processing target image"), learning samples for character area identification, various parameters, various intermediate results produced by the processing unit 2, and so on. It is realized by a storage device such as a memory or a hard disk.
The parameters include the parameters of the convolutional neural network (hereinafter, "CNN") used to identify character areas, the number of groups used when grouping the background area and the character area, and the initial representative pixel value of each group.
The intermediate results include the identified character areas and the intermediate state of the K-means method, such as the group to which each pixel belongs.
The storage unit 3 also holds the programs that cause the computer to function as the image processing apparatus 1. These programs are loaded into memory, and each part of the processing unit 2 operates when a CPU (not shown) executes the loaded program code.
Next, the processing unit 2 will be described.

The processing unit 2 includes an image acquisition unit 21, a character area identification processing unit 22, a background area grouping processing unit 23, a character area grouping processing unit 24, and a binary image creation unit 25. Below, together with the description of units 21 to 25, an outline of the processing performed by the image processing apparatus 1 is given with reference to FIG. 2.

The image acquisition unit 21 acquires the processing target image from an external communication network or information processing apparatus via the communication interface unit 4, and converts the color information of each pixel of the image into coordinates in the L*a*b* color space (step S1 in FIG. 2). The processing from character area identification onward (steps S2 to S5 in FIG. 2) operates on the converted pixels. The conversion is performed because the L*a*b* color space provides a coordinate representation closer to human color perception than the RGB color space, so that colors can be separated in a way that follows human color perception almost exactly.
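The color conversion of step S1 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the source does not say which RGB encoding or reference white is used, so the sketch assumes 8-bit sRGB input and the D65 white point, and the function name `srgb_to_lab` is hypothetical.

```python
def srgb_to_lab(r, g, b):
    """Convert one 8-bit sRGB pixel to CIE L*a*b* (assumes sRGB and D65 white)."""
    def to_linear(c):
        # Undo the sRGB transfer curve
        c = c / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    rl, gl, bl = to_linear(r), to_linear(g), to_linear(b)

    # Linear sRGB -> CIE XYZ
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl

    # D65 reference white (assumption; not stated in the source)
    xn, yn, zn = 0.95047, 1.0, 1.08883

    def f(t):
        delta = 6.0 / 29.0
        if t > delta ** 3:
            return t ** (1.0 / 3.0)
        return t / (3 * delta ** 2) + 4.0 / 29.0

    fx, fy, fz = f(x / xn), f(y / yn), f(z / zn)
    return (116.0 * fy - 16.0, 500.0 * (fx - fy), 200.0 * (fy - fz))
```

Once every pixel is a three-component L*a*b* vector, the Euclidean distance between two pixels serves as the color similarity measure used by the later grouping steps.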

In this embodiment, the character area identification processing unit 22 identifies the character areas, as distinct from the background, in the processing target image by means of the machine learning function implemented in the character area existence determination unit 22b (step S2 in FIG. 2). One feature of the present invention is that the image is divided into a background area and character areas and that grouping by the K-means method is applied to each. To that end, the character area identification processing unit 22 determines whether each pixel of the processing target image belongs to the background area or to a character area.
The character area existence determination unit 22b is in fact a CNN whose parameters have been tuned in advance by the machine learning unit 22a. The machine learning unit 22a and the character area existence determination unit 22b are described later under << 2. Preprocessing by the Image Processing Apparatus (Character Area Identification) >>.

The background area grouping processing unit 23 applies the K-means method to classify each pixel of the area that the character area identification processing unit 22 has judged to be background into L (L >= 2) groups (step S3 in FIG. 2).

The character area grouping processing unit 24 applies the K-means method to group the pixels of each character area identified by the character area identification processing unit 22, one character area at a time. The initial number of groups is M + L: M groups specific to the character area plus the same L groups as the background area. Each time the pixel classification by the K-means method stabilizes and the grouping converges, one character-specific group is deleted, and this continues step by step until the target number N is reached. The pixels are thus finally grouped into N + L groups: N specific to the character area and L from the background area (step S4 in FIG. 2). Here N is an integer with N >= 2, and M is an integer with M > N.
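The prune-and-repeat loop of step S4 can be sketched as below. This is a simplified reading under stated assumptions: pixels are L*a*b* triples, the background representative values stay fixed while only the character-specific ones are updated, convergence means "no label changed", and the names `nearest`, `group_character_region`, and `max_iter` are hypothetical.

```python
import math

def nearest(pixel, centers):
    """Index of the center closest to pixel (Euclidean distance)."""
    return min(range(len(centers)), key=lambda k: math.dist(pixel, centers[k]))

def group_character_region(pixels, bg_centers, ch_centers, n_final, max_iter=100):
    """Return (labels, ch_centers); a label below len(bg_centers) means background."""
    ch = [tuple(c) for c in ch_centers]
    n_bg = len(bg_centers)
    while True:
        labels = None
        for _ in range(max_iter):
            centers = list(bg_centers) + ch
            new_labels = [nearest(p, centers) for p in pixels]
            if new_labels == labels:        # grouping has stabilized
                break
            labels = new_labels
            # Update only the character-specific centers; background ones stay fixed.
            for m in range(len(ch)):
                members = [p for p, lab in zip(pixels, labels) if lab == n_bg + m]
                if members:
                    ch[m] = tuple(sum(v) / len(members) for v in zip(*members))
        if len(ch) <= n_final:
            return labels, ch
        # Drop the character-specific group with the fewest pixels, then rerun.
        counts = [labels.count(n_bg + m) for m in range(len(ch))]
        del ch[counts.index(min(counts))]
```

Deleting one group per convergence, rather than all surplus groups at once, gives the remaining groups a chance to absorb the freed pixels before the next deletion is decided.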

The binary image creation unit 25 creates a plurality of binary images based on the result of the two-stage grouping of the background area and the character area. Since the purpose of the present invention is to turn the original image into binary images for character extraction, the only information needed about a pixel in the background area is the fact that it lies in the background area. The L groups classified as background are therefore treated as a single group without distinction and combined with the N groups classified as character-specific, so that each pixel of the processing target image falls into one of (N + 1) groups. The number of ways of coloring (N + 1) groups with two colors (usually white and black) is 2^(N+1). Because the two cases in which everything is white or everything is black are excluded, binary images are created for 2^(N+1) - 2 cases (step S5 in FIG. 2). This concludes the description of << 1. Functional Block Configuration of the Image Processing Apparatus >>. Next, the operation of the image processing apparatus 1 will be described.
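The count of 2^(N+1) - 2 binary images can be made concrete with a short sketch. The function names and the label convention (labels below `n_bg` are background groups; the rest are character-specific groups) are assumptions for illustration only.

```python
def binary_colorings(n_char_groups):
    """Yield every 0/1 assignment of the N+1 groups, skipping the two uniform ones."""
    total = n_char_groups + 1            # group 0 is the merged background
    for mask in range(1, 2 ** total - 1):
        yield [(mask >> g) & 1 for g in range(total)]

def render_binary(labels, n_bg, coloring):
    """Map per-pixel group labels to 0/1 according to one coloring."""
    # Labels below n_bg are background groups and are merged into group 0.
    return [coloring[0 if lab < n_bg else lab - n_bg + 1] for lab in labels]
```

For N = 2 this yields 6 colorings. Each coloring and its black/white inverse are both included, which matches the count given in the text.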

<< 2. Preprocessing by the Image Processing Apparatus (Character Area Identification) >>
On receiving the color image to be processed, the image acquisition unit 21 converts the color information of each pixel from the RGB color space or the like into the L*a*b* color space. That is, each pixel is represented by its lightness L* and the two chromatic coordinates a* and b* (the source text labels these "hue" and "saturation"; in the standard L*a*b* definition a* is the green-red axis and b* is the blue-yellow axis).
Character area identification is then performed on the processing target image after conversion into L*a*b* coordinates. This process is described in detail below.

In this embodiment, the character areas in which character data exists within a single image are identified and distinguished from the image area behind the characters (the "background area"). A CNN, which is a form of machine learning, is used to identify the character areas. Before describing the processing by the character area identification processing unit 22, therefore, the machine learning function of the image processing apparatus 1 is explained.

The image processing apparatus 1 includes a machine learning unit 22a. A large number of learning images are collected in advance, learning samples are extracted from them and fed to machine learning, and the results are verified to tune the machine learning parameters. Specifically, learning images are collected, positive samples are extracted from inside the character areas of these images, and negative samples are extracted from the remaining areas. For a positive sample that lies entirely inside a character area, the likelihood that its center is in a character area is set to 1.0. For a negative sample that does not overlap a character area at all, the likelihood that its center is in a character area is set to 0.0. This likelihood is the teacher data, and each learning sample is associated with its teacher data.
Each time a new learning sample is input, a likelihood is computed, and if it deviates from the teacher data the parameters are adjusted. For example, the center of a learning sample extracted from inside a character area is necessarily inside the character area, so its likelihood should be 1.0. If the output of the machine learning deviates from this true likelihood of 1.0, the parameters are adjusted, that is, the CNN is trained, to make the difference as small as possible until the desired accuracy is achieved.
The parameters tuned in this way are exported to the character area existence determination unit 22b.
This concludes the brief explanation of machine learning. The description now returns to the character area identification process.

The character area identification process of the character area identification processing unit 22 uses the trained CNN as follows.
The character area identification processing unit 22 scans the processing target image and extracts small areas of the same size as the learning samples (hereinafter, "unit areas"). For example, it scans from the upper left of the image toward the right edge in steps of a predetermined amount, moves down by a predetermined amount on reaching the right edge, and then scans toward the left edge. This is repeated over the entire processing target image.
Each extracted unit area is input to the character area existence determination unit 22b, and the character area identification processing unit 22 obtains as output the likelihood that the center of that unit area lies inside a character area.

Once the character area identification processing unit 22 has scanned the entire processing target image and obtained the likelihood for the center of each unit area, it decides whether that center is inside a character area according to whether the likelihood is at or above a preset threshold. This decision determines whether each pixel of the processing target image belongs to a character area or to the background area. For example, the processing target image shown in FIG. 3(a) is separated into character areas and a background area as in the binary image of FIG. 3(b). In FIG. 3(b), the background area is filled in black and the character areas are shown in white. In this example there are three character areas, chA, chB, and chC, and the subsequent character area grouping process handles these three areas separately. This is because characters clustered together tend to share similar colors, whereas characters located far apart tend to have different colors.
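The scan-and-threshold procedure above can be sketched as follows. Several simplifications are assumed: the trained CNN is replaced by a toy stand-in (`toy_likelihood`), every row is scanned left to right, only the center pixel of each unit area is marked, and the parameter names `unit`, `stride`, and `threshold` are hypothetical. How the embodiment assigns pixels that are not window centers is not stated in the source.

```python
def toy_likelihood(patch):
    """Stand-in for the trained CNN: mean intensity of the patch, in [0, 1]."""
    values = [v for row in patch for v in row]
    return sum(values) / len(values)

def character_mask(image, unit, stride, likelihood, threshold=0.5):
    """Return a 2-D list of booleans, True where a unit-area center is judged text."""
    height, width = len(image), len(image[0])
    mask = [[False] * width for _ in range(height)]
    half = unit // 2
    for top in range(0, height - unit + 1, stride):
        for left in range(0, width - unit + 1, stride):
            # Cut out one unit area and ask the classifier about its center.
            patch = [row[left:left + unit] for row in image[top:top + unit]]
            if likelihood(patch) >= threshold:
                mask[top + half][left + half] = True
    return mask
```

With a stride of 1 every interior pixel becomes the center of exactly one unit area, so the mask is decided pixel by pixel except for a border of width `unit // 2`.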

The machine learning unit 22a described above operates asynchronously with the color image binarization of this embodiment. The machine learning unit 22a may therefore be implemented on a computer separate from the image processing apparatus 1. The character area existence determination unit 22b, which is the product of the machine learning, may likewise be implemented on a separate computer and exchange data with the character area identification processing unit 22 over a communication line.

This concludes the description of << 2. Preprocessing by the Image Processing Apparatus (Character Area Identification) >>. Before the grouping of the background and character areas is described, the significance of character area identification in the present invention is explained.
The intended use of the present invention is to extract character data without omission in situations where characters are overlaid on part of a color photographed video image. For that purpose it is desirable to identify the overlaid area and the background area in advance as a mask, because this makes it possible first to group only the background portion and then to extract the characters as colors that stand out from the background. If the entire processing target image, with background and characters mixed, were simply grouped by the K-means method without identifying the character areas, the background mixed into the resulting binary images could not be removed sufficiently, and the images would not be usable for subsequent processing such as character recognition.

<< 3. Main Processing 1 by the Image Processing Apparatus (Background Area Grouping) >>
The background area grouping processing unit 23 classifies each pixel in the background area of the processing target image into a predetermined number L of groups according to the K-means method. The process is described below with reference to the flow of FIG. 4.

First, the number of groups L is set, and the K-means iteration counter Np is initialized to 1 (step S31). An initial representative pixel value bgCLR(l) (l = 1, 2, ..., L) is set for each of the L groups (step S32). According to the source, the representative pixel values eventually converge as the K-means process is repeated, so the initial values may be chosen arbitrarily.

Let P_ij (a three-dimensional real vector) be the pixel value of the pixel of interest (i, j), and compute the norm S(l) = |P_ij - bgCLR(l)| for l = 1, 2, ..., L (step S33). S(l) is the distance in L*a*b* coordinates between the pixel of interest and the provisional representative pixel value of each group. The group corresponding to the smallest of S(l) (l = 1, 2, ..., L) is taken as the group to which the pixel of interest (i, j) belongs. This computation is carried out for every pixel in the background area, so that every pixel is classified into one of the L groups.

Next, it is determined whether the grouping of step S33 has stabilized. If the counter Np is greater than 1, that is, if this is not the first K-means pass ("Np > 1" in step S34), each pixel is checked to see whether its group is the same as in the previous K-means pass. If the number of pixels whose group has changed exceeds a predetermined threshold (which may be 0) (No in step S35), the process moves to step S36. The process also moves to step S36 when this is the first K-means pass ("Np = 1" in step S34). In step S36, the mean pixel value of the pixels in each group is computed, the representative pixel value of each group is updated with that mean, and the counter Np is incremented by 1. After step S36 the process returns to step S33. In other words, steps S33 to S36 are repeated until the grouping of all pixels stabilizes.

When the K-means process has been executed at least twice ("Np > 1" in step S34) and has converged (Yes in step S35), the final representative pixel value bgCLR(l) of each group is fixed (step S37). These values are referenced in the subsequent character area grouping process.
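Steps S31 to S37 can be sketched in plain Python as follows. The sketch assumes pixels given as L*a*b* triples and a caller-supplied list of initial representative values. The names `kmeans_background`, `tol_changed`, and `max_iter` are hypothetical, and the iteration cap is an added safeguard that the source does not mention.

```python
import math

def kmeans_background(pixels, init_centers, tol_changed=0, max_iter=100):
    """Group background pixels; returns (representative values, per-pixel labels)."""
    centers = [tuple(c) for c in init_centers]        # bgCLR(l), step S32
    prev = None
    labels = []
    for _ in range(max_iter):
        # S33: assign each pixel to the group with the smallest distance S(l)
        labels = [min(range(len(centers)), key=lambda l: math.dist(p, centers[l]))
                  for p in pixels]
        # S34/S35: from the second pass on, stop once few enough labels changed
        if prev is not None:
            changed = sum(a != b for a, b in zip(labels, prev))
            if changed <= tol_changed:
                break
        # S36: replace each representative value by the mean of its members
        for l in range(len(centers)):
            members = [p for p, lab in zip(pixels, labels) if lab == l]
            if members:
                centers[l] = tuple(sum(v) / len(members) for v in zip(*members))
        prev = labels
    return centers, labels                             # S37: final bgCLR(l)
```

A group that ends up with no members keeps its previous representative value here. The source does not say how empty groups are handled, so this is one possible choice.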

<< 4. Main Processing 2 by the Image Processing Apparatus (Character Area Grouping) >>
The character area grouping processing unit 24 classifies each pixel in a character area of the processing target image into a predetermined number of groups according to the K-means method. The process is described below with reference to FIG. 5.

Each character area is ultimately divided into N groups specific to the character area (2 <= N < 8). However, background can show through around the characters even inside a character area, and the pixels belonging to that background should be separated from the pixels of the characters themselves. For this reason, the L groups obtained by the background area grouping described above are also used in the K-means processing.
The number of groups specific to the character area is ultimately N, but it is initially set to eight. That is, K-means processing starts with (8 + L) groups in total: 8 character-specific groups and L background groups (step S41 in FIG. 5). At this point, the K-means processing counter Np is initialized to 1.
Characters are often drawn in a distinct color or with an outline. The character-specific groups are therefore initialized with eight pure colors: R, G, and B (the three primary colors of light), C (cyan), M (magenta), Y (yellow), white, and black.

Let bgCLR(l) (l = 1, 2, ..., L) be the representative pixel values of the background groups and chCLR(m) (m = 1, 2, ..., 8) be the representative pixel values of the character-specific groups. Let Qij be the pixel value of the pixel of interest (i, j). Pixel values are three-dimensional real vectors, and the norms bgS(l) = |Qij - bgCLR(l)| (l = 1, 2, ..., L) and chS(m) = |Qij - chCLR(m)| (m = 1, 2, ..., 8) are calculated (step S42). bgS(l) is the distance, in L*a*b* color space coordinates, between the pixel of interest and the fixed representative pixel value of each background group; chS(m) is the distance between the pixel of interest and the provisional representative pixel value of each character-specific group.

Among the values {bgS(1)*β, ..., bgS(L)*β, chS(1), chS(2), ..., chS(8)} calculated for all groups, the group corresponding to the smallest value is taken as the group to which the pixel of interest (i, j) belongs. Here β (for example, β = 1.5), which multiplies the distances to the background groups, is a bias. The bias is preferable because a pixel inside a character area should be judged to be a character pixel wherever possible, even when its color is close to that of a background group. However, β = 1, that is, no bias, is also acceptable. This calculation is performed for every pixel in the character area, and each pixel is classified into one of the (8 + L) groups (step S42).
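The biased nearest-group rule of step S42 might be written as below. The function name and the return convention are illustrative assumptions; how ties are broken is not stated in the specification, and this sketch simply keeps the first minimum it finds.

```python
import math

def assign_group(q, bg_clr, ch_clr, beta=1.5):
    """Return ("bg", l) or ("ch", m) for the group nearest to pixel value q.

    Distances to background groups are multiplied by beta, so beta > 1
    makes a pixel less likely to be classified as background.
    Indices are 0-based here; the text counts groups from 1.
    """
    candidates = [(beta * math.dist(q, c), "bg", l) for l, c in enumerate(bg_clr)]
    candidates += [(math.dist(q, c), "ch", m) for m, c in enumerate(ch_clr)]
    # Assumption: on an exact tie the earliest candidate wins.
    _, kind, index = min(candidates, key=lambda t: t[0])
    return kind, index
```

For a pixel at distance 4 from a background group and 5 from a character group, β = 1 selects the background group, while β = 1.5 scales the background distance to 6 and selects the character group.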

Next, it is determined whether the grouping produced in step S42 has stabilized. If the processing counter Np is greater than 1, that is, if this is not the first pass of the K-means method ("Np > 1" in step S43), each pixel's group is compared with the group it was assigned in the previous pass. If the number of pixels whose group has changed exceeds a predetermined threshold (which may be 0) (No in step S44), the process proceeds to step S45. The process also proceeds to step S45 on the first pass ("Np = 1" in step S43). In step S45, the mean pixel value of the pixels in each group is calculated and the representative pixel value of each group is updated with that mean. The groups belonging to the background area, however, are not updated. After the representative pixel values are updated, 1 is added to the processing counter Np (step S45), and steps S42 to S45 are repeated with the same number of groups for as long as the grouping has not stabilized (No in step S44).
When the grouping has stabilized (Yes in step S44) and the number of character-specific groups still exceeds N (No in step S46), the character-specific group containing the fewest pixels is deleted (step S47). After the first deletion, the group count becomes (7 + L): 7 character-specific groups and L background groups. The background groups always remain at L and are never candidates for deletion. After one group is deleted, the processing counter Np is reinitialized to 1 and the process returns to step S42. The pixels that had been classified into the group deleted in step S47 are absorbed, when step S42 is executed again, into whichever remaining group has the closest representative pixel value. A group that has absorbed such pixels has its mean pixel value recalculated, with the absorbed pixels included, in the subsequent step S45.
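The flow of steps S42 to S47 can be sketched as a K-means loop nested inside a group-deletion loop. The function name, the `max_passes` limit, the label layout (background groups first), and the treatment of empty groups are assumptions made for this illustration only.

```python
import math

def group_character_area(pixels, bg_clr, ch_init, n_target, beta=1.5,
                         threshold=0, max_passes=100):
    """Return (labels, ch_clr). Labels below len(bg_clr) denote background groups."""
    bg = [tuple(c) for c in bg_clr]   # fixed: never updated, never deleted
    ch = [tuple(c) for c in ch_init]  # e.g. the eight pure colors
    n_bg = len(bg)
    while True:
        prev = None                   # corresponds to resetting Np to 1
        labels = []
        for _ in range(max_passes):   # hypothetical safety limit
            # Step S42: nearest group, background distances scaled by beta.
            labels = []
            for q in pixels:
                dists = [beta * math.dist(q, c) for c in bg]
                dists += [math.dist(q, c) for c in ch]
                labels.append(dists.index(min(dists)))
            # Steps S43/S44: stable once few enough labels have changed.
            if prev is not None:
                changed = sum(a != b for a, b in zip(labels, prev))
                if changed <= threshold:
                    break
            # Step S45: update the character-specific groups only.
            for m in range(len(ch)):
                members = [q for q, lab in zip(pixels, labels) if lab == n_bg + m]
                if members:  # assumption: an empty group keeps its old value
                    ch[m] = tuple(sum(v) / len(members) for v in zip(*members))
            prev = labels
        # Step S46: finished when only n_target character groups remain.
        if len(ch) <= n_target:
            return labels, ch
        # Step S47: delete the character-specific group with the fewest pixels.
        counts = [labels.count(n_bg + m) for m in range(len(ch))]
        del ch[counts.index(min(counts))]
```

Because the inner loop restarts from scratch after each deletion, pixels of the deleted group are reassigned to the nearest remaining group on the next pass, as the text describes.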

When the K-means classification has converged (Yes in step S44) and the number of character-specific groups has reached N (Yes in step S46), the pixels in the character area have been grouped into (N + L) groups in total: N character-specific groups and L background groups. The process shown in FIG. 5 is executed once for each character area identified by the character area specifying process. In the example of FIG. 3(b), there are three character areas, chA, chB, and chC, so the character area grouping process is executed three times.
In the present embodiment, K-means processing starts from eight character-specific groups in addition to the L background groups, and eight is larger than the target N. The initial number of character-specific groups is set to eight because the K-means method is sensitive to its initial values, so it is desirable to start from a value larger than the target and reduce it in stages. Furthermore, if K-means processing is run with too few groups, the group for a character color that should be kept may disappear. In view of these points, starting from eight groups is appropriate. However, if the number of groups were left at eight, the number of binary images output by the present embodiment would be 510, which would take too long in a practical implementation. The number of groups is therefore reduced in stages.

<< 5. Main processing 3 by the image processing apparatus (creation of binary images for the processing target figure) >>
The binary image creation unit 25 receives the result of classifying the processing target image into N + L groups and performs binarization, converting the pixels belonging to each group to white or black.

Once the grouping of the background and the character areas is complete, the specific representative pixel value of each group is no longer needed. For pixels in the background, only the fact that they were classified as background is meaningful. Accordingly, the meaningful information that remains for each pixel is whether it was classified into the single background group or into one of the N character-area groups.
When N = 2, the number of groups including the background is three, so, as shown in FIG. 6, there are eight ways, (1) to (8), of painting each group white or black. However, (1) and (8) paint all groups the same color and are meaningless, so only the six binary images (2) to (7) need to be created.
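The enumeration of white/black assignments can be illustrated with a short sketch. The function names, the flat-list image representation, and the convention that labels below `n_bg` are background groups are assumptions for this example; the order of the generated colorings is not claimed to match the numbering in FIG. 6.

```python
from itertools import product

def binary_colorings(n_char_groups):
    """All non-uniform 0/1 assignments for the merged background plus N character groups."""
    k = n_char_groups + 1            # all background groups count as one
    return [c for c in product((0, 1), repeat=k) if len(set(c)) == 2]

def render(labels, n_bg, coloring, white=255, black=0):
    """Paint a flat list of group labels using one coloring."""
    out = []
    for lab in labels:
        # Every background label maps to slot 0; character group m maps to slot m + 1.
        slot = 0 if lab < n_bg else lab - n_bg + 1
        out.append(white if coloring[slot] else black)
    return out
```

Calling `render` once per element of `binary_colorings(N)` would yield the full set of candidate binary images for one grouping result.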

FIG. 7 shows two of the binary images obtained from the processing target figure illustrated in FIG. 3(a).
Neither FIG. 7(a) nor FIG. 7(b) extracts all of the character data in the original image of FIG. 3(a). However, even if a single binary image is insufficient, combining a plurality of binary images reduces the amount of character data that is missed. Character data can be extracted with high accuracy even from images in which the characters are colored with a gradation or drawn with a striped pattern.
When N = 3 the number of binary images is 14, and when N = 4 it is 30, which improves the accuracy of character extraction. An appropriate value of N may be chosen in view of the number of colors in the processing target image, the required accuracy, and so on.
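The image counts quoted in the description are consistent with a simple counting argument, stated here as a reading of the text rather than a formula given in the specification: with the background merged into one group there are N + 1 groups, each painted white or black, and the two uniform paintings are discarded.

```latex
\[
  \#\text{images}(N) = 2^{N+1} - 2,
  \qquad
  N = 2 \mapsto 6,\quad
  N = 3 \mapsto 14,\quad
  N = 4 \mapsto 30,\quad
  N = 8 \mapsto 510 .
\]
```

The count roughly doubles with each additional character-specific group, which is why the embodiment reduces the eight initial groups to N before binarization.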

The output binary image data is, for example, sent to an external device that performs character recognition or displayed on a screen. How the resulting binary images are used is the subject of a separate invention.

An embodiment of the present invention has been described above, but the present invention can be modified in various ways within the gist disclosed in the claims.

For example, in the above embodiment the original data is converted into coordinates in the L*a*b* color space, but the color information of the original data may be used as it is. L*a*b* is merely preferable because it matches the characteristics of human vision. Furthermore, although eight pure colors are used as the initial values when grouping the character areas, the three RGB colors or the four CMYK colors may be used instead.

Furthermore, in the above embodiment, the image is passed to the character area specifying process immediately after the color representation is converted. However, noise removal by smoothing may be performed before the character area specifying process. If the binary images obtained at the final stage of the present invention contain a large amount of noise, the accuracy of subsequent processing based on them (for example, character recognition) decreases, so it is desirable to smooth the image with a bilateral filter or the like and output binary images with less noise.
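To illustrate why a bilateral filter suits this purpose, the sketch below smooths a small single-channel image while leaving strong edges almost untouched. It is a simplified illustration only: the specification does not give filter parameters, the embodiment operates on three-component color values rather than gray values, and the function name and default parameters are assumptions.

```python
import math

def bilateral_filter(img, radius=1, sigma_space=1.0, sigma_range=10.0):
    """Edge-preserving smoothing of a 2-D list of gray values."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = weight_sum = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        diff = img[yy][xx] - img[y][x]
                        # Weight falls off with spatial distance and with value difference.
                        wgt = math.exp(-(dx * dx + dy * dy) / (2 * sigma_space ** 2)
                                       - diff * diff / (2 * sigma_range ** 2))
                        total += wgt * img[yy][xx]
                        weight_sum += wgt
            out[y][x] = total / weight_sum
    return out
```

Neighbors whose values differ strongly from the center pixel receive negligible weight, so character outlines are preserved while small fluctuations are averaged away. In practice a library routine such as OpenCV's `cv2.bilateralFilter` would typically be used instead of a hand-written loop.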

In the above embodiment the purpose of binarization is to extract character data, but the present invention may also be used to extract pictograms, traffic signs, and the like. Like characters, these convey information or call attention visually.
The present invention is well suited to extracting character data overlaid on television video, but it can also be applied to images such as color printed matter read with a scanner.

Wide demand is expected for the present invention as an image processing technique for color images, and in particular as a technique for extracting characters from television captions (telops) and the like.

1: Image processing apparatus
2: Processing unit
21: Image acquisition unit
22: Character area specifying processing unit
23: Background area grouping processing unit
24: Character area grouping processing unit
25: Binary image creation unit
3: Storage unit
4: Communication interface unit

The present invention is an image processing method for binarizing a color image, comprising: a step of identifying, in a target color image, an area in which characters are displayed superimposed on a background (hereinafter, "character area"); a step of classifying each pixel included in the area of the color image excluding the character area (hereinafter, "background area") into L (L >= 2) groups; a step of classifying each pixel included in the character area into N + L groups, consisting of the L groups of the background area and N (N >= 2) groups specific to the character area; and a step of treating the background area groups as a single group, dividing the resulting N + 1 groups into two sets, and creating a binary image in which the pixels in one set are displayed in one color and the pixels in the other set are displayed in another color.

In the present invention, it is desirable to identify the character area by means of machine learning.
As in the embodiment, the input color image may be converted pixel by pixel into coordinates in the color space of the L*a*b* color system, and the processing from the character area identifying step onward may be performed on the converted image.
Because the image is converted to L*a*b* values, a color representation that reflects the characteristics of human vision better than RGB values do, color similarity can be evaluated in a way that feels natural to a human observer.
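A per-pixel conversion to L*a*b* might look like the following. The specification does not say which RGB encoding or reference white is used; this sketch assumes 8-bit sRGB input and the D65 white point, and the function name is illustrative.

```python
def srgb_to_lab(r, g, b):
    """Convert one 8-bit sRGB pixel to CIE L*a*b* (assumes D65 reference white)."""
    def to_linear(c):
        # Undo the sRGB transfer curve.
        c = c / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    rl, gl, bl = to_linear(r), to_linear(g), to_linear(b)
    # Linear sRGB -> CIE XYZ.
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl

    def f(t):
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    xn, yn, zn = 0.95047, 1.0, 1.08883   # D65 white point
    fx, fy, fz = f(x / xn), f(y / yn), f(z / zn)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)
```

Euclidean distance between two such triples is then the norm used when comparing a pixel with a group's representative value.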

In the present invention, when a plurality of character areas are identified, it is desirable to perform the character area grouping separately for each identified character area, because the colors used for characters often differ from one character area to another.
As in the embodiment, the initial M colors used for the character-area-specific grouping are preferably the eight colors R, G, B, C (cyan), M (magenta), Y (yellow), white, and black. Characters are often drawn in pure colors such as black or blue, so it is desirable to start the K-means processing for the character area from these colors.

Claims

1. An image processing method for binarizing a color image, comprising: a step of identifying, in a target color image, a character area excluding a background area; a step of classifying each pixel included in the background area into L (L >= 2) groups; a step of classifying each pixel included in the character area into N + L groups, consisting of the L groups of the background area and N (N >= 2) groups specific to the character area; and a step of treating the background area groups as a single group, dividing the resulting N + 1 groups into two sets, and creating a binary image in which the pixels in one set are displayed in one color and the pixels in the other set are displayed in another color.

2. The image processing method according to claim 1, wherein the input color image is converted pixel by pixel into coordinates in the color space of the L*a*b* color system.

3. The image processing method, wherein the background area is grouped by the K-means method with L colors, the character area is grouped by the K-means method starting with M + L colors, and the process of deleting, from the M (M > N) groups specific to the character area, the group to which the fewest pixels belong is repeated until M reaches the final number N.

4. The image processing method according to claim 3, wherein the initial M colors used for the grouping specific to the character area are the eight colors R, G, B, C (cyan), M (magenta), Y (yellow), white, and black.

5. A computer program for causing a computer to execute, in order to binarize a color image: a step of identifying, in a target color image, a character area excluding a background area; a step of classifying each pixel included in the background area into L (L >= 2) groups; a step of classifying each pixel included in the character area into N + L groups, consisting of the L groups of the background area and N (N >= 2) groups specific to the character area; and a step of treating the background area groups as a single group, dividing the resulting N + 1 groups into two sets, and creating a binary image in which the pixels in one set are displayed in one color and the pixels in the other set are displayed in another color.

6. The computer program according to claim 5, wherein the input color image is converted pixel by pixel into coordinates in the color space of the L*a*b* color system.

7. The computer program, wherein the background area is grouped by the K-means method with L colors, the character area is grouped by the K-means method starting with M + L colors, and the process of deleting, from the M (M > N) groups specific to the character area, the group to which the fewest pixels belong is repeated until M reaches the final number N.

8. The computer program according to claim 7, wherein the initial M colors used for the grouping specific to the character area are the eight colors R, G, B, C (cyan), M (magenta), Y (yellow), white, and black.