JP2017130709A

JP2017130709A - Image processing apparatus, image processing method and image processing program

Info

Publication number: JP2017130709A
Application number: JP2016006928A
Authority: JP
Inventors: 直史和田; Tadashi Wada
Original assignee: Samsung Electronics Co Ltd
Current assignee: Samsung Electronics Co Ltd
Priority date: 2016-01-18
Filing date: 2016-01-18
Publication date: 2017-07-27

Abstract

PROBLEM TO BE SOLVED: To provide an image processing apparatus that detects where mosquito noise occurs and how strong it is, using only the decoded image and no coding information, through simple processing with small memory usage, and that effectively removes the noise based on the detection result.
SOLUTION: An image processing apparatus 1 comprises: a local feature amount generation part 10 that generates multiple local feature amounts, including at least a relative flat pixel count feature amount Frc indicating how many peripheral pixels in a local region of the decoded image are less flat than the center pixel of that region; an evaluation value generation part 20 that weights and merges the multiple local feature amounts to calculate a noise evaluation value E for each pixel of the decoded image; a mixture ratio generation part 30 that generates a mixture ratio R based on the noise evaluation value E; a filter processing part 40 that generates a smoothed image of the decoded image; and an image mixture processing part 50 that mixes the decoded image and the smoothed image according to the mixture ratio R to generate an output image.
SELECTED DRAWING: Figure 3

Description

The present invention relates to an image processing apparatus, an image processing method, and an image processing program.

High-resolution video such as full HD (1920 × 1080 pixels) and 4K (3840 × 2160 pixels) carries an enormous amount of data. Broadcasting, Internet distribution, and storage on recording media therefore depend on compression encoding that reduces the data amount while preserving image quality as much as possible.

In lossy compression encoding, image quality degradation such as noise increases as the compression ratio rises, that is, as the data amount is reduced more drastically. For this reason, display devices such as televisions and smartphones generally improve image quality by applying correction processing that removes noise from the received video.

In JPEG and MPEG-2 in particular, a high-frequency noise called mosquito noise appears around edges and looks like mosquitoes flying around them. Techniques for removing mosquito noise by image processing have therefore been studied for a long time.

Because mosquito noise is a high-frequency component, it is generally removed by smoothing the image with a low-pass filter such as a bilateral filter or an epsilon filter. However, smoothing the entire image uniformly has a side effect: it removes not only the noise but also important image structures such as texture. Many methods have therefore been proposed that first detect the regions where mosquito noise occurs and then smooth only those regions.

Patent Document 1, for example, describes the following method:
- Edge pixels are identified by edge detection, and an edge map is generated around them.
- Flat pixels are identified by flat detection, and a flat map is generated around them.
- Mosquito noise is detected using, as the mosquito noise level, the product of the edge map, the flat map, and the second-order derivative at the pixel of interest.
- An edge-preserving smoothing filter is applied to the detected region.

Other techniques identify the location and intensity of mosquito noise using coding information (picture type, transform coefficients, quantization parameters, motion vectors, etc.) obtained when the receiver decodes the encoded video data.

Patent Document 1: JP 2012-502540 A (Published Japanese Translation of PCT International Application)

However, the technique of Patent Document 1 does not account for the fact that noise visibility depends on the strength and complexity of the edges. Moreover, because it requires a flat map, it cannot remove noise in regions where no flat area can be detected, such as the spaces between characters with intricate edges.

Techniques that identify the location and intensity of mosquito noise from coding information have a further problem when the receiver and the display device are separate devices. For example, when an STB receives and decodes a video signal and transfers it to a television over an HDMI (registered trademark) cable for display, the coding information is not available on the display side.

There is therefore a need for a technique that detects the position and intensity of mosquito noise using only the decoded image, without coding information, and effectively removes the noise based on the detection result.

Furthermore, when such noise removal is implemented in hardware such as an image processor, more computation and memory usage enlarge the circuit and raise product cost. A method that detects and removes noise with processing that is as simple as possible and uses little memory is therefore desired.

The present invention has been made to solve these problems. Its object is to provide an image processing apparatus, an image processing method, and an image processing program that meet the following requirements:
- They use simple processing with small memory usage.
- They detect the position and intensity of mosquito noise using only the decoded image, without coding information.
- They effectively remove the noise based on the detection result.

An image processing apparatus according to the present invention includes:
- a local feature amount generation unit that generates a plurality of local feature amounts, including at least a relative flat pixel number feature amount indicating the number of peripheral pixels in a local region of a decoded image that are less flat than the central pixel of the local region;
- an evaluation value generation unit that calculates a noise evaluation value for each pixel of the decoded image by weighting and integrating the plurality of local feature amounts;
- a mixing rate generation unit that generates a mixing rate based on the noise evaluation value;
- a filter processing unit that generates a smoothed image of the decoded image; and
- an image mixing processing unit that mixes the decoded image and the smoothed image according to the mixing rate to generate an output image.

An image processing method according to the present invention includes the following steps:
- generating a plurality of local feature amounts, including at least a relative flat pixel number feature amount indicating the number of peripheral pixels in a local region of a decoded image that are less flat than the central pixel of the local region;
- calculating a noise evaluation value for each pixel of the decoded image by weighting and integrating the plurality of local feature amounts;
- generating a mixing rate based on the noise evaluation value;
- generating a smoothed image of the decoded image; and
- mixing the decoded image and the smoothed image according to the mixing rate to generate an output image.

An image processing program according to the present invention causes a computer to execute the following procedures:
- generating a plurality of local feature amounts, including at least a relative flat pixel number feature amount indicating the number of peripheral pixels in a local region of a decoded image that are less flat than the central pixel of the local region;
- calculating a noise evaluation value for each pixel of the decoded image by weighting and integrating the plurality of local feature amounts;
- generating a mixing rate based on the noise evaluation value;
- generating a smoothed image of the decoded image; and
- mixing the decoded image and the smoothed image according to the mixing rate to generate an output image.

The present invention thus provides an image processing apparatus, an image processing method, and an image processing program that use simple processing with small memory usage. They detect the position and intensity of mosquito noise using only the decoded image, without coding information, and effectively remove the noise based on the detection result.

FIG. 1 is a diagram for explaining an overview of image processing according to the first embodiment.
FIG. 2 shows an example of a decoded image and an output image of the image processing according to the first embodiment.
FIG. 3 is a diagram showing a schematic configuration of the image processing apparatus 1 according to the first embodiment.
FIG. 4 is a diagram showing the configuration of the local feature value generation unit 10 according to the first embodiment.
FIG. 5 is a diagram showing the configuration of the first local feature value generation unit 11 according to the first embodiment.
FIG. 6 is a diagram for explaining the direction index k according to the first embodiment.
FIG. 7 shows images of the first local feature values according to the first embodiment.
FIG. 8 shows an example of the mask value Bmask according to the first embodiment.
FIG. 9 is a diagram showing the configuration of the second local feature value generation unit 12 according to the first embodiment.
FIG. 10 shows images of the second local feature values according to the first embodiment.
FIG. 11 shows an example of a weight function according to the first embodiment.
FIG. 12 shows an image of the noise evaluation value E according to the first embodiment.
FIG. 13 shows an example of the weight function for the mixing rate R according to the first embodiment.
FIG. 14 shows an image of the mixing rate R according to the first embodiment.
FIG. 15 shows an example of filtering with the bilateral filter according to the first embodiment.
FIG. 16 shows an example of an output image of the image processing apparatus 1 according to the first embodiment.
FIG. 17 is a diagram showing a schematic configuration of the image processing apparatus 201 according to the second embodiment.
FIG. 18 is a diagram showing the configuration of the global feature value generation unit 260 according to the second embodiment.
FIG. 19 shows an example of the weight function wtx according to the second embodiment.
FIG. 20 is a diagram showing a schematic configuration of part of the image processing apparatus 301 according to the third embodiment.
FIG. 21 shows an example of the weight function wns according to the third embodiment.
FIG. 22 is a diagram showing a schematic configuration of part of the image processing apparatus 401 according to the fourth embodiment.
FIG. 23 is a flowchart showing the procedure for calculating the still edge pixel value Bstill according to the fourth embodiment.
FIG. 24 shows an example of subtitle appearance positions according to the fourth embodiment.
FIG. 25 shows an example of the weight function wst for the subtitle feature value Fst according to the fourth embodiment.

Hereinafter, an image processing apparatus, an image processing method, and an image processing program according to each embodiment will be described with reference to the drawings.
The term "image" in this specification includes both "still images" and "moving images (video)".

(Embodiment 1)
First, an outline of the image processing for detecting and removing mosquito noise according to the first embodiment will be described.

FIG. 1 is a diagram for explaining an overview of the image processing according to the first embodiment.
The input image is uncompressed digital image data acquired by an imaging device such as a digital camera or digital video camera. To transmit or store the input image, the transmitting side compresses and encodes it to fit within a desired data amount and then transmits or stores it.

The receiving side receives the compressed and encoded image data and decodes it to generate a decoded image. A digital television normally includes a decoder and can perform all processing from reception and decoding to display within the device. In other cases, only the video signal is acquired from a set-top box (STB), a hard disk recorder (HDD), or a similar device via an HDMI cable or the like, and is then displayed.

In the image processing according to the first embodiment, all processing from acquiring the decoded image to displaying the output image is assumed to take place inside the display device. An image processing apparatus inside the display device detects and removes mosquito noise from the acquired decoded image and displays the corrected image as the output image.

In the image processing according to the first embodiment, the decoded image is represented in the YCbCr color system. This system consists of three components: a luminance component (Y) and two color-difference components (Cb, Cr). Each pixel of each component is expressed with 8 bits (an integer from 0 to 255).

FIG. 2 shows an example of a decoded image and an output image of the image processing according to the first embodiment.
In the decoded image, compression encoding has produced mosquito noise around edges (for example, around on-screen text such as program captions). This noise appears as hazy artifacts that look like mosquitoes flying around the edges. In the output image, this mosquito noise has been removed, giving a clear image.

FIG. 3 is a diagram showing a schematic configuration of the image processing apparatus 1 according to the first embodiment.
The image processing apparatus 1 receives a decoded image, corrects it, and outputs an output image. For this purpose, the image processing apparatus 1 includes a local feature value generation unit 10, an evaluation value generation unit 20, a mixing rate generation unit 30, a filter processing unit 40, an image mixing processing unit 50, and the like.

The local feature value generation unit 10 receives the decoded image and uses only pixels within a predetermined range (for example, a local region of 15 × 15 pixels). From these pixels it generates a plurality of local feature values, which are local image feature values for identifying how likely mosquito noise is to occur and how visible it is. It outputs these values to the evaluation value generation unit 20.

The evaluation value generation unit 20 acquires the plurality of local feature values and weights and integrates them. From this it calculates, for each pixel, a noise evaluation value representing both the likelihood and the visibility of mosquito noise, and outputs this value to the mixing rate generation unit 30.
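The weighting and integration step can be pictured with the following minimal Python sketch. It makes two assumptions purely for illustration:
- Each local feature value is mapped to [0, 1] by a piecewise-linear weight function, in the spirit of the weight function example of FIG. 11.
- The mapped values are combined by a normalized weighted sum.

The functions `ramp` and `noise_evaluation`, the feature keys, and all threshold and weight values are hypothetical. The actual weight functions and integration rule of the embodiment are not reproduced in this section.

```python
def ramp(value, lo, hi):
    """Piecewise-linear weight in [0, 1].

    Rises from 0 at lo to 1 at hi. If lo > hi the ramp is reversed,
    so larger feature values give smaller weights.
    """
    if lo == hi:
        raise ValueError("lo and hi must differ")
    if lo > hi:
        return 1.0 - ramp(value, hi, lo)
    if value <= lo:
        return 0.0
    if value >= hi:
        return 1.0
    return (value - lo) / (hi - lo)


def noise_evaluation(features, params):
    """Hypothetical noise evaluation value E for one pixel.

    features: feature name -> feature value at this pixel.
    params:   feature name -> (lo, hi, weight) for the ramp and its weight.
    Returns a normalized weighted sum in [0, 1].
    """
    total_weight = sum(weight for _, _, weight in params.values())
    score = 0.0
    for name, (lo, hi, weight) in params.items():
        score += weight * ramp(features[name], lo, hi)
    return score / total_weight


# Hypothetical parameters: a strong nearby edge (Fes) and many
# less-flat neighbours (Frc) both raise E.
params = {"Fes": (10.0, 40.0, 1.0), "Frc": (20.0, 120.0, 2.0)}
print(noise_evaluation({"Fes": 40.0, "Frc": 70.0}, params))
```

A weighted sum keeps E bounded and cheap to compute. A product of weight functions would be an equally plausible reading of "weighting and integrating".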

The mixing rate generation unit 30 acquires the noise evaluation value and generates a mixing rate for mixing the smoothed image (described later) with the decoded image. It outputs the mixing rate to the image mixing processing unit 50.
The filter processing unit 40 receives the decoded image and smooths it with an edge-preserving smoothing filter (for example, a bilateral filter or an epsilon filter). It outputs the smoothed image to the image mixing processing unit 50.
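The following Python sketch shows one concrete instance of edge-preserving smoothing: a basic epsilon filter applied to a luminance image stored as a list of rows. The window radius, the value of `eps`, and the replicate-border handling are illustrative assumptions. The embodiment's own example uses a bilateral filter (FIG. 15), so this is a stand-in rather than the patent's filter.

```python
def epsilon_filter(img, radius=1, eps=20):
    """Epsilon filter: average only the neighbour differences within +/- eps.

    Neighbours differing from the centre by more than eps (e.g. across a
    strong edge) contribute nothing, so edges survive while small
    fluctuations such as mosquito noise are averaged out.
    """
    h, w = len(img), len(img[0])
    n = (2 * radius + 1) ** 2          # window size, centre included
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            c = img[y][x]
            acc = 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    # Replicate border pixels outside the image.
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    d = img[yy][xx] - c
                    if abs(d) <= eps:
                        acc += d
            out[y][x] = c + acc / n
    return out


noisy = [[100, 100, 100], [100, 110, 100], [100, 100, 100]]
print(round(epsilon_filter(noisy)[1][1], 2))   # 101.11
```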

The image mixing processing unit 50 receives the decoded image, the mixing rate, and the smoothed image. It mixes the decoded image and the smoothed image according to the mixing rate, and generates and outputs an output image from which mosquito noise has been removed.
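The mixing rate generation and image mixing steps can be sketched as follows. The sketch rests on these assumptions:
- The mixing rate R is obtained from E by a clipped linear ramp (compare the weight function of FIG. 13).
- R = 1 means "use the smoothed image only", so the output is R × smoothed + (1 − R) × decoded.
- The output is rounded and clipped to the 8-bit range.

The ramp end points `e_lo` and `e_hi` are hypothetical.

```python
def mixing_ratio(e, e_lo=0.25, e_hi=0.75):
    """Hypothetical map from noise evaluation value E to mixing rate R in [0, 1]."""
    if e <= e_lo:
        return 0.0
    if e >= e_hi:
        return 1.0
    return (e - e_lo) / (e_hi - e_lo)


def blend(decoded, smoothed, ratio):
    """Per-pixel blend of decoded and smoothed images, clipped to 0..255."""
    h, w = len(decoded), len(decoded[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            r = ratio[y][x]
            v = r * smoothed[y][x] + (1.0 - r) * decoded[y][x]
            row.append(min(255, max(0, round(v))))
        out.append(row)
    return out


decoded = [[100, 100]]
smoothed = [[120, 120]]
ratio = [[mixing_ratio(0.1), mixing_ratio(0.5)]]   # [[0.0, 0.5]]
print(blend(decoded, smoothed, ratio))             # [[100, 110]]
```

Pixels with low E keep their decoded values. Smoothing is applied only where noise is judged likely and visible.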

Each component of the image processing apparatus 1 can be realized, for example, by executing a program under the control of an arithmetic unit (not shown) included in the image processing apparatus 1, which is a computer.

More specifically, the image processing apparatus 1 loads a program stored in a storage unit (not shown) into a main storage device (not shown) and executes it under the control of the arithmetic unit. The components need not be implemented in software; they may be realized by any combination of hardware, firmware, and software.

The program described above can be stored on various types of non-transitory computer readable media and supplied to the image processing apparatus 1. Non-transitory computer readable media include various types of tangible storage media.

Examples of non-transitory computer readable media include:
- magnetic recording media (for example, flexible disks, magnetic tapes, and hard disk drives);
- magneto-optical recording media (for example, magneto-optical disks);
- CD-ROM (Read Only Memory), CD-R, and CD-R/W; and
- semiconductor memory (for example, mask ROM, PROM (Programmable ROM), EPROM (Erasable PROM), flash ROM, and RAM (Random Access Memory)).

The program may also be supplied to the image processing apparatus 1 on various types of transitory computer readable media, such as electrical signals, optical signals, and electromagnetic waves. A transitory computer readable medium can supply the program to the image processing apparatus 1 via a wired communication path, such as an electric wire or optical fiber, or via a wireless communication path.

Next, each component of the image processing apparatus 1 according to the first embodiment will be described in detail.
FIG. 4 is a diagram showing the configuration of the local feature value generation unit 10 according to the first embodiment.
The local feature value generation unit 10 receives the decoded image and outputs a plurality of local feature values. For this purpose, it includes a first local feature value generation unit 11, a second local feature value generation unit 12, and the like.

FIG. 5 is a diagram showing the configuration of the first local feature value generation unit 11 according to the first embodiment.
The first local feature value generation unit 11 acquires the luminance value Y of the decoded image (only the luminance component is used in the first embodiment). It outputs the following first local feature values:
- the mask value Bmask;
- the standard deviation value Vstdv;
- the edge pixel value Bedge;
- the flat pixel value Bflat;
- the adjacent pixel difference absolute value sum Vsad1;
- the minute variation pixel value Bsmall; and
- the adjacent pixel second-order difference absolute value sum Vsad2.

The variance of the pixel values, the standard deviation value Vstdv, the adjacent pixel difference absolute value sum Vsad1, or the adjacent pixel second-order difference absolute value sum Vsad2 may also be referred to here as a basic feature value.

Here, V denotes a multivalued quantity and B a binary quantity (0 or 1). Each value is output for a local region of predetermined size centered on the pixel of interest (for example, 15 × 15 pixels), as a vector holding one value per pixel in that region. In a software implementation, all features other than the mask value Bmask may be precomputed for every pixel and stored. Only the region needed for the second local feature values (described later) is then read in and used.

For this purpose, the first local feature value generation unit 11 includes:
- a standard deviation calculation unit 111;
- an edge pixel generation unit 112;
- a flat pixel generation unit 113;
- an adjacent pixel difference absolute value sum calculation unit 114;
- an adjacent pixel second-order difference absolute value sum calculation unit 115;
- a minute variation pixel generation unit 116;
- a local uniform region mask generation unit 117; and the like.

Each component of the first local feature value generation unit 11, and how it calculates its feature value, is described below. Here, the coordinates of the pixel of interest in the decoded image are (x, y), and its luminance value is Y(x, y).

The standard deviation calculation unit 111 calculates the standard deviation value Vstdv according to Equation (1).

Here, to reduce computational cost, an approximate standard deviation based on absolute values is used. The ordinary standard deviation (the square root of the mean squared deviation) may be used instead, or the variance may be used to avoid computing the square root. Because the range of Vstdv depends on how it is calculated, the thresholds for computing the edge pixel value Bedge and the flat pixel value Bflat (described later) must be adjusted to match the calculation method. Also, although the standard deviation is computed here over a 3 × 3 pixel region centered on the pixel of interest, a larger region such as 5 × 5 or 7 × 7 pixels may be used.
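Equation (1) itself is not reproduced in this section. The following Python sketch assumes that the absolute-value approximation is the mean absolute deviation from the window mean. It compares that measure with the ordinary standard deviation over a 3 × 3 window, which also shows why the thresholds Tedge and Tflat must be retuned when the calculation method changes. The function names and the replicate-border handling are assumptions.

```python
import math


def window(img, x, y, radius=1):
    """Pixel values in the (2r+1) x (2r+1) window, replicating border pixels."""
    h, w = len(img), len(img[0])
    vals = []
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            yy = min(max(y + dy, 0), h - 1)
            xx = min(max(x + dx, 0), w - 1)
            vals.append(img[yy][xx])
    return vals


def approx_stdv(img, x, y, radius=1):
    """Mean absolute deviation: cheap, no squares or square roots."""
    vals = window(img, x, y, radius)
    mean = sum(vals) / len(vals)
    return sum(abs(v - mean) for v in vals) / len(vals)


def exact_stdv(img, x, y, radius=1):
    """Ordinary (population) standard deviation, for comparison."""
    vals = window(img, x, y, radius)
    mean = sum(vals) / len(vals)
    return math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))


impulse = [[0, 0, 0], [0, 90, 0], [0, 0, 0]]
print(approx_stdv(impulse, 1, 1), exact_stdv(impulse, 1, 1))
```

For this single bright pixel the two measures give about 17.8 and 28.3. A threshold tuned for one measure is therefore not valid for the other.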

The edge pixel generation unit 112 calculates the edge pixel value Bedge according to Equation (2), where Tedge is a threshold.

The flat pixel generation unit 113 calculates the flat pixel value Bflat according to Equation (3), where Tflat is a threshold.
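Equations (2) and (3) are not reproduced here either. The sketch below assumes a natural reading: Bedge marks pixels whose Vstdv exceeds Tedge, and Bflat marks pixels whose Vstdv falls below Tflat. The strict comparisons and the threshold values are hypothetical.

```python
def edge_and_flat_maps(vstdv, t_edge=20.0, t_flat=2.0):
    """Binarize a Vstdv map into hypothetical Bedge and Bflat maps.

    Pixels with t_flat <= Vstdv <= t_edge are neither edge nor flat.
    """
    bedge = [[1 if v > t_edge else 0 for v in row] for row in vstdv]
    bflat = [[1 if v < t_flat else 0 for v in row] for row in vstdv]
    return bedge, bflat


bedge, bflat = edge_and_flat_maps([[0.5, 10.0, 30.0]])
print(bedge, bflat)   # [[0, 0, 1]] [[1, 0, 0]]
```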

The adjacent pixel difference absolute value sum calculation unit 114 calculates the adjacent pixel difference absolute value sum Vsad1 according to Equation (4), which uses a direction index k.
FIG. 6 is a diagram for explaining the direction index k according to the first embodiment.
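FIG. 6 and Equation (4) are not reproduced here, so the following sketch rests on two assumptions:
- The direction index k enumerates the eight neighbours of the pixel of interest.
- Vsad1 is the sum over k of the absolute luminance difference between the pixel of interest and its neighbour in direction k.

The ordering of the hypothetical `DIRECTIONS` table is arbitrary.

```python
# Hypothetical direction table: offset (dx, dy) of the neighbour for each k.
DIRECTIONS = [(-1, -1), (0, -1), (1, -1),
              (-1, 0),           (1, 0),
              (-1, 1),  (0, 1),  (1, 1)]


def vsad1(img, x, y):
    """Sum over k of |Y(x, y) - Y(neighbour k)|, replicating border pixels."""
    h, w = len(img), len(img[0])
    c = img[y][x]
    total = 0
    for dx, dy in DIRECTIONS:
        yy = min(max(y + dy, 0), h - 1)
        xx = min(max(x + dx, 0), w - 1)
        total += abs(img[yy][xx] - c)
    return total


impulse = [[0, 0, 0], [0, 90, 0], [0, 0, 0]]
print(vsad1(impulse, 1, 1))   # 720
```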

The adjacent pixel second-order difference absolute value sum calculation unit 115 calculates the adjacent pixel second-order difference absolute value sum Vsad2 according to Equation (5). Here too, k is the direction index defined in FIG. 6.
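For Vsad2, the sketch below assumes that each second-order difference is taken along a line through the pixel of interest, Y(p − d) − 2Y(p) + Y(p + d). The offset d runs over four axes (horizontal, vertical, and the two diagonals), obtained by pairing opposite directions of FIG. 6. This pairing is an assumption, not the patent's Equation (5). The `AXES` table is hypothetical. A linear luminance ramp gives 0, while an isolated spike gives a large value.

```python
# Hypothetical axes: one offset per pair of opposite directions.
AXES = [(1, 0), (0, 1), (1, 1), (1, -1)]


def vsad2(img, x, y):
    """Sum over axes of |Y(p - d) - 2 Y(p) + Y(p + d)|, replicating borders."""
    h, w = len(img), len(img[0])

    def at(px, py):
        return img[min(max(py, 0), h - 1)][min(max(px, 0), w - 1)]

    c = img[y][x]
    return sum(abs(at(x - dx, y - dy) - 2 * c + at(x + dx, y + dy))
               for dx, dy in AXES)


ramp_img = [[0, 10, 20]] * 3                  # linear horizontal ramp
impulse = [[0, 0, 0], [0, 90, 0], [0, 0, 0]]
print(vsad2(ramp_img, 1, 1), vsad2(impulse, 1, 1))   # 0 720
```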

The minute variation pixel generation unit 116 calculates the minute variation pixel value Bsmall according to Equation (6), where Thigh and Tlow are the upper and lower thresholds, respectively.
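Equation (6) is not reproduced here either. The sketch below assumes that Bsmall flags pixels whose basic feature value lies inside the band [Tlow, Thigh]. The input could be, for instance, Vstdv or Vsad1; this section does not say which one the embodiment uses. Such a band captures variation that is present but too weak to be an edge, presumably the kind of weak fluctuation mosquito noise produces. The inclusive bounds and threshold values are hypothetical.

```python
def small_variation_map(values, t_low=1.0, t_high=8.0):
    """Hypothetical Bsmall: 1 where t_low <= value <= t_high, else 0."""
    return [[1 if t_low <= v <= t_high else 0 for v in row] for row in values]


print(small_variation_map([[0.5, 3.0, 50.0]]))   # [[0, 1, 0]]
```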

FIG. 7 shows images of the first local feature values (excluding the mask value Bmask) according to the first embodiment. The upper-left panel is the decoded image, shown for reference. Binary B data are shown with 0 as black and 1 as white. Multivalued V data are shown in grayscale, black for 0 and brighter as the value increases.

The local uniform region mask generation unit 117 calculates the mask value Bmask by Equation (7), using the luminance value Y and the standard deviation value Vstdv. Suppose the local region is the 15 × 15 pixels centered on the pixel of interest (x, y). The peripheral pixels are the pixels in this region other than the pixel of interest. Their coordinates (i, j), taken relative to the pixel of interest, each range from −7 to 7.

Here, Tmask is the threshold for wmask, s denotes position, and ||x|| denotes the Euclidean distance. Gσ denotes a Gaussian function with standard deviation σ, Gσ(x) = exp(−x² / (2σ²)). The parameters σs, σr, and σd set the standard deviations of the respective Gaussians.
FIG. 8 shows examples of the mask value Bmask according to the first embodiment:
- The four panels on the left show four local region images.
- The four panels in the center show the weight wmask calculated from each local region image.
- The four panels on the right show the mask value Bmask calculated from each local region image.

Pixels whose mask value Bmask is 0 are filled in black.

The weight wmask is large at pixels that resemble the pixel of interest in three dimensions: spatial position, luminance value, and standard deviation value. Thresholding wmask yields the mask value Bmask, which selects a locally uniform region that does not cross an edge (hereinafter, a "local uniform region").

In other words, a local uniform region is obtained as follows. Within a local region of predetermined size (for example, 15 × 15 pixels), a weight is calculated for each pixel; it grows with the pixel's similarity to the pixel of interest (the center pixel) in spatial position, luminance value, and variance. The local uniform region consists of the pixels whose weight is at or above a predetermined threshold.
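Equation (7) is not reproduced in this section. Based on the description of wmask above, the following sketch makes these assumptions:
- wmask is the product of three Gaussians: Gσs of the spatial distance, Gσr of the luminance difference, and Gσd of the Vstdv difference from the pixel of interest. This is a bilateral-filter-style weight.
- Gσ(x) = exp(−x² / (2σ²)).
- Bmask = 1 where wmask ≥ Tmask.

The product form, the function names, and all parameter values are assumptions.

```python
import math


def gauss(d, sigma):
    """Gaussian weight G_sigma(d) = exp(-d^2 / (2 sigma^2))."""
    return math.exp(-(d * d) / (2.0 * sigma * sigma))


def local_uniform_mask(lum, vstdv, x, y, half=7,
                       sigma_s=5.0, sigma_r=10.0, sigma_d=5.0, t_mask=0.3):
    """Hypothetical Bmask for the (2*half+1)^2 local region around (x, y)."""
    h, w = len(lum), len(lum[0])
    y0, v0 = lum[y][x], vstdv[y][x]
    mask = []
    for j in range(-half, half + 1):
        row = []
        for i in range(-half, half + 1):
            yy = min(max(y + j, 0), h - 1)
            xx = min(max(x + i, 0), w - 1)
            w_mask = (gauss(math.hypot(i, j), sigma_s)        # spatial position
                      * gauss(lum[yy][xx] - y0, sigma_r)      # luminance
                      * gauss(vstdv[yy][xx] - v0, sigma_d))   # local activity
            row.append(1 if w_mask >= t_mask else 0)
        mask.append(row)
    return mask


lum = [[50, 50, 200]] * 3      # vertical edge on the right
vstdv = [[0, 0, 0]] * 3
print(local_uniform_mask(lum, vstdv, 1, 1, half=1))
# [[1, 1, 0], [1, 1, 0], [1, 1, 0]]
```

The right-hand column differs in luminance by 150, so its weight collapses to almost zero. The mask therefore stops at the edge, as the text describes.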

Specifically, in FIG. 8, the local uniform region is the region that is highly similar to the pixel of interest and has a mask value Bmask of 1. Using the local uniform region improves the reliability of each local feature amount. One example is the flat pixel number feature amount calculated by the second local feature amount generation unit 12, described next.
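Equation (7) itself is not reproduced in this text. The surrounding definitions give three Gaussians, with σs, σr, and σd, over spatial distance, luminance difference, and standard deviation difference, and a threshold Tmask. From these, a minimal sketch of the mask computation might look as follows. The product form of the weight, the default parameter values, and the function name `local_uniform_mask` are assumptions for illustration.

```python
import math


def gauss(x, sigma):
    """G_sigma(x) = exp(-x^2 / (2 sigma^2)), unnormalized."""
    return math.exp(-(x * x) / (2.0 * sigma * sigma))


def local_uniform_mask(Y, V, x, y, radius=7,
                       sigma_s=4.0, sigma_r=10.0, sigma_d=5.0, t_mask=0.3):
    """Return {(i, j): Bmask} for the (2r+1) x (2r+1) window centered on (x, y).

    Y and V are 2-D lists indexed [row][col] holding luminance and the
    per-pixel standard deviation Vstdv. Offsets outside the image are skipped.
    Assumption: wmask is the product of the three Gaussian terms, and all
    sigma / threshold defaults are hypothetical.
    """
    h, w = len(Y), len(Y[0])
    mask = {}
    for j in range(-radius, radius + 1):
        for i in range(-radius, radius + 1):
            yy, xx = y + j, x + i
            if not (0 <= yy < h and 0 <= xx < w):
                continue
            w_mask = (gauss(math.hypot(i, j), sigma_s)          # spatial position
                      * gauss(Y[yy][xx] - Y[y][x], sigma_r)     # luminance value
                      * gauss(V[yy][xx] - V[y][x], sigma_d))    # standard deviation
            mask[(i, j)] = 1 if w_mask >= t_mask else 0         # threshold by Tmask
    return mask
```

Pixels across a strong luminance edge receive a near-zero luminance term, so they drop out of the mask. The region therefore stops at the edge.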

FIG. 9 is a diagram illustrating the configuration of the second local feature amount generation unit 12 according to the first embodiment.
The second local feature amount generation unit 12 acquires the feature amounts output from the first local feature amount generation unit 11. It then calculates and outputs the following feature amounts:
- an edge strength feature amount Fes;
- an edge pixel number feature amount Fec;
- a relative flat pixel number feature amount Frc;
- a flat pixel number feature amount Ffc;
- a minute variation strength feature amount Fms;
- a minute variation pixel number feature amount Fmc;
- a minute variation dispersion feature amount Fmv.

The unit calculates exactly one value of each feature amount per pixel of interest.

To this end, the second local feature amount generation unit 12 includes the following units, among others:
- an edge strength feature amount generation unit 121;
- an edge pixel number feature amount generation unit 122;
- a flat pixel number feature amount generation unit 123;
- a minute variation strength feature amount generation unit 124;
- a minute variation pixel number feature amount generation unit 125;
- a minute variation dispersion feature amount generation unit 126;
- a relative flat pixel number feature amount generation unit 127.

The configuration of the second local feature amount generation unit 12 and the calculation of each feature amount are described below. As before, the coordinates of the pixel of interest in the decoded image are (x, y), and its luminance value is Y(x, y). The local region is 15 × 15 pixels. The coordinates (i, j) of the peripheral pixels of the pixel of interest (x, y) each take values from −7 to 7.

The edge strength feature amount generation unit 121 calculates the edge strength feature amount Fes according to the following equation (8).

The edge pixel number feature amount generation unit 122 calculates the edge pixel number feature amount Fec according to the following equation (9).

The flat pixel number feature amount generation unit 123 calculates the flat pixel number feature amount Ffc according to the following equation (10).

The minute variation strength feature amount generation unit 124 calculates the minute variation strength feature amount Fms according to the following equation (11).

The minute variation pixel number feature amount generation unit 125 calculates the minute variation pixel number feature amount Fmc according to the following equation (12).

The minute variation dispersion feature amount generation unit 126 calculates the minute variation dispersion feature amount Fmv according to the following equation (13).

The relative flat pixel number feature amount generation unit 127 calculates the relative flat pixel number feature amount Frc according to the following equation (14).

Here, the threshold Trelf is a relative coefficient that takes a value from 0.0 to 1.0. It sets the ratio at which the average standard deviation aveVstdv of the local uniform region is regarded as flat relative to the standard deviation Vstdv of each peripheral pixel in the local region.

The relative flat pixel number feature amount Frc counts the peripheral pixels relative to which the pixel of interest is regarded as flatter. A peripheral pixel is counted when the average standard deviation aveVstdv of the local uniform region is smaller than that peripheral pixel's standard deviation Vstdv multiplied by the relative coefficient.
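Equation (14) is not reproduced in this text. The counting rule just described can be sketched as follows. In this sketch, aveVstdv is the mean Vstdv over the pixels with Bmask = 1, and each peripheral pixel (the center excluded) is counted when aveVstdv < Trelf × Vstdv. The function name and the fallback to the center pixel's own Vstdv for an empty mask are assumptions.

```python
def relative_flat_count(V, mask, x, y, radius=7, t_relf=0.5):
    """Sketch of Frc for the pixel of interest (x, y).

    V: 2-D list of per-pixel standard deviation Vstdv, indexed [row][col].
    mask: {(i, j): Bmask} for the local region (1 = local uniform region).
    t_relf: relative coefficient Trelf in [0.0, 1.0] (default is hypothetical).
    """
    h, w = len(V), len(V[0])
    # aveVstdv: average standard deviation over the local uniform region.
    members = [V[y + j][x + i] for (i, j), b in mask.items() if b == 1]
    ave_vstdv = sum(members) / len(members) if members else V[y][x]
    count = 0
    for j in range(-radius, radius + 1):
        for i in range(-radius, radius + 1):
            if i == 0 and j == 0:
                continue  # only peripheral pixels are counted
            yy, xx = y + j, x + i
            if 0 <= yy < h and 0 <= xx < w and ave_vstdv < t_relf * V[yy][xx]:
                count += 1  # the pixel of interest is flatter than this pixel
    return count
```

Because the comparison is relative, a faint edge with a modest Vstdv still raises Frc, provided the uniform region around the pixel of interest is sufficiently flatter.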

The standard deviation Vstdv of the pixel of interest could be used instead of the average standard deviation aveVstdv of the local uniform region. However, aveVstdv is computed over pixels similar to the pixel of interest. Using it allows Frc to be calculated accurately even in an image where the pixel of interest itself carries some noise.

Consider a region whose surroundings contain an edge with a large standard deviation Vstdv, such as a caption. Even if noise or texture is present, Frc takes a large value as long as aveVstdv of the local uniform region is sufficiently smaller than the Vstdv of the edge.

Likewise, consider a region whose surroundings contain an edge with a small standard deviation Vstdv, such as faint characters. Frc still takes a large value if aveVstdv of the local uniform region is sufficiently smaller than the Vstdv of the edge.

Calculating Frc makes it possible to detect mosquito noise regions while accounting for noise visibility. The detection is based on the noise intensity relative to the edge intensity.

FIG. 10 shows examples of the second local feature amounts according to the first embodiment, each rendered as an image. The upper left is the decoded image, shown for reference. Values are shown in grayscale: 0 is black, and larger values approach white.

The evaluation value generation unit 20 shown in FIG. 3 calculates the noise evaluation value E by weighting and integrating the plurality of local feature amounts acquired from the local feature amount generation unit 10.
The second local feature amounts fall into two groups:
- the local main feature amounts: the edge strength feature amount Fes, the edge pixel number feature amount Fec, and the relative flat pixel number feature amount Frc;
- the local auxiliary feature amounts: the flat pixel number feature amount Ffc, the minute variation strength feature amount Fms, the minute variation pixel number feature amount Fmc, and the minute variation dispersion feature amount Fmv.

First, the evaluation value generation unit 20 sets the noise evaluation value E to an initial value E0 (E = E0). Here, E ranges from 0 to 1, and E0 is 0.5.

Next, the evaluation value generation unit 20 calculates a first evaluation value Em from the three local main feature amounts according to the following equation (15).
Here, w() is a weight function. It converts a local main feature amount into a weight coefficient w(F(x, y)) in the range 0 to 1 according to a preset curve. The weight function for each feature amount is described later.

Next, the evaluation value generation unit 20 calculates a second evaluation value Es from the four local auxiliary feature amounts according to the following equation (16).

Finally, the noise evaluation value E must be kept low on edges. To do so, the evaluation value generation unit 20 multiplies by a weight coefficient wstdv(Vstdv(x, y)) based on the standard deviation Vstdv. It obtains the final noise evaluation value E by the following equation (17).

Through the above processing, the evaluation value generation unit 20 calculates a noise evaluation value E for each pixel from the plurality of local feature amounts.

FIG. 11 shows examples of the weight functions according to the first embodiment.
- The weight function for each local main feature amount is designed so that the weight coefficient increases as the feature amount increases.
- The weight function for each local auxiliary feature amount is designed so that the weight coefficient is large when the feature amount lies within a predetermined range and small outside it.
- The weight function for the standard deviation is designed so that the weight coefficient decreases as the standard deviation increases.
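FIG. 11 is not reproduced here, so the exact curves are unknown. Assuming piecewise-linear shapes, the three behaviors described above can be sketched as follows: an increasing ramp for the main features, a trapezoidal band for the auxiliary features, and a decreasing ramp for the standard deviation. The helper names and all breakpoints are hypothetical.

```python
def ramp_up(f, lo, hi):
    """Increasing weight: 0 at or below lo, 1 at or above hi, linear between."""
    if f <= lo:
        return 0.0
    if f >= hi:
        return 1.0
    return (f - lo) / (hi - lo)


def ramp_down(f, lo, hi):
    """Decreasing weight (e.g. for Vstdv): 1 at or below lo, 0 at or above hi."""
    return 1.0 - ramp_up(f, lo, hi)


def band(f, a, b, c, d):
    """Range weight (auxiliary features): 1 on [b, c], 0 outside [a, d],
    linear on the shoulders [a, b] and [c, d]."""
    return min(ramp_up(f, a, b), ramp_down(f, c, d))
```

Each function maps a feature amount into [0, 1], which matches the range stated for the weight coefficients.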

FIG. 12 shows an example of the noise evaluation value E according to the first embodiment, rendered as an image.
The noise evaluation value E takes values from 0.0 to 0.75. The image is normalized so that 0 is black and 0.75 is white. The maximum of E is set to 0.75 so that it becomes 1.0 once the subtitle feature amount of the third embodiment, described later, is included.

The mixing ratio generation unit 30 acquires the noise evaluation value E and generates the mixing ratio R used by the image mixing processing unit 50, described later. The mixing ratio R is calculated by the following equation (18).

Here, wratio() is the weight function for the mixing ratio R. It converts the noise evaluation value E into a mixing ratio R in the range 0.0 to 1.0. The function is 0.0 at or below a lower limit, 1.0 at or above an upper limit, and increases monotonically from 0.0 to 1.0 in between.
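A minimal sketch of wratio() might look as follows. It assumes the monotonic section between the limits is a straight line, and the lower and upper limits of 0.25 and 0.75 are hypothetical. The actual curve is the one shown in FIG. 13.

```python
def mixing_ratio(e, lower=0.25, upper=0.75):
    """wratio(): map a noise evaluation value E to a mixing ratio R in [0, 1].

    Assumption: linear between the (hypothetical) lower and upper limits.
    """
    if e <= lower:
        return 0.0
    if e >= upper:
        return 1.0
    return (e - lower) / (upper - lower)
```

Any other monotonically increasing curve between the two limits would satisfy the description equally well.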

FIG. 13 shows an example of the weight function for the mixing ratio R according to the first embodiment.
FIG. 14 shows an example of the mixing ratio R according to the first embodiment, rendered as an image. The mixing ratio R was calculated with the weight function of FIG. 13 from the noise evaluation value E image shown on the left.

The filter processing unit 40 receives the decoded image and smooths it with an edge-preserving smoothing filter, for example a bilateral filter or an epsilon filter. It then outputs the smoothed image. The bilateral filter can be expressed by the following equation (19), which uses these symbols:
- the luminance value Y of the decoded image;
- the filtered luminance value f[Y];
- the position p of the pixel of interest in the image;
- a position q in the region Ω centered on p.

Here, the function Gσ represents a Gaussian with standard deviation σ. The weight coefficient wp,q is determined from the Euclidean distance between positions p and q and the absolute value of their luminance difference. The normalization coefficient Wp makes the weights in the region Ω sum to 1.
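Equation (19) is not reproduced in this text. A standard bilateral filter consistent with the description above can be sketched as follows. Each output value is a weighted mean over Ω, with weight wp,q = Gσs(||p − q||) · Gσr(|Y(p) − Y(q)|), normalized by Wp. The square window, clipping at the image border, and the default σ values are assumptions.

```python
import math


def _gauss(d, sigma):
    """Unnormalized Gaussian G_sigma(d) = exp(-d^2 / (2 sigma^2))."""
    return math.exp(-(d * d) / (2.0 * sigma * sigma))


def bilateral(Y, radius=2, sigma_s=2.0, sigma_r=20.0):
    """Edge-preserving smoothing of a 2-D list of luminance values.

    Omega is the (2r+1) x (2r+1) window around p, clipped at the image border.
    """
    h, w = len(Y), len(Y[0])
    out = [[0.0] * w for _ in range(h)]
    for py in range(h):
        for px in range(w):
            num = 0.0
            wp = 0.0  # normalization coefficient Wp
            for qy in range(max(0, py - radius), min(h, py + radius + 1)):
                for qx in range(max(0, px - radius), min(w, px + radius + 1)):
                    w_pq = (_gauss(math.hypot(px - qx, py - qy), sigma_s)
                            * _gauss(Y[py][px] - Y[qy][qx], sigma_r))
                    num += w_pq * Y[qy][qx]
                    wp += w_pq
            out[py][px] = num / wp  # dividing by Wp makes the weights sum to 1
    return out
```

Across a large luminance step, the range term drives the weights toward zero. This is why edges survive while small fluctuations such as mosquito noise are averaged out. It is also why fine texture is lost, as the next figure shows.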

FIG. 15 shows an example of filtering with the bilateral filter according to the first embodiment. The decoded image and the smoothed image are shown for each of regions (a) and (b).
In region (a), the decoded image contains mosquito noise around the characters. The smoothed image removes this noise while preserving the edges. In region (b), however, the tree texture present before filtering disappears after filtering. In other words, smoothing the entire image removes not only noise but also important image structures such as texture.

The image mixing processing unit 50 acquires three inputs:
- the luminance value Y of the decoded image;
- the luminance value f[Y] of the smoothed image generated by the filter processing unit 40;
- the mixing ratio R generated by the mixing ratio generation unit 30.

It mixes the decoded image and the smoothed image according to the mixing ratio R by the following equation (20). The result is a noise-removed output image Yout, which the unit outputs.
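Equation (20) is not reproduced in this text. R increases with the noise evaluation value E, and the smoothing strength is meant to rise where noise is likely. A natural reading is therefore the linear blend Yout = R · f[Y] + (1 − R) · Y. The sketch below assumes that form; the function name is hypothetical.

```python
def blend(y_dec, y_smooth, ratio):
    """Per-pixel mix of decoded and smoothed luminance (assumed form of eq. (20)).

    R = 0 keeps the decoded value, and R = 1 takes the smoothed value.
    All three arguments are equally sized 2-D lists.
    """
    return [[r * s + (1.0 - r) * d for d, s, r in zip(d_row, s_row, r_row)]
            for d_row, s_row, r_row in zip(y_dec, y_smooth, ratio)]
```

For example, `blend([[100, 100]], [[80, 60]], [[0.0, 0.5]])` leaves the first pixel at 100.0 and moves the second halfway, to 80.0.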

FIG. 16 shows examples of output images from the image processing apparatus 1 according to the first embodiment. The decoded image, the smoothed image, and the output image are shown for each of regions (a) and (b).
The prior art had the following problems:
- At the boundary (edge) between trees (texture) and sky (flat), as in region (a), the tree texture was falsely detected as noise and wrongly smoothed.
- Mosquito noise around faint characters, such as the lower characters in region (b), could not be detected.
- Where mosquito noise occurs between densely packed characters, such as the upper characters in region (b), it could not be detected because there is no flat region between the characters.

The output image according to the first embodiment solves these problems and removes mosquito noise effectively with few side effects.

As described above, the image processing apparatus 1 and the image processing method according to the first embodiment proceed in two steps:
1. They weight and integrate a plurality of local feature amounts calculated only from the decoded image, such as edge, relative flat, flat, and minute variation features. This yields, for each pixel, a noise evaluation value E that accounts for both the likelihood of mosquito noise and its visibility.
2. They adaptively control the smoothing strength based on E.

As a result, image structures other than noise, such as texture, are preserved. Mosquito noise is detected even where conventional techniques had difficulty, such as between characters with intricately interleaved edges or around weak edges. Mosquito noise that the human eye readily perceives is removed effectively, while side effects of smoothing such as excessive blur are reduced.

In addition, the local feature amounts need no preprocessing such as complex edge detection. They can be calculated by simple processing that uses only the pixels of a local region, for example 15 × 15 pixels. Memory usage is therefore small, and the method can be implemented in hardware with a relatively small circuit scale, which keeps product cost down.

As described above, the image processing apparatus 1 according to the first embodiment includes the following units:
- a local feature amount generation unit 10 that generates a plurality of local feature amounts, including at least the relative flat pixel number feature amount Frc. Frc counts the peripheral pixels of a local region of the decoded image relative to which the central pixel of that region is flatter.
- an evaluation value generation unit 20 that weights and integrates the plurality of local feature amounts to calculate a noise evaluation value E for each pixel of the decoded image.
- a mixing ratio generation unit 30 that generates the mixing ratio R based on the noise evaluation value E.
- a filter processing unit 40 that generates a smoothed image of the decoded image.
- an image mixing processing unit 50 that mixes the decoded image and the smoothed image based on the mixing ratio R to generate an output image.

With this configuration, the position and strength of mosquito noise are detected from the decoded image alone, without encoded information, by simple processing with low memory usage. The noise is then removed effectively based on the detection result.

In the image processing apparatus 1 according to the first embodiment, the local feature amount generation unit 10 preferably calculates each of the plurality of local feature amounts from at least one basic feature amount. The candidate basic feature amounts are the variance of pixel values in the local region, the standard deviation value Vstdv of pixel values, and the adjacent pixel difference absolute value Vsad.
With this configuration, noise can be removed by even simpler processing with lower memory usage.

Also, when generating the relative flat pixel number feature amount Frc, the local feature amount generation unit 10 preferably makes the following comparison. It takes the pixels in the local region whose similarity to the central pixel, in spatial position, luminance value, or variance of pixel values, is at least a predetermined value. It then compares the average basic feature amount of those pixels with the basic feature amount of each peripheral pixel.
With this configuration, the position and strength of mosquito noise can be detected even in a decoded image whose central pixel carries noise.

In the image processing apparatus 1 according to the first embodiment, the plurality of local feature amounts preferably further include at least one of the following:
- the edge strength feature amount Fes, which is the average basic feature amount of the edge pixels in the local region;
- the edge pixel number feature amount Fec of the local region.

With this configuration, the position and strength of mosquito noise can be detected more accurately.
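Equations (8) and (9) are not reproduced in this text. The definitions just given (Fes as the average basic feature of the edge pixels, Fec as their count) can be sketched as follows. The rule that a pixel is an edge pixel when its basic feature reaches a threshold, and the threshold value itself, are hypothetical.

```python
def edge_features(basic, t_edge=10.0):
    """Return (Fes, Fec) for one local region.

    basic: flat list of the basic feature amount (e.g. Vstdv) of every pixel
    in the local region. Assumption: a pixel is an edge pixel when its basic
    feature is >= t_edge (hypothetical threshold).
    """
    edges = [v for v in basic if v >= t_edge]
    fec = len(edges)                            # edge pixel number feature amount
    fes = sum(edges) / fec if fec else 0.0      # edge strength: mean over edge pixels
    return fes, fec
```

For example, `edge_features([1, 12, 20, 3])` finds two edge pixels with mean strength 16.0.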

The plurality of local feature amounts also preferably include the feature amounts Fms, Fmc, and Fmv relating to minute variation pixels. A minute variation pixel is one whose basic feature amount lies between a lower limit greater than 0 and an upper limit.
With this configuration, the position and strength of mosquito noise can be detected more accurately.
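Equations (11) to (13) are not reproduced in this text. The sketch below selects minute variation pixels by the bounded range described above, with 0 < lower limit. It takes Fms, Fmc, and Fmv as the mean, count, and population variance of those pixels' basic feature amounts. That reading of the three names, the bound values, and the inclusive upper bound are all assumptions.

```python
from statistics import pvariance


def minute_variation_features(basic, t_low=0.5, t_high=4.0):
    """Return (Fms, Fmc, Fmv) for one local region.

    A pixel is a minute variation pixel when t_low < basic <= t_high, with
    t_low > 0 (hypothetical bounds). Mean / count / variance is an assumed
    interpretation of the strength / pixel number / dispersion features.
    """
    minute = [v for v in basic if t_low < v <= t_high]
    fmc = len(minute)
    if fmc == 0:
        return 0.0, 0, 0.0
    return sum(minute) / fmc, fmc, pvariance(minute)
```

Requiring a lower limit above 0 excludes perfectly flat pixels, so only faint fluctuations are counted.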

The image processing method according to the first embodiment includes the following steps:
1. Generating a plurality of local feature amounts, including at least the relative flat pixel number feature amount Frc. Frc counts the peripheral pixels of a local region of the decoded image relative to which the central pixel of that region is flatter.
2. Weighting and integrating the plurality of local feature amounts to calculate a noise evaluation value E for each pixel of the decoded image.
3. Generating a mixing ratio R based on the noise evaluation value E.
4. Generating a smoothed image of the decoded image.
5. Mixing the decoded image and the smoothed image based on the mixing ratio R to generate an output image.

The image processing program according to the first embodiment causes a computer to execute the following procedures:
1. Generating a plurality of local feature amounts, including at least the relative flat pixel number feature amount Frc. Frc counts the peripheral pixels of a local region of the decoded image relative to which the central pixel of that region is flatter.
2. Weighting and integrating the plurality of local feature amounts to calculate a noise evaluation value E for each pixel of the decoded image.
3. Generating a mixing ratio R based on the noise evaluation value E.
4. Generating a smoothed image of the decoded image.
5. Mixing the decoded image and the smoothed image based on the mixing ratio R to generate an output image.

(Embodiment 2)
In the image processing apparatus and method according to the first embodiment, the noise evaluation value E is calculated by weighting and integrating the local feature amounts. In the image processing apparatus and method according to the second embodiment, E is calculated by weighting and integrating the local feature amounts together with a global feature amount, described later.

The local feature amounts are calculated only from the luminance values of a local region of the decoded image. They therefore use only a still image or a single frame of a moving image. The global feature amount, by contrast, is used for moving images. It is calculated from the local feature amounts of a temporally preceding frame or from their statistics.
Here, the capture time of the frame being processed is t, and that of the immediately preceding frame is t − 1.

FIG. 17 is a diagram illustrating the schematic configuration of the image processing apparatus 201 according to the second embodiment.
The image processing apparatus 201 includes the following units, among others:
- a local feature amount generation unit 210;
- an evaluation value generation unit 220;
- a mixing ratio generation unit 230;
- a filter processing unit 240;
- an image mixing processing unit 250;
- a global feature amount generation unit 260.

The local feature amount generation unit 210, the mixing ratio generation unit 230, the filter processing unit 240, and the image mixing processing unit 250 have the same configuration as in the first embodiment and operate in the same way. Their description is therefore omitted here.

FIG. 18 is a diagram illustrating the configuration of the global feature amount generation unit 260 according to the second embodiment.
The global feature amount generation unit 260 receives three inputs: the decoded image including color information (YCbCr), and the edge pixel number feature amount Fec and minute variation pixel number feature amount Fmc output from the local feature amount generation unit 210. It generates a texture feature amount Ftx as the global feature amount and outputs it to the evaluation value generation unit 220.

To this end, the global feature amount generation unit 260 includes the following units, among others:
- a color conversion unit 261;
- a color class classification unit 262;
- a per-color texture probability generation unit 263;
- a delay unit 264;
- a texture feature amount generation unit 265.

When the decoded image at time t − 1 is input, the color conversion unit 261 calculates the hue H and saturation S with a general YCbCr-to-HSV color conversion formula. Keeping the YCbCr luminance Y, it converts the YCbCr color system into an HSY color system and outputs the result to the color class classification unit 262.
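The text does not specify which "general" conversion is used. A minimal sketch follows, assuming full-range BT.601 (JFIF) YCbCr to RGB, followed by the standard HSV hue and saturation from Python's `colorsys` module. H and S are scaled to 0 to 255 to match the 8-bit hue used in the next paragraph. The function names are hypothetical.

```python
import colorsys


def _to_unit(v):
    """Clip an 8-bit channel value to [0, 255] and scale it to [0, 1]."""
    return min(max(v, 0.0), 255.0) / 255.0


def ycbcr_to_hsy(y, cb, cr):
    """Map one 8-bit YCbCr pixel to (H, S, Y), with H and S in 0-255.

    Assumption: full-range BT.601 (JFIF) coefficients for YCbCr -> RGB.
    The luminance Y is passed through unchanged, as in the HSY system.
    """
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    h, s, _v = colorsys.rgb_to_hsv(_to_unit(r), _to_unit(g), _to_unit(b))
    return round(h * 255), round(s * 255), y
```

A neutral gray such as (128, 128, 128) maps to zero hue and zero saturation, so it will later fall into an achromatic class.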

The color class classification unit 262 classifies each pixel into a color class based on its hue H, saturation S, and luminance Y. In the second embodiment, the classification applies to pixels whose luminance satisfies Tdark ≤ Y ≤ Tbright:
- A pixel is chromatic if its saturation S is at or above the threshold Tsat. Its class value C is the 8-bit hue H (0 to 255) quantized by a constant QH.
- A pixel is achromatic if its saturation S is below Tsat. Its class value C is the 8-bit luminance Y (0 to 255) quantized by a constant QY.
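The classification rule above can be sketched as follows. The text does not say how pixels outside Tdark ≤ Y ≤ Tbright are classified, so this sketch returns `None` for them. Tagging classes as chromatic or achromatic keeps hue bins and luminance bins from colliding. Both choices, and all threshold and quantization defaults, are assumptions.

```python
def color_class(h, s, y, t_dark=16, t_bright=235, t_sat=32, q_h=16, q_y=32):
    """Quantize an 8-bit (H, S, Y) pixel into a color class value C.

    Returns None outside Tdark <= Y <= Tbright (unspecified in the source).
    The tuple tag keeping chromatic and achromatic classes apart is an
    assumed encoding; all default parameters are hypothetical.
    """
    if not (t_dark <= y <= t_bright):
        return None
    if s >= t_sat:
        return ("chromatic", h // q_h)    # hue H quantized by QH
    return ("achromatic", y // q_y)       # luminance Y quantized by QY
```

Larger QH and QY give fewer, coarser classes. Neighboring pixels then share classes more often, and the histogram needs fewer bins.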

Quantizing colors in this way has two benefits. Adjacent pixels become more likely to fall into the same class, which suppresses spatial fluctuation of the texture feature amount Ftx. It also reduces the number of bins in the histogram described later, which cuts the memory needed to store the histogram.
The color class classification unit 262 outputs the class value C to the per-color texture probability generation unit 263 and the texture feature amount generation unit 265.

The per-color texture probability generation unit 263 calculates a texture probability Ptexture for each color class value C from the edge pixel number feature amount Fec and the minute variation pixel number feature amount Fmc.

First, the per-color texture probability generation unit 263 calculates the texture pixel value Btexture according to the following equation (21), where Tec and Tmc are thresholds.

Next, let Btexture(x, y, C) be the texture pixel value of the pixel at coordinates (x, y) with color class value C. Let N × M pixels be the size of the area over which the texture probability is obtained. The per-color texture probability generation unit 263 builds a histogram of texture pixel values for each color class value C according to the following equation (22). It divides the histogram by the area size to obtain the texture probability Ptexture for each color class value C. The area may be the entire decoded image. Alternatively, the image may be divided into blocks of a predetermined size, with Ptexture calculated for each block.
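Equations (21) and (22) are not reproduced in this text. The sketch below assumes Btexture = 1 when Fec ≥ Tec and Fmc ≥ Tmc (both conditions; the actual combination may differ). It sums Btexture per color class and, as described, divides by the area size N × M. The threshold defaults and the function name are hypothetical.

```python
from collections import Counter


def texture_probability(classes, fec, fmc, t_ec=20, t_mc=40):
    """Per-color-class texture probability over an N x M area.

    classes, fec, fmc: equally sized 2-D lists holding the class value C,
    Fec, and Fmc of each pixel. Assumption: Btexture = 1 when both counts
    reach their thresholds; each per-class sum is divided by N * M.
    """
    n_rows, n_cols = len(classes), len(classes[0])
    hist = Counter()
    for c_row, e_row, m_row in zip(classes, fec, fmc):
        for c, e, m in zip(c_row, e_row, m_row):
            b_texture = 1 if (e >= t_ec and m >= t_mc) else 0
            hist[c] += b_texture  # histogram bin for color class C
    area = n_rows * n_cols
    return {c: count / area for c, count in hist.items()}
```

Every class that occurs in the area gets an entry, including classes with no texture pixels, whose probability is 0.0.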

The delay unit 264 stores the color-specific texture probability Ptexture calculated at time t−1 in a buffer for use by the texture feature amount generation unit 265 at time t.
Meanwhile, when the decoded image at time t is input, the color conversion unit 261 likewise generates HSY, and the color class classification unit 262 calculates the color class value C for each pixel.

For the color class value C of the target pixel in the decoded image at time t, the texture feature amount generation unit 265 looks up the texture probability Ptexture(C) calculated for that color class value from the decoded image at time t−1, and outputs this Ptexture(C) as the texture feature amount Ftx of the target pixel at time t.
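The interplay between the delay unit 264 and the texture feature amount generation unit 265 amounts to a per-class table lookup, sketched below. The class name is hypothetical, and returning 0.0 for a color class that has no entry from the previous frame is an assumption.

```python
class TextureFeatureLookup:
    """Sketch of delay unit 264 plus texture feature amount generation unit 265.

    The Ptexture table computed at frame t-1 is latched and used to look up
    the per-pixel texture feature amount Ftx during frame t.
    """

    def __init__(self):
        self._prev_table = {}  # Ptexture per color class C from frame t-1

    def ftx(self, class_value):
        # Assumption: a class with no entry from the previous frame gets 0.0.
        return self._prev_table.get(class_value, 0.0)

    def end_of_frame(self, table):
        # Latch this frame's table for use while processing the next frame.
        self._prev_table = dict(table)


lookup = TextureFeatureLookup()
lookup.end_of_frame({0: 0.25, 1: 0.5})        # table from frame t-1
row_classes = [1, 0, 2]                        # color classes of pixels at frame t
print([lookup.ftx(c) for c in row_classes])    # [0.5, 0.25, 0.0]
```

Each pixel needs only its own class value to obtain Ftx, so no line buffer is involved, which is consistent with the hardware-cost advantage claimed for this embodiment.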

The evaluation value generation unit 220 shown in FIG. 17 calculates the noise evaluation value E by weighting and integrating the plurality of local feature amounts acquired from the local feature amount generation unit 210 and the texture feature amount Ftx, which is a global feature amount.

The evaluation value generation unit 220 integrates the texture feature amount Ftx into the second evaluation value Es of equation (16) according to the following equation (23), and calculates a third evaluation value Eg.
Here, wtx() is a weighting function for the texture feature amount Ftx.

FIG. 19 shows an example of the weighting function wtx according to the second embodiment.
The weighting function wtx is designed so that the weighting coefficient increases as the feature amount increases. However, because equation (23) subtracts the weighting coefficient wtx(Ftx(x, y)) from the second evaluation value Es, the third evaluation value Eg decreases as the texture feature amount Ftx increases.

Finally, as in equation (17), the evaluation value generation unit 220 keeps the noise evaluation value E low on edges by multiplying by the weighting coefficient wstdv(Vstdv(x, y)) based on the standard deviation Vstdv, and calculates the final noise evaluation value E by the following equation (24).
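A per-pixel sketch of equations (23) and (24) follows. Only the subtraction in equation (23) and the multiplication in equation (24) come from the text; the piecewise-linear shapes of wtx and wstdv, their breakpoints, and clamping Eg at zero are illustrative assumptions, because FIG. 19 and equation (17) are not given in this section. All function names are hypothetical.

```python
def ramp(v, lo, hi):
    """Piecewise-linear ramp: 0 at or below lo, 1 at or above hi."""
    if v <= lo:
        return 0.0
    if v >= hi:
        return 1.0
    return (v - lo) / (hi - lo)


def w_tx(ftx, lo=0.1, hi=0.5, gain=1.0):
    # Assumed increasing weight for the texture feature amount (cf. FIG. 19).
    return gain * ramp(ftx, lo, hi)


def w_stdv(vstdv, lo=20.0, hi=40.0):
    # Assumed decreasing weight: large local standard deviation (edges) -> 0.
    return 1.0 - ramp(vstdv, lo, hi)


def noise_evaluation_texture(es, ftx, vstdv):
    eg = max(es - w_tx(ftx), 0.0)   # equation (23); the clamp at 0 is an assumption
    return eg * w_stdv(vstdv)       # equation (24)


print(noise_evaluation_texture(0.8, 0.05, 10.0))  # 0.8: no texture, not an edge
print(noise_evaluation_texture(0.8, 0.9, 10.0))   # 0.0: strong texture evidence
print(noise_evaluation_texture(0.8, 0.05, 30.0))  # 0.4: halfway onto an edge
```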

As described above, the image processing apparatus 201 or the image processing method according to the second embodiment can calculate the noise evaluation value E for each pixel from the plurality of local feature amounts and the texture feature amount Ftx.

This suppresses false texture detection even in regions where noise and texture are difficult to distinguish using local feature amounts alone, and reduces the side effect of erroneously smoothing texture. In addition, because the texture probability Ptexture is calculated from the previous frame, the increase in memory usage relative to the first embodiment can be kept small. Furthermore, the texture feature amount Ftx at time t can be obtained pixel by pixel without a line buffer or similar storage, which limits the increase in hardware cost.

As described above, the image processing apparatus 201 according to the second embodiment preferably further includes a global feature amount generation unit 260 that generates a global feature amount based on the local feature amounts of a decoded image temporally preceding the current decoded image. The global feature amount includes a texture feature amount Ftx indicating texture, and the evaluation value generation unit 220 integrates the plurality of local feature amounts with the global feature amount to calculate the noise evaluation value E.

With this configuration, false texture detection is suppressed even in regions where noise and texture are difficult to distinguish using local feature amounts, reducing the side effect of erroneously smoothing texture.

(Embodiment 3)
The image processing apparatus or image processing method according to the second embodiment uses the texture feature amount Ftx as the global feature amount, whereas the image processing apparatus or image processing method according to the third embodiment uses a noise scene feature amount as the global feature amount.

FIG. 20 is a diagram showing the schematic configuration of part of the image processing apparatus 301 according to the third embodiment.
The image processing apparatus 301 includes a local feature amount generation unit 210, an evaluation value generation unit 320, a mixing rate generation unit 230, a filter processing unit 240, an image mixing processing unit 250, a global feature amount generation unit 360, and the like. The mixing rate generation unit 230, the filter processing unit 240, and the image mixing processing unit 250 have the same configuration and operate in the same way as those of the second embodiment, so their illustration and description are omitted here.

The global feature amount generation unit 360 acquires the decoded image together with the flat pixel value Bflat and the flat pixel number feature amount Ffc calculated by the local feature amount generation unit 210, and outputs a noise scene feature amount Fns to the evaluation value generation unit 320 as the global feature amount. (The threshold Tflat used to calculate the flat pixel value Bflat in equation (3) may be set to a value different from that of the first embodiment for the purpose of calculating the noise scene feature amount.) For this purpose, the global feature amount generation unit 360 includes a sign change calculation unit 361, a noise scene feature amount generation unit 362, a delay unit 363, and the like.

When the decoded image at time t−1 is input, the sign change calculation unit 361 calculates, for each pixel, a horizontal sign change pixel value B^X_sc and a vertical sign change pixel value B^Y_sc, and outputs them to the noise scene feature amount generation unit 362. The horizontal sign change pixel value B^X_sc is calculated according to the following equation (25).

The vertical sign change pixel value B^Y_sc is calculated in the same way as equation (25), using the three pixels Y(x, y+1), Y(x, y), and Y(x, y−1).
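Equation (25) is not reproduced in this text. Read as a sign change of the luminance difference across three consecutive pixels, it can be sketched as follows; the exact test (a strictly negative product of the two neighboring differences) and the function names are assumptions, and border pixels are left to the caller.

```python
def sign_change_h(Y, x, y):
    """Assumed B^X_sc: 1 if the luminance difference changes sign across
    Y(x-1, y), Y(x, y), Y(x+1, y). Y is indexed as Y[row][column]."""
    d_next = Y[y][x + 1] - Y[y][x]
    d_prev = Y[y][x] - Y[y][x - 1]
    return 1 if d_next * d_prev < 0 else 0


def sign_change_v(Y, x, y):
    """Assumed B^Y_sc: same test using Y(x, y-1), Y(x, y), Y(x, y+1)."""
    d_next = Y[y + 1][x] - Y[y][x]
    d_prev = Y[y][x] - Y[y - 1][x]
    return 1 if d_next * d_prev < 0 else 0


peak = [[0, 0, 0],
        [1, 3, 1],
        [0, 0, 0]]
ramp_img = [[0, 1, 2],
            [0, 1, 2],
            [0, 1, 2]]
print(sign_change_h(peak, 1, 1), sign_change_v(peak, 1, 1))          # 1 1
print(sign_change_h(ramp_img, 1, 1), sign_change_v(ramp_img, 1, 1))  # 0 0
```

An isolated bump flips the sign in both directions, while a smooth gradient does not, which is why counting these flips in flat areas indicates grain-like noise.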

Based on the flat pixel number feature amount Ffc and the flat pixel value Bflat obtained from the local feature amount generation unit 210, and on the sign change pixel values B^X_sc and B^Y_sc, the noise scene feature amount generation unit 362 calculates the noise scene feature amount Fns according to the following equation (26) and outputs it to the delay unit 363. Here, N and M are the numbers of pixels of the decoded image in the horizontal and vertical directions, and Tfc is a threshold.

As equation (26) shows, the noise scene feature amount Fns represents the amount of sign change in the flat regions of the entire image. Accordingly, one noise scene feature amount Fns is calculated per image.
The delay unit 363 makes the noise scene feature amount Fns calculated in the frame at time t−1 available for use in the frame at time t.
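Equation (26) itself is not shown here. One plausible reading, consistent with the description (sign changes counted over flat pixels and normalized by the image size N × M), is sketched below; requiring Bflat = 1 and Ffc ≥ Tfc, and summing the horizontal and vertical sign changes, are assumptions, as is the function name.

```python
def noise_scene_feature(bflat, ffc, bsc_x, bsc_y, t_fc):
    """Assumed form of equation (26): one scalar Fns per frame.

    All inputs are 2-D lists of the same size (M rows x N columns).
    """
    m, n = len(bflat), len(bflat[0])
    total = 0
    for y in range(m):
        for x in range(n):
            # Count sign changes only where the pixel is flat and lies in a
            # sufficiently flat neighborhood (assumed use of Ffc and Tfc).
            if bflat[y][x] == 1 and ffc[y][x] >= t_fc:
                total += bsc_x[y][x] + bsc_y[y][x]
    return total / (n * m)


fns = noise_scene_feature(
    bflat=[[1, 1], [1, 0]],
    ffc=[[10, 10], [2, 10]],
    bsc_x=[[1, 0], [1, 1]],
    bsc_y=[[1, 1], [0, 1]],
    t_fc=5,
)
print(fns)  # 0.75
```

Because the result is a single scalar per frame, the delay unit 363 only has to hold one value between frames.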

The evaluation value generation unit 320 applies the same noise scene feature amount Fns to all pixels in one image. It calculates the noise evaluation value E by weighting and integrating the plurality of local feature amounts acquired from the local feature amount generation unit 210 and the noise scene feature amount Fns, which is a global feature amount, and outputs E to the mixing rate generation unit 230.

Specifically, the evaluation value generation unit 320 calculates the noise evaluation value E by modifying equation (17) of the first embodiment into the following equation (27).

Here, wns() is a weighting function for the noise scene feature amount Fns.
FIG. 21 shows an example of the weighting function wns according to the third embodiment.
The weighting function wns is designed so that the weighting coefficient wns(Fns) decreases as the noise scene feature amount Fns increases.
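Since Fns is a single value per frame, wns(Fns) can be evaluated once per frame and reused for every pixel. The sketch below assumes that equation (27) multiplies the factors of equation (17) (taken here as Es × wstdv) by wns(Fns), and that wns falls linearly from 1 to 0 between two hypothetical breakpoints; neither the form of equation (27) nor the curve of FIG. 21 is given in this text.

```python
def w_ns(fns, lo=0.05, hi=0.2):
    """Assumed decreasing weight (cf. FIG. 21): 1 for clean scenes,
    0 for scenes dominated by film grain or camera noise."""
    if fns <= lo:
        return 1.0
    if fns >= hi:
        return 0.0
    return (hi - fns) / (hi - lo)


def evaluate_frame(es_values, wstdv_values, fns):
    """Assumed equation (27): E = Es * wstdv * wns(Fns) for every pixel."""
    scene_weight = w_ns(fns)  # computed once per frame, shared by all pixels
    return [es * ws * scene_weight for es, ws in zip(es_values, wstdv_values)]


print(evaluate_frame([0.8, 0.4], [1.0, 0.5], fns=0.01))  # [0.8, 0.2]
print(evaluate_frame([0.8, 0.4], [1.0, 0.5], fns=0.3))   # [0.0, 0.0]
```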

In this way, the evaluation value generation unit 320 calculates the noise evaluation value E for each pixel from the plurality of local feature amounts and the noise scene feature amount Fns. As a result, false detection can be suppressed even in scenes containing film grain, camera noise, and the like, which are difficult to distinguish from mosquito noise using local feature amounts. This reduces the side effect of smoothing noise intentionally added by the producer, as well as the unevenness caused by smoothing camera noise in only some regions.

As described above, the image processing apparatus 301 according to the third embodiment preferably further includes a global feature amount generation unit 360 that generates a global feature amount based on the local feature amounts of a decoded image temporally preceding the current decoded image. The global feature amount includes a noise scene feature amount Fns indicating the amount of noise in the image, and the evaluation value generation unit 320 integrates the plurality of local feature amounts with the global feature amount to calculate the noise evaluation value E.

With this configuration, false detection can be suppressed even in scenes containing film grain, camera noise, and the like, which are difficult to distinguish from mosquito noise using local feature amounts.

(Embodiment 4)
The image processing apparatuses and image processing methods according to the second and third embodiments use the texture feature amount and the noise scene feature amount, respectively, as the global feature amount, whereas the image processing apparatus or image processing method according to the fourth embodiment uses a subtitle feature amount as the global feature amount.

FIG. 22 is a diagram showing the schematic configuration of part of the image processing apparatus 401 according to the fourth embodiment.
The image processing apparatus 401 includes a local feature amount generation unit 210, an evaluation value generation unit 420, a mixing rate generation unit 230, a filter processing unit 240, an image mixing processing unit 250, a global feature amount generation unit 460, and the like. The mixing rate generation unit 230, the filter processing unit 240, and the image mixing processing unit 250 have the same configuration and operate in the same way as those of the second embodiment, so their illustration and description are omitted here.

The global feature amount generation unit 460 acquires the edge pixel value Bedge and the edge strength feature amount Fes calculated by the local feature amount generation unit 210, calculates a subtitle feature amount Fst as the global feature amount, and outputs it to the evaluation value generation unit 420. For this purpose, the global feature amount generation unit 460 includes a still edge pixel generation unit 461, a still edge pixel number feature amount generation unit 462, a subtitle feature amount generation unit 463, and the like.

The still edge pixel generation unit 461 acquires the edge pixel value Bedge from the local feature amount generation unit 210, calculates a still edge pixel value Bstill, and outputs it to the still edge pixel number feature amount generation unit 462. The still edge pixel generation unit 461 sets the still edge pixel value Bstill(x, y) to 1 when the edge pixel value Bedge(x, y) at a pixel position (x, y) has been 1 for a predetermined number of consecutive frames.

FIG. 23 is a flowchart showing the procedure for calculating the still edge pixel value Bstill according to the fourth embodiment.
When a decoded image is input (step S10), the still edge pixel generation unit 461 determines whether it is the first frame of the input video, that is, the frame at time t = 0 (step S20). If it is the first frame (Yes in step S20), the counts cnt(x, y) at all pixel positions are initialized to 0 (step S30), and the process proceeds to step S40. If it is not the first frame (No in step S20), the process proceeds directly to step S40.

Next, the still edge pixel generation unit 461 performs the following processing (steps S40 to S90) for every pixel position (x, y) in the frame image to calculate the still edge pixel value Bstill(x, y).

First, it is determined whether the edge pixel value Bedge(x, y) is 0 (step S40).
If the edge pixel value Bedge(x, y) is not 0, that is, if it is 1 (No in step S40), the count cnt(x, y) is incremented by 1 (step S50).

Next, it is determined whether the count cnt(x, y) is greater than or equal to a predetermined threshold Tcnt (step S60).
If the count cnt(x, y) is greater than or equal to the threshold Tcnt (Yes in step S60), the still edge pixel value Bstill(x, y) is set to 1 (step S70). The threshold Tcnt is the number of consecutive frames in which an edge must occur before the edge is regarded as stationary.

If the count cnt(x, y) is less than the threshold Tcnt (No in step S60), the still edge pixel value Bstill(x, y) is set to 0 (step S80).

On the other hand, if the edge pixel value Bedge(x, y) is 0 (Yes in step S40), the still edge pixel value Bstill(x, y) is set to 0 and the count cnt(x, y) is reset to 0 (step S90).

After calculating the still edge pixel value Bstill(x, y) for all pixel positions (x, y) in the frame image (steps S40 to S90), the still edge pixel generation unit 461 outputs the still edge pixel values Bstill(x, y) to the still edge pixel number feature amount generation unit 462.

Next, the still edge pixel generation unit 461 determines whether the current frame is the last frame (step S110). If it is not the last frame (No in step S110), the process returns to step S10; if it is the last frame (Yes in step S110), the process ends.
In this way, the still edge pixel generation unit 461 can calculate the still edge pixel value Bstill(x, y) at time t.
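The flowchart of FIG. 23 maps directly onto a per-pixel run-length counter. In the sketch below, constructing the object corresponds to the first-frame initialization (steps S20 and S30), and each call to process_frame covers steps S40 to S90 for one frame; the class and method names are hypothetical.

```python
class StillEdgeDetector:
    """Per-pixel run-length counter for still edges (FIG. 23)."""

    def __init__(self, width, height, t_cnt):
        self.t_cnt = t_cnt
        # Step S30: counts cnt(x, y) start at 0 for the first frame.
        self.cnt = [[0] * width for _ in range(height)]

    def process_frame(self, bedge):
        """Return Bstill for one frame given its edge map Bedge (0/1)."""
        h, w = len(self.cnt), len(self.cnt[0])
        bstill = [[0] * w for _ in range(h)]
        for y in range(h):
            for x in range(w):
                if bedge[y][x] == 0:
                    # Step S90: no edge -> reset the count; not a still edge.
                    self.cnt[y][x] = 0
                    bstill[y][x] = 0
                else:
                    # Step S50: edge present -> extend the run.
                    self.cnt[y][x] += 1
                    # Steps S60-S80: still once the run reaches Tcnt frames.
                    bstill[y][x] = 1 if self.cnt[y][x] >= self.t_cnt else 0
        return bstill


det = StillEdgeDetector(width=1, height=1, t_cnt=3)
history = [det.process_frame([[e]])[0][0] for e in (1, 1, 1, 1, 0, 1)]
print(history)  # [0, 0, 1, 1, 0, 0]
```

Only one counter per pixel persists between frames. Saturating the counter at Tcnt would bound its bit width in hardware; that is a design option, not something the text specifies.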

The still edge pixel number feature amount generation unit 462 acquires the still edge pixel values Bstill(x, y), calculates a still edge pixel number feature amount Fse(x, y) according to the following equation (28), and outputs it to the subtitle feature amount generation unit 463. In the fourth embodiment, as in the first embodiment, the local region is 15 × 15 pixels, and the offsets (i, j) of the peripheral pixels relative to the target pixel (x, y) range from −7 to 7.
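Assuming equation (28) counts the still edge pixels in the 15 × 15 window (offsets −7 to 7), it can be sketched as below. The function name is hypothetical, and skipping offsets that fall outside the image is an assumption about border handling.

```python
def still_edge_pixel_count(bstill, x, y, radius=7):
    """Assumed equation (28): number of still edge pixels in the
    (2*radius+1) x (2*radius+1) window centered on (x, y)."""
    h, w = len(bstill), len(bstill[0])
    total = 0
    for j in range(-radius, radius + 1):
        for i in range(-radius, radius + 1):
            yy, xx = y + j, x + i
            if 0 <= yy < h and 0 <= xx < w:  # ignore out-of-image offsets
                total += bstill[yy][xx]
    return total


ones = [[1] * 3 for _ in range(3)]
print(still_edge_pixel_count(ones, 1, 1, radius=1))  # 9
print(still_edge_pixel_count(ones, 0, 0, radius=1))  # 4
```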

The subtitle feature amount generation unit 463 calculates the subtitle feature amount Fst from three feature amounts, namely the still edge pixel number feature amount Fse, the edge strength feature amount Fes, and a subtitle position feature amount Fsp, according to the following equation (29), and outputs it to the evaluation value generation unit 420.

The subtitle position feature amount Fsp(x, y) represents how likely a subtitle is to appear at coordinates (x, y). For example, captions tend to appear at the bottom of the screen, whereas broadcaster watermarks, sports scores, and the like tend to appear at the top.

FIG. 24 shows an example of subtitle appearance positions according to the fourth embodiment. In general, subtitles tend to appear in the hatched areas at the top and bottom of the screen 6.
The subtitle feature amount generation unit 463 presets the subtitle position feature amount Fsp(x, y) so that it takes high values in the hatched areas. Alternatively, the subtitle feature amount generation unit 463 may use, as Fsp(x, y), a subtitle occurrence probability computed in advance for each pixel position (x, y) from a large collection of program data.
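Equation (29) is not given in this text. The sketch below shows one hypothetical way to combine the three inputs, a product of normalized terms, so that Fst is high only where many still edges, strong edges, and a likely subtitle position coincide. The band-shaped Fsp, the band width, the edge strength full scale, and the function names are all illustrative assumptions modeled on the hatched areas of FIG. 24.

```python
def subtitle_position_prior(y, height, band=0.15):
    """Hypothetical Fsp(x, y): 1.0 in the top and bottom bands of the
    screen, 0.0 elsewhere (independent of x in this simple sketch)."""
    top = band * height
    bottom = (1.0 - band) * height
    return 1.0 if (y < top or y >= bottom) else 0.0


def subtitle_feature(fse, fes, fsp, fse_max=225, fes_max=255.0):
    """Hypothetical form of equation (29): product of normalized terms.

    fse_max = 225 is the pixel count of a 15 x 15 window; fes_max is an
    assumed full-scale edge strength.
    """
    fse_term = min(fse / fse_max, 1.0)
    fes_term = min(fes / fes_max, 1.0)
    return fse_term * fes_term * fsp


height = 1080
print(subtitle_position_prior(1000, height))  # 1.0 (bottom band)
print(subtitle_position_prior(540, height))   # 0.0 (middle of the screen)
print(subtitle_feature(225, 255.0, 1.0))      # 1.0
```

A learned per-pixel probability map, as the text also allows, would simply replace subtitle_position_prior with a table lookup.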

The evaluation value generation unit 420 calculates the noise evaluation value E by weighting and integrating the plurality of local feature amounts acquired from the local feature amount generation unit 210 and the subtitle feature amount Fst, which is a global feature amount.
Specifically, it integrates the subtitle feature amount Fst with the second evaluation value Es calculated by equation (16) according to the following equation (30), and calculates a third evaluation value Eg.
Here, wst() is a weighting function for the subtitle feature amount Fst.

FIG. 25 shows an example of the weighting function wst for the subtitle feature amount Fst according to the fourth embodiment.
The weighting function wst is set so that the weighting coefficient wst(Fst(x, y)) increases as the subtitle feature amount Fst increases.

Then, as in equation (17), the evaluation value generation unit 420 keeps the noise evaluation value E low on edges by multiplying by the weighting coefficient wstdv(Vstdv(x, y)) based on the standard deviation, and calculates the noise evaluation value E according to the following equation (31).
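A sketch of equations (30) and (31) follows. The text states only that wst increases with Fst and that the result raises E in subtitle areas; treating equation (30) as an addition to Es, the clipped-linear shape of wst, and its gain are assumptions. The wstdv value is passed in precomputed, as in equation (17), and the function names are hypothetical.

```python
def w_st(fst, gain=0.5):
    """Assumed increasing weight for the subtitle feature amount (cf. FIG. 25)."""
    return gain * min(max(fst, 0.0), 1.0)


def noise_evaluation_subtitle(es, fst, wstdv_value):
    eg = es + w_st(fst)        # equation (30), assumed additive
    return eg * wstdv_value    # equation (31): keep E low on edges


print(noise_evaluation_subtitle(0.2, 0.0, 1.0))  # 0.2 (no subtitle)
print(noise_evaluation_subtitle(0.2, 1.0, 1.0))  # raised by w_st(1.0) = 0.5
print(noise_evaluation_subtitle(0.2, 1.0, 0.0))  # 0.0 (on the edge itself)
```

Note the contrast with equation (23): texture evidence is subtracted, whereas subtitle evidence is assumed here to be added, matching the stated intent of each weight.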

In this way, the evaluation value generation unit 420 can calculate the noise evaluation value E for each pixel from the plurality of local feature amounts and the subtitle feature amount Fst. This raises the noise evaluation value E in captions, on-screen program text (telops), and other areas where mosquito noise is conspicuous, so that mosquito noise can be detected with its visibility taken into account.

As described above, the image processing apparatus 401 according to the fourth embodiment preferably further includes a global feature amount generation unit 460 that generates a global feature amount based on the local feature amounts of a decoded image temporally preceding the current decoded image. The global feature amount includes a subtitle feature amount Fst indicating a stationary character region, and the evaluation value generation unit 420 integrates the plurality of local feature amounts with the global feature amount to calculate the noise evaluation value E.
With this configuration, mosquito noise can be detected with its visibility taken into account.

In the second to fourth embodiments described above, the global feature amount calculated from the local feature amounts of the previous frame, or from their statistics, is used when calculating the noise evaluation value E of the current frame. This makes the processing feasible even when hardware performs sequential line processing, and has the advantage that no frame delay occurs.

However, when real-time processing is not required, when frame delay is acceptable, or when the processing is implemented in software, two-pass processing can be used instead: the global feature amount is calculated from the local feature amounts of the current frame (or their statistics) rather than those of the previous frame, and the noise evaluation value E of the current frame is then calculated using that global feature amount.
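The difference between the two schedules can be sketched with placeholder callables standing in for local feature extraction, global feature computation, and evaluation (all hypothetical). In the one-pass form, frame t is evaluated with the global feature amount from frame t−1, so no frame has to be buffered; the two-pass form evaluates each frame with its own global feature amount at the cost of traversing it twice. Passing None for the first frame, which has no predecessor, is an assumption.

```python
def one_pass(frames, local_fn, global_fn, eval_fn):
    """Embodiments 2-4: use the global feature amount of the previous frame."""
    prev_global = None                 # no previous frame for the first frame
    for frame in frames:
        local = local_fn(frame)
        yield eval_fn(local, prev_global)
        prev_global = global_fn(local)  # becomes available for frame t+1


def two_pass(frames, local_fn, global_fn, eval_fn):
    """Alternative: compute the current frame's global feature first."""
    for frame in frames:
        local = local_fn(frame)                  # first pass over the frame
        yield eval_fn(local, global_fn(local))   # second pass over the frame


def identity(frame):
    return frame


def summarize(local):
    return local * 10


def pair(local, global_value):
    return (local, global_value)


print(list(one_pass([1, 2, 3], identity, summarize, pair)))  # [(1, None), (2, 10), (3, 20)]
print(list(two_pass([1, 2, 3], identity, summarize, pair)))  # [(1, 10), (2, 20), (3, 30)]
```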

The noise evaluation value E may be calculated using all of the plurality of local feature amounts and the plurality of global feature amounts according to the first to fourth embodiments, or using only some of them.

The image processing apparatus or image processing method according to each embodiment relates to a technique for improving image quality by detecting and removing mosquito noise in images and video degraded by compression coding. It can be used, for example, for mosquito noise removal in an image processing chip mounted in a display device such as a digital television. It is advantageous in both performance and cost, because it produces few false or missed noise detections and has low hardware cost.

Regarding the above embodiments, the following additional notes are further disclosed.
(Appendix 1)
A noise removal apparatus that removes mosquito noise generated by compression coding using only a decoded image, the apparatus comprising:
a local feature amount generation unit that generates a plurality of local feature amounts relating to edges, relative flatness, flatness, and minute variation, using pixels in a predetermined local region of the decoded image;
an evaluation value generation unit that calculates a noise evaluation value for each pixel by weighted integration of the plurality of local feature amounts;
a filter processing unit that applies a smoothing filter to the decoded image to generate a smoothed image;
a mixing rate generation unit that generates a mixing rate based on the noise evaluation value; and
an image mixing processing unit that generates an output image by mixing the decoded image and the smoothed image based on the mixing rate.

(Appendix 2)
The noise removal apparatus according to Appendix 1, further comprising, when the decoded image is one frame of a compression-encoded moving image, a global feature amount generation unit that generates a global feature amount using local feature amounts in a frame temporally preceding the frame to be processed,
wherein the evaluation value generation unit calculates the noise evaluation value by weighted integration using the global feature amount in addition to the plurality of local feature amounts.

(Appendix 3)
The noise removal apparatus according to Appendix 1, wherein the plurality of local feature amounts are calculated from a basic feature amount computed from the pixel values of a local region (for example, 15 × 15 pixels) in the decoded image.
(Appendix 4)
The noise removal apparatus according to Appendix 3, wherein the basic feature amount is the variance or the standard deviation within the local region.

(Appendix 5)
The noise removal apparatus according to Appendix 3, wherein the basic feature amount is the absolute difference between adjacent pixels within the local region.
(Appendix 6)
The noise removal apparatus according to Appendix 3, wherein the plurality of local feature amounts include at least one of an edge strength, an edge pixel count, and a relative flat pixel count calculated from the basic feature amount.

(Appendix 7)
The noise removal apparatus according to Appendix 3, wherein the plurality of local feature amounts include a flat pixel count, a minute variation intensity, a minute variation variance, and a minute variation pixel count calculated from the basic feature amount.
(Appendix 8)
The noise removal apparatus according to Appendix 2, wherein the global feature amount includes a texture feature amount determined based on color.

(Appendix 9)
The noise removal apparatus according to Appendix 8, wherein the texture feature amount is calculated from a color-specific histogram of minute variation pixels in a temporally preceding frame.
(Appendix 10)
The noise removal apparatus according to Appendix 2, wherein the global feature amount includes a noise scene feature amount representing the amount of noise in the entire image.

(Appendix 11)
The noise removal apparatus according to Appendix 10, wherein the noise scene feature amount is calculated from the number of sign changes of luminance differences in flat portions of a temporally preceding frame.
(Appendix 12)
The noise removal apparatus according to Appendix 2, wherein the global feature amount includes a subtitle feature amount indicating a stationary character region.

(Appendix 13)
The noise removal apparatus according to Appendix 12, wherein the subtitle feature amount is calculated from: the number of still edge pixels, calculated from the edge pixels in a temporally preceding frame and the edge pixels in the current frame; the edge strength of the current frame; and a region-specific character appearance frequency.
(Appendix 14)
The noise removal apparatus according to Appendix 1, wherein the noise evaluation value increases as the edge strength increases.

(Appendix 15)
The noise removal device according to appendix 1, wherein the noise evaluation value increases as the number of edge pixels increases.
(Appendix 16)
The noise removal device according to appendix 1, wherein the noise evaluation value becomes larger as the number of relative flat pixels increases.

(Appendix 17)
The noise removal device according to appendix 1, wherein the noise evaluation value becomes large when the number of flat pixels takes a value within a predetermined range, under the condition that the main local feature amount is not zero.
(Appendix 18)
The noise removal device according to appendix 1, wherein the noise evaluation value becomes large when the minute luminance variation intensity takes a value within a predetermined range, under the condition that the main local feature amount is not zero.

(Appendix 19)
The noise removal device according to appendix 1, wherein the noise evaluation value becomes large when the minute luminance variation variance takes a value within a predetermined range, under the condition that the main local feature amount is not zero.
(Appendix 20)
The noise removal device according to appendix 1, wherein the noise evaluation value becomes large when the number of minute luminance variation pixels takes a value within a predetermined range, under the condition that the main local feature amount is not zero.

(Appendix 21)
The noise removal device according to appendix 2, wherein the noise evaluation value becomes smaller as the texture feature amount becomes larger.
(Appendix 22)
The noise removal device according to appendix 10, wherein the noise evaluation value becomes smaller as the noise scene feature amount becomes larger.

(Appendix 23)
The noise removal device according to appendix 12, wherein the noise evaluation value increases as the subtitle feature amount increases.
(Appendix 24)
The noise removal device according to appendix 3, wherein, once the basic feature amount exceeds a certain threshold, the noise evaluation value becomes smaller as the basic feature amount increases.

(Appendix 25)
The noise removal device according to appendix 14, wherein the edge strength is an average value of basic feature values in edge pixels in the local region.
(Appendix 26)
The noise removal device according to appendix 15, wherein the number of edge pixels is the number of edge pixels in the local region.
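Appendices 25, 26 and 32 together define the edge features of a local region. The sketch below computes both features over one window of basic feature amounts. The threshold value and the function name `edge_features` are hypothetical, and returning 0.0 when the window contains no edge pixel is an assumption.

```python
from typing import Sequence, Tuple


def edge_features(window: Sequence[Sequence[float]], edge_thr: float) -> Tuple[float, int]:
    """Return (edge strength, edge pixel count) for one local window.

    Edge pixels have basic feature amount >= edge_thr (Appendix 32).
    Edge strength is their mean basic feature amount (Appendix 25);
    assumption: 0.0 when there are no edge pixels.
    """
    edges = [v for row in window for v in row if v >= edge_thr]
    if not edges:
        return 0.0, 0
    return sum(edges) / len(edges), len(edges)
```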

(Appendix 27)
The noise removal device according to appendix 16, wherein the relative flat pixel count is obtained as follows. For each pixel in the local region, a flat threshold is set to that pixel's basic feature amount multiplied by a predetermined flat rate. The count is the number of pixels for which the basic feature amount of the central pixel, or the average basic feature amount within the local uniform region, is equal to or less than that flat threshold.
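Appendix 27 defines the relative flat pixel count. The sketch below assumes a square window with an odd side length whose centre is the pixel of interest. It excludes the centre pixel from the count, because claim 1 counts peripheral pixels. The names `relative_flat_count` and `center_value` are hypothetical.

```python
from typing import Optional, Sequence


def relative_flat_count(window: Sequence[Sequence[float]], flat_rate: float,
                        center_value: Optional[float] = None) -> int:
    """Count peripheral pixels p for which ref <= flat_rate * feature[p].

    window: odd-sized square grid of basic feature amounts.
    center_value: defaults to the centre pixel's basic feature amount; pass
    the local uniform region average instead to use Appendix 27's alternative.
    """
    c = len(window) // 2
    ref = window[c][c] if center_value is None else center_value
    count = 0
    for y, row in enumerate(window):
        for x, v in enumerate(row):
            if (y, x) == (c, c):
                continue  # only peripheral pixels are counted
            if ref <= flat_rate * v:  # flat threshold for this peripheral pixel
                count += 1
    return count
```

A centre pixel that is flat relative to most of its neighbours yields a high count, which appendix 16 links to a larger noise evaluation value.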
(Appendix 28)
The noise removal device according to appendix 17, wherein the number of flat pixels is the number of flat pixels in the local uniform region.

(Appendix 29)
The noise removal device according to appendix 18, wherein the minute variation intensity is an average value of basic feature amounts in minute variation pixels within a local uniform region.
(Appendix 30)
The noise removal device according to appendix 19, wherein the minute variation variance is a variance value of a basic feature amount in a minute variation pixel in a local uniform region.

(Appendix 31)
The noise removal device according to appendix 20, wherein the number of minute variation pixels is the number of minute variation pixels in a local uniform region.
(Appendix 32)
The noise removal device according to appendix 25, wherein the edge pixel is a pixel whose basic feature amount is equal to or greater than a predetermined threshold value.

(Appendix 33)
The noise removal device according to appendix 28, wherein the flat pixel is a pixel whose basic feature amount is equal to or less than a predetermined threshold value.
(Appendix 34)
The noise removal device according to appendix 29, wherein the minute variation pixel is a pixel whose basic feature amount falls within the range bounded by two threshold values indicating an upper limit and a lower limit.
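Appendices 28 to 34 define statistics over the local uniform region. The sketch below assumes that the region's basic feature amounts have already been collected into a flat list, and that both minute-variation bounds are inclusive. The function `uniform_region_stats` and its parameters are hypothetical. The statistics are set to 0 when no minute variation pixel is present, which is also an assumption.

```python
from statistics import pvariance
from typing import Dict, Sequence


def uniform_region_stats(values: Sequence[float], flat_thr: float,
                         mv_lo: float, mv_hi: float) -> Dict[str, float]:
    """Flat and minute-variation statistics over one local uniform region.

    values: basic feature amounts of the pixels in the local uniform region.
    flat_thr: flat pixel threshold (Appendix 33: value <= flat_thr).
    mv_lo, mv_hi: minute variation bounds (Appendix 34); assumed inclusive,
    with mv_lo > 0 as in claim 5.
    """
    flat_count = sum(1 for v in values if v <= flat_thr)   # Appendix 28
    minute = [v for v in values if mv_lo <= v <= mv_hi]
    if minute:
        intensity = sum(minute) / len(minute)  # Appendix 29: mean
        variance = pvariance(minute)           # Appendix 30: population variance
    else:
        intensity = variance = 0.0             # assumption: no minute pixels -> 0
    return {
        "flat_count": flat_count,
        "minute_count": len(minute),           # Appendix 31
        "minute_intensity": intensity,
        "minute_variance": variance,
    }
```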

(Appendix 35)
The noise removal device according to appendix 28, wherein the local uniform region is determined as follows. For each pixel in a local region of a predetermined size (for example, 15 × 15 pixels), a weight is calculated that becomes larger as the pixel's similarity to the central pixel in spatial position, luminance value, and variance value becomes higher. Only the pixels whose weight is equal to or greater than a predetermined threshold are used as the local uniform region.
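Appendix 35 states only that the weight grows with similarity in spatial position, luminance and variance. The sketch below assumes Gaussian kernels on each squared difference, in the style of a bilateral filter. The kernel form, the sigma values and `local_uniform_region` are hypothetical. A radius of 7 gives the 15 × 15 window mentioned above.

```python
import math
from typing import List, Sequence, Tuple

Grid = Sequence[Sequence[float]]


def local_uniform_region(luma: Grid, var: Grid, cy: int, cx: int, radius: int,
                         sigma_s: float, sigma_y: float, sigma_v: float,
                         w_thr: float) -> List[Tuple[int, int]]:
    """Return coordinates in the window around (cy, cx) whose weight >= w_thr.

    Assumption: weight = product of Gaussians of the spatial distance,
    luminance difference and variance difference to the centre pixel.
    """
    h, w = len(luma), len(luma[0])
    selected = []
    for y in range(max(0, cy - radius), min(h, cy + radius + 1)):
        for x in range(max(0, cx - radius), min(w, cx + radius + 1)):
            d_s = (y - cy) ** 2 + (x - cx) ** 2
            d_y = (luma[y][x] - luma[cy][cx]) ** 2
            d_v = (var[y][x] - var[cy][cx]) ** 2
            weight = math.exp(-d_s / (2 * sigma_s ** 2)
                              - d_y / (2 * sigma_y ** 2)
                              - d_v / (2 * sigma_v ** 2))
            if weight >= w_thr:
                selected.append((y, x))
    return selected
```

A neighbour whose luminance differs sharply from the centre, such as a pixel across an edge, gets a near-zero weight and is excluded.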

1, 201, 301, 401 Image processing device
10, 210 Local feature amount generation unit
11 First local feature amount generation unit
12 Second local feature amount generation unit
20, 220, 320, 420 Evaluation value generation unit
30, 230 Mixing ratio generation unit
40, 240 Filter processing unit
50, 250 Image mixing processing unit
111 Standard deviation calculation unit
114 Adjacent pixel difference absolute value sum calculation unit
116 Minute variation pixel generation unit
121 Edge strength feature amount generation unit
122 Edge pixel number feature amount generation unit
127 Relative flat pixel number feature amount generation unit
260, 360, 460 Global feature amount generation unit
265 Texture feature amount generation unit
362 Noise scene feature amount generation unit
463 Subtitle feature amount generation unit

Claims

1. An image processing apparatus comprising:
a local feature amount generation unit that generates a plurality of local feature amounts including at least a relative flat pixel number feature amount indicating how many of the peripheral pixels of a local region of a decoded image are less flat than the central pixel of the local region;
an evaluation value generation unit that calculates a noise evaluation value for each pixel of the decoded image by weight-integrating the plurality of local feature amounts;
a mixing ratio generation unit that generates a mixing ratio based on the noise evaluation value;
a filter processing unit that generates a smoothed image of the decoded image; and
an image mixing processing unit that mixes the decoded image and the smoothed image based on the mixing ratio to generate an output image.
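The data flow of claim 1 can be sketched as follows. The patent does not fix the integration weights, the mapping from evaluation value to mixing ratio, or the smoothing filter. This minimal sketch therefore assumes a plain weighted sum, a linear clip between two hypothetical thresholds `e_lo` and `e_hi`, a 3 × 3 mean filter, and a mixing ratio R that weights the smoothed image. All function names are hypothetical.

```python
from typing import List, Sequence

Grid = Sequence[Sequence[float]]


def noise_evaluation(features: Sequence[float], weights: Sequence[float]) -> float:
    """Integrate one pixel's local feature amounts as a plain weighted sum."""
    return sum(f * w for f, w in zip(features, weights))


def mixing_ratio(e: float, e_lo: float, e_hi: float) -> float:
    """Map a noise evaluation value to [0, 1], linearly between e_lo and e_hi."""
    if e <= e_lo:
        return 0.0
    if e >= e_hi:
        return 1.0
    return (e - e_lo) / (e_hi - e_lo)


def box_smooth(img: Grid) -> List[List[float]]:
    """3x3 mean filter; border pixels average only their in-bounds neighbours."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[j][i]
                    for j in range(max(0, y - 1), min(h, y + 2))
                    for i in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = sum(vals) / len(vals)
    return out


def blend(decoded: Grid, smoothed: Grid, ratio: Grid) -> List[List[float]]:
    """Per pixel: out = R * smoothed + (1 - R) * decoded."""
    return [[r * s + (1 - r) * d for d, s, r in zip(d_row, s_row, r_row)]
            for d_row, s_row, r_row in zip(decoded, smoothed, ratio)]
```

Because R is computed per pixel, smoothing is applied only where the evaluation value indicates mosquito noise, and the other pixels pass through unchanged.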

2. The image processing apparatus according to claim 1, wherein the local feature amount generation unit calculates each of the plurality of local feature amounts based on a basic feature amount. The basic feature amount is at least one of a variance of the pixel values in the local region, a standard deviation of the pixel values, or an adjacent pixel difference absolute value.
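Claim 2 lists three candidate basic feature amounts. The sketch below computes all three for one window. It assumes population variance. It also assumes that the adjacent pixel difference term (compare reference sign 114) sums the absolute differences over horizontal and vertical neighbour pairs inside the window. The name `basic_features` is hypothetical.

```python
import math
from typing import Dict, Sequence


def basic_features(window: Sequence[Sequence[float]]) -> Dict[str, float]:
    """Variance, standard deviation and adjacent absolute-difference sum of a window.

    Assumptions: population variance; the difference sum covers each
    horizontal and vertical neighbour pair inside the window once.
    """
    vals = [v for row in window for v in row]
    mean = sum(vals) / len(vals)
    var = sum((v - mean) ** 2 for v in vals) / len(vals)
    sad = 0.0
    for y, row in enumerate(window):
        for x, v in enumerate(row):
            if x + 1 < len(row):
                sad += abs(row[x + 1] - v)        # right neighbour
            if y + 1 < len(window):
                sad += abs(window[y + 1][x] - v)  # lower neighbour
    return {"variance": var, "std": math.sqrt(var), "adjacent_abs_diff_sum": sad}
```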

3. The image processing apparatus according to claim 2, wherein, when generating the relative flat pixel number feature amount, the local feature amount generation unit compares two quantities:
the average basic feature amount of those pixels in the local region whose similarity to the central pixel in spatial position, luminance value, or variance of pixel values is equal to or greater than a predetermined value; and the basic feature amounts of the individual peripheral pixels.

4. The image processing apparatus according to claim 3, wherein the plurality of local feature amounts further includes at least one of an edge strength feature amount, which is the average basic feature amount of the edge pixels in the local region, or an edge pixel number feature amount of the local region.

5. The image processing apparatus according to any one of claims …, wherein the plurality of local feature amounts further includes a feature amount relating to minute variation pixels, whose basic feature amount falls between a lower limit value greater than 0 and an upper limit value.

6. The image processing apparatus according to any one of claims 1 to 5, further comprising a global feature amount generation unit that generates a global feature amount based on the local feature amounts of a decoded image that temporally precedes the current decoded image,
wherein the global feature amount includes at least one of a texture feature amount indicating texture, a noise scene feature amount indicating the amount of noise in the image, or a subtitle feature amount indicating a stationary character region, and
the evaluation value generation unit calculates the noise evaluation value by integrating the plurality of local feature amounts and the global feature amount.

7. An image processing method comprising:
generating a plurality of local feature amounts including at least a relative flat pixel number feature amount indicating how many of the peripheral pixels of a local region of a decoded image are less flat than the central pixel of the local region;
calculating a noise evaluation value for each pixel of the decoded image by weight-integrating the plurality of local feature amounts;
generating a mixing ratio based on the noise evaluation value;
generating a smoothed image of the decoded image; and
mixing the decoded image and the smoothed image based on the mixing ratio to generate an output image.

8. An image processing program causing a computer to execute:
a procedure for generating a plurality of local feature amounts including at least a relative flat pixel number feature amount indicating how many of the peripheral pixels of a local region of a decoded image are less flat than the central pixel of the local region;
a procedure for calculating a noise evaluation value for each pixel of the decoded image by weight-integrating the plurality of local feature amounts;
a procedure for generating a mixing ratio based on the noise evaluation value;
a procedure for generating a smoothed image of the decoded image; and
a procedure for mixing the decoded image and the smoothed image based on the mixing ratio to generate an output image.