JP2020058014A

JP2020058014A - Video processing apparatus, video conference system, video processing method, and program

Info

Publication number: JP2020058014A
Application number: JP2019098709A
Authority: JP
Inventors: 耕司桑田; Koji Kuwata
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2018-09-28
Filing date: 2019-05-27
Publication date: 2020-04-09
Anticipated expiration: 2039-05-27
Also published as: JP7334470B2

Abstract

PROBLEM TO BE SOLVED: To reduce the amount of video data and to make the difference in image quality at the boundary between a low-image-quality area and a high-image-quality area inconspicuous. SOLUTION: A video processing apparatus comprises: a video acquisition unit that acquires a video; a video analysis unit that analyzes high-frequency components for each area of the video acquired by the video acquisition unit; and an image quality adjustment unit that, according to the result of the analysis by the video analysis unit, adjusts image quality so that, for at least some of the areas in the video, an area with more high-frequency components has higher image quality. SELECTED DRAWING: Figure 4

Description

The present invention relates to a video processing device, a video conference system, a video processing method, and a program.

Patent Document 1 below discloses a technique that, for an image captured by a surveillance camera, lowers the image quality of regions in which no motion or face is detected and gives regions in which motion or a face is detected higher image quality than the regions in which none is detected. This technique is said to reduce the encoded data size of the captured image, which lightens the load on the network transmission path, and to improve the visibility of the image in the moving regions.

In the related art, however, when a video is divided into a low-image-quality region and a high-image-quality region, the difference in image quality at the boundary between the two regions becomes conspicuous and looks unnatural to the viewer.

To solve the above problem of the related art, an object of the present invention is to reduce the data amount of video data while making the difference in image quality at the boundary between a low-image-quality region and a high-image-quality region less noticeable.

To solve the above problem, a video processing device of the present invention includes: a video acquisition unit that acquires a video; a video analysis unit that analyzes high-frequency components for each region of the video acquired by the video acquisition unit; and an image quality adjustment unit that, according to the analysis result of the video analysis unit, adjusts image quality so that, for at least some of the regions in the video, a region with more high-frequency components has higher image quality.

According to the present invention, the data amount of video data can be reduced, and the difference in image quality at the boundary between a low-image-quality region and a high-image-quality region can be made less noticeable.

FIG. 1 is a diagram showing the system configuration of a video conference system according to an embodiment of the present invention.
FIG. 2 is a diagram showing the appearance of an IWB according to an embodiment of the present invention.
FIG. 3 is a diagram showing the hardware configuration of an IWB according to an embodiment of the present invention.
FIG. 4 is a diagram showing the functional configuration of an IWB according to an embodiment of the present invention.
FIG. 5 is a flowchart showing the procedure of video conference execution control processing by an IWB according to an embodiment of the present invention.
FIG. 6 is a flowchart showing the procedure of video processing by a video processing unit according to an embodiment of the present invention.
FIG. 7 is a diagram showing a specific example of video processing by a video processing unit according to an embodiment of the present invention.
FIG. 8 is a diagram showing a specific example of video processing by a video processing unit according to an embodiment of the present invention.

[One Embodiment]
Hereinafter, an embodiment of the present invention will be described with reference to the drawings.

(System Configuration of Video Conference System 10)
FIG. 1 is a diagram showing the system configuration of a video conference system 10 according to an embodiment of the present invention. As shown in FIG. 1, the video conference system 10 includes a conference server 12, a conference reservation server 14, and a plurality of IWBs (Interactive White Boards) 100. These devices are connected to a network 16 such as the Internet, an intranet, or a LAN (Local Area Network). With these devices, the video conference system 10 can hold a so-called video conference between a plurality of sites.

The conference server 12 is an example of a "server device". The conference server 12 performs various kinds of control related to a video conference held with the plurality of IWBs 100. For example, at the start of a video conference, the conference server 12 monitors the communication connection state between each IWB 100 and the conference server 12 and calls each IWB 100. During the video conference, the conference server 12 transfers various data (for example, video data, audio data, and drawing data) between the plurality of IWBs 100.

The conference reservation server 14 manages the reservation status of video conferences. Specifically, the conference reservation server 14 manages conference information input via the network 16 from an external information processing device (for example, a PC (Personal Computer)). The conference information includes, for example, the date and time, the venue, the participants, their roles, and the terminals to be used. The video conference system 10 holds a video conference based on the conference information managed by the conference reservation server 14.

The IWB 100 is an example of a "video processing device", an "imaging device", and a "communication terminal". The IWB 100 is a communication terminal that is installed at each site where a video conference is held and is used by the participants of the video conference. For example, the IWB 100 can transmit various data input during the video conference (for example, video data, audio data, and drawing data) to the other IWBs 100 via the network 16 and the conference server 12. The IWB 100 can also present various data transmitted from the other IWBs 100 to the participants by outputting the data in a manner suited to its type (for example, display or audio output).

(Configuration of IWB 100)
FIG. 2 is a diagram showing the appearance of the IWB 100 according to an embodiment of the present invention. As shown in FIG. 2, the IWB 100 includes a camera 101, a touch panel display 102, a microphone 103, and a speaker 104 on the front surface of a main body 100A.

The camera 101 captures video of the area in front of the IWB 100. The camera 101 includes, for example, a lens, an image sensor, and a video processing circuit such as a DSP (Digital Signal Processor). The image sensor generates video data (RAW data) by photoelectrically converting the light collected by the lens. A CCD (Charge Coupled Device) or CMOS (Complementary Metal Oxide Semiconductor) sensor, for example, is used as the image sensor. The video processing circuit generates video data (YUV data) by applying general video processing, such as Bayer conversion and 3A control (AE (automatic exposure control), AF (autofocus), and AWB (auto white balance)), to the video data (RAW data) generated by the image sensor, and then outputs the generated video data (YUV data). YUV data expresses color information as a combination of a luminance signal (Y), the difference between the luminance signal and the blue component (U), and the difference between the luminance signal and the red component (V).
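The YUV representation described above can be illustrated with a minimal sketch. It assumes the widely used BT.601 luma weights and the analog U/V scale factors 0.492 and 0.877; the source does not say which YUV variant the camera 101 produces, and the function name `rgb_to_yuv` is hypothetical.

```python
def rgb_to_yuv(r, g, b):
    """Convert one RGB pixel (components in 0..1) to YUV.

    Assumption: BT.601 luma weights and analog U/V scaling; the patent
    only states that U and V are blue and red differences from luminance.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b  # luminance signal (Y)
    u = 0.492 * (b - y)                    # scaled blue-minus-luminance difference (U)
    v = 0.877 * (r - y)                    # scaled red-minus-luminance difference (V)
    return y, u, v


# A neutral gray pixel carries no color difference; a pure blue pixel has U above zero.
gray = rgb_to_yuv(0.5, 0.5, 0.5)
blue = rgb_to_yuv(0.0, 0.0, 1.0)
```

Because U and V are differences from luminance, a gray pixel maps to U and V of (approximately) zero, which is why chroma can be stored separately from brightness.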

The touch panel display 102 is a device that combines a display and a touch panel. With the display, the touch panel display 102 can show various information (for example, video data and drawing data). With the touch panel, it can accept input of various information (for example, characters, figures, and images) through contact operations by an operating tool 18 (for example, a finger or a pen). A liquid crystal display, an organic EL display, or electronic paper, for example, can be used as the display. A capacitive touch panel, for example, can be used as the touch panel.

The microphone 103 collects sound around the IWB 100, generates audio data (analog data) corresponding to the sound, and then applies analog-to-digital conversion to that audio data to output audio data (digital data) corresponding to the collected sound.

The speaker 104 is driven based on audio data (analog data) and outputs the sound corresponding to that audio data. For example, the speaker 104 is driven based on audio data transmitted from the IWB 100 at another site and outputs the sound collected by the IWB 100 at that site.

The IWB 100 configured in this way reduces the data amount of the video data acquired from the camera 101 by applying the video processing and encoding processing described later. It then transmits that video data, together with various display data (for example, video data and drawing data) acquired from the touch panel display 102 and audio data acquired from the microphone 103, to the other IWBs 100 via the conference server 12, and thereby shares these data with the other IWBs 100. The IWB 100 also displays, on the touch panel display 102, content based on the various display data (for example, video data and drawing data) transmitted from the other IWBs 100, and outputs, from the speaker 104, sound based on the audio data transmitted from the other IWBs 100, and thereby shares this information with the other IWBs 100.

For example, in the example shown in FIG. 2, the touch panel display 102 shows a display layout with a plurality of display areas 102A and 102B. The display area 102A is a drawing area that shows the drawing data drawn with the operating tool 18. The display area 102B shows the video of the local site captured by the camera 101. The touch panel display 102 can also show drawing data drawn on another IWB 100, video of another site captured by another IWB 100, and the like.

(Hardware Configuration of IWB 100)
FIG. 3 is a diagram showing the hardware configuration of the IWB 100 according to an embodiment of the present invention. As shown in FIG. 3, in addition to the camera 101, the touch panel display 102, the microphone 103, and the speaker 104 described with reference to FIG. 2, the IWB 100 includes a system control 105 having a CPU (Central Processing Unit), an auxiliary storage device 106, a memory 107, a communication I/F (interface) 108, an operation unit 109, and a recording device 110.

The system control 105 performs various kinds of control of the IWB 100 by executing various programs stored in the auxiliary storage device 106 or the memory 107. For example, the system control 105 includes a CPU, interfaces to peripheral units, a data access arbitration function, and the like. It controls the various hardware of the IWB 100 and controls the execution of the various video conference functions of the IWB 100 (see FIG. 4).

For example, as a basic video conference function, the system control 105 transmits the video data acquired from the camera 101, the drawing data acquired from the touch panel display 102, and the audio data acquired from the microphone 103 to the other IWBs 100 via the communication I/F 108.

Further, for example, the system control 105 causes the touch panel display 102 to display video based on the video data acquired from the camera 101 and drawing content based on the drawing data acquired from the touch panel display 102 (that is, the video data and drawing data of the local site).

Further, for example, the system control 105 acquires, via the communication I/F 108, the video data, drawing data, and audio data transmitted from the IWB 100 at another site. The system control 105 then causes the touch panel display 102 to display video based on the video data and drawing content based on the drawing data, and causes the speaker 104 to output sound based on the audio data.

The auxiliary storage device 106 stores the various programs executed by the system control 105, the data the system control 105 needs to execute those programs, and the like. A nonvolatile storage device such as a flash memory or an HDD (Hard Disk Drive) is used as the auxiliary storage device 106.

The memory 107 functions as a temporary storage area that the system control 105 uses when executing the various programs. A volatile storage device such as a DRAM (Dynamic Random Access Memory) or an SRAM (Static Random Access Memory) is used as the memory 107.

The communication I/F 108 is an interface for connecting to the network 16 and exchanging various data with the other IWBs 100 via the network 16. For example, a wired LAN interface supporting 10Base-T, 100Base-TX, 1000Base-T, or the like, or a wireless LAN interface supporting IEEE 802.11a/b/g/n or the like, can be used as the communication I/F 108.

The operation unit 109 is operated by a user to perform various inputs. A keyboard, a mouse, a switch, or the like is used as the operation unit 109.

The recording device 110 records the video data and audio data of a video conference in the memory 107. The recording device 110 also plays back the video data and audio data recorded in the memory 107.

(Functional Configuration of IWB 100)
FIG. 4 is a diagram showing the functional configuration of the IWB 100 according to an embodiment of the present invention. As shown in FIG. 4, the IWB 100 includes a main control unit 120, a video acquisition unit 122, a video processing unit 150, an encoding unit 128, a transmission unit 130, a reception unit 132, a decoding unit 134, a display control unit 136, an audio acquisition unit 138, an audio processing unit 140, and an audio output unit 142.

The video acquisition unit 122 acquires the video data (YUV data) output by the camera 101. The video data acquired by the video acquisition unit 122 is made up of a plurality of frame images.

The video processing unit 150 performs video processing on the video data acquired by the video acquisition unit 122. The video processing unit 150 includes a blocking unit 151, a video analysis unit 152, an image quality determination unit 153, a specific area detection unit 154, and an image quality adjustment unit 155.

The blocking unit 151 divides a frame image into a plurality of blocks. For example, in the examples shown in FIGS. 7 and 8, the blocking unit 151 divides one frame image into 48 blocks (8 × 6 blocks). This example uses a relatively small number of blocks to keep the description easy to follow. In practice, when the resolution of the frame image is 640 × 360 pixels and one block is 16 × 16 pixels, the frame image is divided into 40 × 23 blocks (360 / 16 = 22.5, rounded up to 23). Likewise, when the resolution of the frame image is 1920 × 1080 pixels (Full HD) and one block is 16 × 16 pixels, the frame image is divided into 120 × 68 blocks.
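The block counts quoted above follow from rounding up the frame dimensions divided by the block size. The sketch below shows that arithmetic and one possible way to cut a frame into tiles; it assumes the frame is a plain list of rows of luminance values and that edge tiles are allowed to be smaller than 16 × 16, neither of which the source specifies. The names `block_grid` and `split_into_blocks` are hypothetical.

```python
import math


def block_grid(width, height, block=16):
    """Return (columns, rows) of block x block tiles needed to cover the frame."""
    return math.ceil(width / block), math.ceil(height / block)


def split_into_blocks(frame, block=16):
    """Yield (column, row, tile) for every block of the frame.

    Assumption: `frame` is a list of rows of luminance values, and tiles on
    the right and bottom edges may be smaller than `block` pixels.
    """
    height, width = len(frame), len(frame[0])
    for top in range(0, height, block):
        for left in range(0, width, block):
            tile = [row[left:left + block] for row in frame[top:top + block]]
            yield left // block, top // block, tile


print(block_grid(640, 360))    # (40, 23)
print(block_grid(1920, 1080))  # (120, 68)
```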

The video analysis unit 152 analyzes the high-frequency components of each of the plurality of blocks. "Analyzing the high-frequency components" means quantifying the amount of high-frequency components. The high-frequency components represent the magnitude of the difference in gray level between adjacent pixels. That is, in a frame image, a region where the gray-level difference between adjacent pixels is small has few high-frequency components, and a region where the difference is large has many. Any known method may be used to analyze the high-frequency components; for example, an FFT (Fast Fourier Transform) or the DCT (Discrete Cosine Transform) used in JPEG (Joint Photographic Experts Group) compression can be used.
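As a minimal illustration of quantifying high-frequency content per block, the sketch below scores a tile by the mean absolute gray-level difference between adjacent pixels. This is a simple stand-in chosen because it matches the definition given in the text; it is not the FFT or DCT analysis the source names as examples, and the function name `high_frequency_score` is hypothetical.

```python
def high_frequency_score(tile):
    """Mean absolute gray-level difference between adjacent pixels in a tile.

    Assumption: `tile` is a list of rows of luminance values. A flat tile
    scores 0; a tile with strong pixel-to-pixel contrast scores high.
    """
    diffs = []
    for row in tile:
        # Differences between horizontally adjacent pixels.
        diffs += [abs(a - b) for a, b in zip(row, row[1:])]
    for upper, lower in zip(tile, tile[1:]):
        # Differences between vertically adjacent pixels.
        diffs += [abs(a - b) for a, b in zip(upper, lower)]
    return sum(diffs) / len(diffs) if diffs else 0.0


flat = [[10] * 4 for _ in range(4)]
checker = [[255 if (x + y) % 2 else 0 for x in range(4)] for y in range(4)]
print(high_frequency_score(flat))     # 0.0
print(high_frequency_score(checker))  # 255.0
```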

The image quality determination unit 153 determines the image quality of each of the plurality of blocks according to the result of the high-frequency analysis. Specifically, the image quality determination unit 153 creates an image quality level map by setting an image quality for each block based on the high-frequency analysis result from the video analysis unit 152. In doing so, it sets the image quality of each block so that a region with more high-frequency components receives higher image quality. For example, the image quality determination unit 153 assigns each block one of four image quality levels: "A (highest image quality)", "B (high image quality)", "C (medium image quality)", or "D (low image quality)".
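One way to picture the image quality level map is as a grid of letters derived from per-block scores. The sketch below is only an illustration: the threshold values and the names `quality_for_score` and `build_quality_map` are hypothetical, since the source does not state how scores are mapped to the four levels.

```python
def quality_for_score(score, thresholds=(40.0, 20.0, 5.0)):
    """Map a high-frequency score to 'A'..'D' (more high frequency, higher quality).

    Assumption: the three threshold values are placeholders for illustration.
    """
    a_min, b_min, c_min = thresholds
    if score >= a_min:
        return "A"  # highest image quality
    if score >= b_min:
        return "B"  # high image quality
    if score >= c_min:
        return "C"  # medium image quality
    return "D"      # low image quality


def build_quality_map(scores):
    """Turn a 2D grid of per-block scores into an image quality level map."""
    return [[quality_for_score(s) for s in row] for row in scores]


print(build_quality_map([[50, 25], [10, 0]]))  # [['A', 'B'], ['C', 'D']]
```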

The image quality determination unit 153 can change the image quality settings in an image quality level map once it has been generated as described above. For example, when the specific area detection unit 154 detects a face region, the image quality determination unit 153 can change the settings in the image quality level map so that the face region has higher image quality than the regions other than the face region. In doing so, the image quality determination unit 153 can reduce the data amount of the regions that are not in the periphery of the face region by changing their image quality to the lowest level (for example, image quality "D").
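The face-region override described above can be sketched as a pass over the level map. Several details here are assumptions for illustration: face blocks are raised to "A", "periphery" means blocks within `margin` blocks of a face block, and peripheral blocks keep the level from the original map. The function name `apply_face_override` is hypothetical.

```python
def apply_face_override(quality_map, face_blocks, margin=1):
    """Return a new level map that favors the face region.

    Assumptions: `face_blocks` is a set of (row, col) block indices; face
    blocks become 'A'; blocks within `margin` blocks of a face block keep
    their level; every other block drops to 'D'.
    """
    if not face_blocks:
        return [list(row) for row in quality_map]  # no face detected, map unchanged
    result = []
    for r, row in enumerate(quality_map):
        new_row = []
        for c, level in enumerate(row):
            if (r, c) in face_blocks:
                new_row.append("A")
            elif any(max(abs(r - fr), abs(c - fc)) <= margin for fr, fc in face_blocks):
                new_row.append(level)  # periphery of the face keeps its level
            else:
                new_row.append("D")    # far from any face: lowest quality
        result.append(new_row)
    return result


print(apply_face_override([["B", "C", "B", "A"]], {(0, 0)}))
# [['A', 'C', 'D', 'D']]
```

Keeping the analysis-based levels around the face is what lets the quality fall off gradually rather than jumping from the highest to the lowest level at the face boundary.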

Further, for example, when a predetermined first condition for determining that the network bandwidth (an example of "communication resources used for transmission") is insufficient is satisfied (for example, when the communication speed is at or below a predetermined first threshold), the image quality determination unit 153 can reduce the data amount of the regions other than the face region by changing their image quality to the lowest level (for example, image quality "D"). Likewise, when a predetermined second condition for determining that the network bandwidth has room to spare is satisfied (for example, when the communication speed is at or above a predetermined second threshold, where the second threshold ≥ the first threshold), the image quality determination unit 153 can raise the quality of the face region by changing its image quality to the highest level (for example, image quality "A").
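The two bandwidth conditions can be sketched as follows. The threshold values, the use of kbps as the unit, and the name `adjust_for_bandwidth` are hypothetical; the source only requires that the second threshold be at least as large as the first.

```python
def adjust_for_bandwidth(quality_map, face_blocks, speed_kbps,
                         low_kbps=500, high_kbps=2000):
    """Return a new level map adjusted to the measured communication speed.

    Assumptions: `low_kbps` and `high_kbps` stand in for the first and second
    thresholds; `face_blocks` is a set of (row, col) block indices.
    """
    if low_kbps > high_kbps:
        raise ValueError("second threshold must be >= first threshold")
    result = [list(row) for row in quality_map]
    for r, row in enumerate(result):
        for c in range(len(row)):
            is_face = (r, c) in face_blocks
            if speed_kbps <= low_kbps and not is_face:
                row[c] = "D"  # bandwidth short: non-face regions to lowest quality
            if speed_kbps >= high_kbps and is_face:
                row[c] = "A"  # bandwidth to spare: face region to highest quality
    return result


qmap = [["B", "C"]]
print(adjust_for_bandwidth(qmap, {(0, 0)}, speed_kbps=300))   # [['B', 'D']]
print(adjust_for_bandwidth(qmap, {(0, 0)}, speed_kbps=3000))  # [['A', 'C']]
```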

Further, for example, if the image quality determination unit 153 changed the image quality of the regions that are not in the periphery of the speaker region to "D (low image quality)" when creating the image quality level map, it can restore the image quality of those regions to the image quality set in the initially created image quality level map.

The specific area detection unit 154 detects a specific region in the video data (frame image) acquired by the video acquisition unit 122. Specifically, it detects, as the specific region, a face region, which is a region of the video data (frame image) in which a person's face is detected. Any known method may be used to detect the face region; one example is a method that extracts feature points such as the eyes, nose, and mouth. The specific area detection unit 154 also uses a known detection method to identify, as a speaker region, the face region showing the face of a person who is speaking.

The image quality adjustment unit 155 adjusts the image quality of one frame image pixel by pixel according to the final image quality level map. For example, when one of the image qualities "A", "B", "C", and "D" is set for each block in the image quality level map, the image quality adjustment unit 155 adjusts the image quality of each pixel so that the quality ordering is "A" > "B" > "C" > "D". Any known method may be used to adjust the image quality. For example, the image quality adjustment unit 155 keeps the original image quality for blocks set to image quality "A". For blocks set to image quality "B", "C", or "D", it lowers the image quality from the original (image quality "A") using a known image quality adjustment method (for example, resolution adjustment, contrast adjustment, a low-pass filter, or frame rate adjustment). As one example, no low-pass filter is applied to blocks set to image quality "A", a 3 × 3 low-pass filter is applied to blocks set to image quality "B", a 5 × 5 low-pass filter is applied to blocks set to image quality "C", and a 7 × 7 low-pass filter is applied to blocks set to image quality "D". In this way, the information amount of the frame image can be reduced appropriately according to the image quality level.
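The low-pass example above can be sketched with a per-pixel filter whose size depends on the level of the block that contains the pixel. The sketch assumes a box (mean) filter, clamping at the frame edges, and reading neighbors from the unfiltered frame even across block boundaries; the source specifies only the kernel sizes. The names `KERNEL_SIZE`, `box_blur_pixel`, and `adjust_quality` are hypothetical.

```python
# Kernel size per quality level, as in the example: A none, B 3x3, C 5x5, D 7x7.
KERNEL_SIZE = {"A": 1, "B": 3, "C": 5, "D": 7}


def box_blur_pixel(frame, x, y, size):
    """Mean of the size x size neighborhood around (x, y), clamped at frame edges."""
    half = size // 2
    height, width = len(frame), len(frame[0])
    total, count = 0, 0
    for yy in range(max(0, y - half), min(height, y + half + 1)):
        for xx in range(max(0, x - half), min(width, x + half + 1)):
            total += frame[yy][xx]
            count += 1
    return total / count


def adjust_quality(frame, quality_map, block=16):
    """Return a new frame in which each pixel is filtered per its block's level.

    Assumption: `frame` is a list of rows of luminance values and
    `quality_map[row][col]` holds 'A'..'D' for each block.
    """
    out = []
    for y, row in enumerate(frame):
        out_row = []
        for x, value in enumerate(row):
            size = KERNEL_SIZE[quality_map[y // block][x // block]]
            out_row.append(value if size == 1 else box_blur_pixel(frame, x, y, size))
        out.append(out_row)
    return out


frame = [[0, 100, 0, 100], [100, 0, 100, 0]]
smoothed = adjust_quality(frame, [["A", "B"]], block=2)  # left block kept, right block blurred
```

A larger kernel averages over more neighbors and removes more detail, which is why the blurred blocks compress to fewer bits in the subsequent encoding step.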

The encoding unit 128 encodes the video data that has undergone video processing by the video processing unit 150. Examples of the encoding scheme used by the encoding unit 128 include H.264/AVC, H.264/SVC, and H.265.

The transmission unit 130 transmits the video data encoded by the encoding unit 128, together with the audio data acquired from the microphone 103 (the audio data after audio processing by the audio processing unit 140), to the other IWBs 100 via the network 16.

The reception unit 132 receives, via the network 16, the video data and audio data transmitted from the other IWBs 100. The decoding unit 134 decodes the video data received by the reception unit 132 using a predetermined decoding scheme. The decoding scheme used by the decoding unit 134 corresponds to the encoding scheme used by the encoding unit 128 (for example, H.264/AVC, H.264/SVC, or H.265).

The display control unit 136 plays back the video data decoded by the decoding unit 134 and thereby causes the touch panel display 102 to display video based on that video data (that is, video of another site). The display control unit 136 also plays back the video data acquired from the camera 101 and thereby causes the touch panel display 102 to display video based on that video data (that is, video of the local site). Based on the layout setting information set in the IWB 100, the display control unit 136 can display several kinds of video in a display layout that has a plurality of display areas. For example, the display control unit 136 can display the video of the local site and the video of another site at the same time.

The main control unit 120 controls the IWB 100 as a whole. For example, the main control unit 120 controls the initial setting of each module, the setting of the shooting mode of the camera 101, communication start requests to other IWBs 100, the start and end of a video conference, and recording by the recording device 110.

The audio acquisition unit 138 acquires audio data from the microphone 103. The audio processing unit 140 performs various types of audio processing on the audio data acquired by the audio acquisition unit 138 and on the audio data received by the reception unit 132. For example, the audio processing unit 140 performs general audio processing such as codec processing and noise cancellation (NC) processing on the audio data received by the reception unit 132. Likewise, the audio processing unit 140 performs general audio processing such as codec processing and echo cancellation (EC) processing on the audio data acquired by the audio acquisition unit 138.

The audio output unit 142 converts the audio data received by the reception unit 132 (that is, the audio data after audio processing by the audio processing unit 140) into an analog signal and reproduces it, thereby causing the speaker 104 to output the audio based on that data (that is, the audio of another site).

Each function of the IWB 100 described above is realized, for example, by the CPU of the system control 105 executing a program stored in the auxiliary storage device 106 of the IWB 100. The program may be provided preinstalled in the IWB 100, or may be provided from outside and then installed in the IWB 100. In the latter case, the program may be provided on an external storage medium (for example, a USB memory, a memory card, or a CD-ROM) or by download from a server on a network (for example, the Internet). Some of the functions of the IWB 100 described above (for example, part or all of the video processing unit 150, the encoding unit 128, and the decoding unit 134) may instead be realized by a dedicated processing circuit provided separately from the system control 105.

(Procedure of video conference execution control processing by IWB 100)
FIG. 5 is a flowchart illustrating the procedure of the video conference execution control processing performed by the IWB 100 according to an embodiment of the present invention.

First, the main control unit 120 performs the initial setting of each module and makes the camera 101 ready to capture images (step S501). Next, the main control unit 120 sets the shooting mode of the camera 101 (step S502). The shooting mode may be set automatically based on the outputs of various sensors or manually through an operator's input. The main control unit 120 then requests the IWB 100 at another site to start communication and starts the video conference (step S503). The main control unit 120 may also start the video conference upon receiving a communication start request from another IWB 100. In addition, the main control unit 120 may start recording video and audio with the recording device 110 at the same time as the video conference starts.

When the video conference starts, on the one hand, the video acquisition unit 122 acquires video data (YUV data) from the camera 101, and the audio acquisition unit 138 acquires audio data from the microphone 103 (step S504). The video processing unit 150 then performs video processing (described in detail with reference to FIG. 6) on the video data acquired in step S504, and the audio processing unit 140 performs various types of audio processing on the audio data acquired in step S504 (step S505). Next, the encoding unit 128 encodes the video data processed in step S505 (step S506). The transmission unit 130 then transmits the video data encoded in step S506, together with the audio data acquired in step S504, to another IWB 100 via the network 16 (step S507).

In parallel with steps S504 to S507, the reception unit 132 receives, via the network 16, the video data and audio data transmitted from another IWB 100 (step S508). The decoding unit 134 then decodes the video data received in step S508 (step S509). The audio processing unit 140 performs various types of audio processing on the audio data received in step S508 (step S510). Next, the display control unit 136 causes the touch panel display 102 to display the video based on the video data decoded in step S509, and the audio output unit 142 causes the speaker 104 to output the audio based on the audio data processed in step S510 (step S511). In step S511, the display control unit 136 can additionally cause the touch panel display 102 to display the video based on the video data acquired in step S504 (that is, the video of the own site).

Following the transmission processing of steps S504 to S507, the main control unit 120 determines whether the video conference has ended (step S512). Likewise, following the reception processing of steps S508 to S511, the main control unit 120 determines whether the video conference has ended (step S513). The end of the video conference is confirmed, for example, when a user performs a predetermined end operation on any of the IWBs 100 participating in the video conference. If it is determined in step S512 that the video conference has not ended (step S512: No), the IWB 100 returns the processing to step S504; that is, the transmission processing of steps S504 to S507 is executed repeatedly. If it is determined in step S513 that the video conference has not ended (step S513: No), the IWB 100 returns the processing to step S508; that is, the reception processing of steps S508 to S511 is executed repeatedly. If it is determined in step S512 or step S513 that the video conference has ended (step S512: Yes, or step S513: Yes), the IWB 100 ends the series of processing shown in FIG. 5.
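The two loops of FIG. 5 can be sketched as follows. This is a minimal illustration only, assuming that the transmission side (steps S504 to S507) and the reception side (steps S508 to S511) each run in their own thread and share one end-of-conference flag; the names `run_until_finished`, `run_conference`, `send_step`, and `receive_step` are hypothetical and do not appear in the embodiment.

```python
import threading
from typing import Callable


def run_until_finished(step: Callable[[], None], finished: threading.Event) -> None:
    """Run one pass of a processing chain, then check whether the conference ended.

    Mirrors the flowchart order: the chain (e.g. S504 to S507) runs first and
    the end check (e.g. S512) follows it.
    """
    while True:
        step()
        if finished.is_set():
            break


def run_conference(send_step: Callable[[], None],
                   receive_step: Callable[[], None],
                   finished: threading.Event) -> None:
    """Run the transmission loop and the reception loop in parallel (assumed threads)."""
    threads = [
        threading.Thread(target=run_until_finished, args=(step, finished))
        for step in (send_step, receive_step)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
```

Here `finished` stands in for the end operation performed by a user; setting it from either loop stops both, which matches the condition that ending is confirmed at step S512 or step S513.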

(Procedure of video processing by video processing unit 150)
FIG. 6 is a flowchart illustrating the procedure of the video processing performed by the video processing unit 150 according to an embodiment of the present invention. FIG. 6 shows in detail the video processing of step S505 in the flowchart of FIG. 5.

First, the blocking unit 151 selects one frame image from the plurality of frame images constituting the video data, starting with the oldest frame image (step S601). The blocking unit 151 then divides the frame image selected in step S601 into a plurality of blocks (step S602).
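The block division of step S602 can be sketched as below. This is an illustrative sketch, not the embodiment's implementation: the function name `split_into_blocks`, the rectangle representation, and the handling of frame sizes that do not divide evenly are all assumptions.

```python
from typing import List, Tuple

# (x0, y0, x1, y1) with x1 and y1 exclusive; an assumed representation.
Block = Tuple[int, int, int, int]


def split_into_blocks(width: int, height: int, cols: int, rows: int) -> List[List[Block]]:
    """Return a rows x cols grid of rectangles that tile a width x height frame.

    Boundaries are computed with integer division so that the blocks cover
    the whole frame even when the size is not an exact multiple of the grid.
    """
    xs = [width * c // cols for c in range(cols + 1)]
    ys = [height * r // rows for r in range(rows + 1)]
    return [
        [(xs[c], ys[r], xs[c + 1], ys[r + 1]) for c in range(cols)]
        for r in range(rows)
    ]
```

For example, `split_into_blocks(640, 480, 8, 6)` yields 48 blocks of 80 by 80 pixels; the 640 by 480 frame size is chosen only for illustration.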

Next, the video analysis unit 152 analyzes the high-frequency components of the frame image selected in step S601 for each of the blocks obtained in step S602 (step S603).
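This passage does not state how the high-frequency components are measured. As one possible illustration, the sketch below uses the mean absolute difference between adjacent luma samples as a stand-in measure and quantizes it into levels 0 to 3. The measure, the threshold values in `THRESHOLDS`, and the function names are assumptions made for this example only.

```python
from typing import Sequence

# Hypothetical thresholds separating levels 0/1, 1/2 and 2/3.
THRESHOLDS = (2.0, 8.0, 20.0)


def high_frequency_energy(luma: Sequence[Sequence[int]]) -> float:
    """Mean absolute difference between horizontally and vertically adjacent samples.

    `luma` is one rectangular block of luma (Y) values. Flat blocks give 0.0;
    fine patterns such as stripes give large values.
    """
    total = 0
    count = 0
    for y, row in enumerate(luma):
        for x, value in enumerate(row):
            if x + 1 < len(row):
                total += abs(value - row[x + 1])
                count += 1
            if y + 1 < len(luma):
                total += abs(value - luma[y + 1][x])
                count += 1
    return total / count if count else 0.0


def high_frequency_level(luma: Sequence[Sequence[int]]) -> int:
    """Quantize the energy of one block into a level from 0 (least) to 3 (most)."""
    energy = high_frequency_energy(luma)
    return sum(energy >= threshold for threshold in THRESHOLDS)
```

A frequency-domain measure (for example, the energy of the higher DCT coefficients) would serve the same purpose; the adjacent-difference measure is used here only because it needs no external library.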

The image quality determination unit 153 then creates an image quality level map by setting, based on the high-frequency analysis result of step S603, an image quality for each of the blocks obtained in step S602 in the frame image selected in step S601 (step S604).

Next, the specific area detection unit 154 detects face areas, which are areas showing a person's face, in the frame image selected in step S601 (step S605). The specific area detection unit 154 then detects, from among the face areas detected in step S605, a speaker area, which is an area showing the face of a person who is speaking (step S606).

The image quality determination unit 153 then modifies the image quality level map created in step S604 based on the face area detection result of step S605 and the speaker area detection result of step S606 (step S607). For example, in the image quality level map created in step S604, the image quality determination unit 153 changes the image quality of a face area that is the speaker area to "A (highest image quality)" and changes the image quality of a face area that is not the speaker area to "B (high image quality)". In addition, the image quality determination unit 153 leaves the image quality of the area surrounding the speaker area unchanged and changes the image quality of areas that do not surround the speaker area to "D (low image quality)".
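The map update of step S607 can be sketched as follows. This is a sketch under stated assumptions: blocks are addressed as `(row, column)` pairs, the speaker area is a subset of the face areas, and "surrounding" is taken to mean within `margin` blocks of a speaker block. The function name `apply_regions` and the `margin` parameter are hypothetical; the text does not define how wide the surrounding area is.

```python
from typing import List, Set, Tuple

Cell = Tuple[int, int]  # (row, column) of a block


def apply_regions(base_map: List[List[str]],
                  face_blocks: Set[Cell],
                  speaker_blocks: Set[Cell],
                  margin: int = 1) -> List[List[str]]:
    """Return a copy of the quality map updated with face and speaker regions.

    Speaker blocks become "A", other face blocks become "B", background
    blocks near the speaker keep their analysis-based quality, and all
    remaining background blocks become "D".
    """
    updated = [row[:] for row in base_map]
    for r, row in enumerate(updated):
        for c in range(len(row)):
            if (r, c) in speaker_blocks:
                row[c] = "A"
            elif (r, c) in face_blocks:
                row[c] = "B"
            elif not any(abs(r - sr) <= margin and abs(c - sc) <= margin
                         for sr, sc in speaker_blocks):
                row[c] = "D"
    return updated
```

Returning a copy keeps the map from step S604 intact, which is convenient because a later step may restore qualities from that original map.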

Next, the image quality determination unit 153 determines whether the network bandwidth in use for the video conference has spare capacity (step S608). If it is determined in step S608 that the network bandwidth has spare capacity (step S608: Yes), the image quality determination unit 153 modifies the image quality level map so as to raise the image quality of some areas (step S609). For example, the image quality determination unit 153 changes the image quality of a face area that is not the speaker area from "B (high image quality)" to "A (highest image quality)", and also restores the image quality of areas that do not surround the speaker area to the image quality set in the image quality level map originally created in step S604. The video processing unit 150 then advances the processing to step S612.

If, on the other hand, it is determined in step S608 that the network bandwidth in use for the video conference has no spare capacity (step S608: No), the image quality determination unit 153 determines whether the network bandwidth is insufficient (step S610). If it is determined in step S610 that the network bandwidth is insufficient (step S610: Yes), the image quality determination unit 153 changes the image quality of the areas other than the face areas to "D (low image quality)" (step S611). The video processing unit 150 then advances the processing to step S612.

If it is determined in step S610 that the network bandwidth is not insufficient (step S610: No), the video processing unit 150 advances the processing to step S612 without further changing the image quality level map.
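The branch of steps S608 to S611 can be sketched as below, following the example adjustments given in the text. How spare or insufficient bandwidth is judged is not specified here, so the sketch takes the outcome as a hypothetical string argument `bandwidth` with the values `"surplus"`, `"normal"`, or `"insufficient"`; the function name `adjust_for_bandwidth` is likewise an assumption.

```python
from typing import List, Set, Tuple

Cell = Tuple[int, int]  # (row, column) of a block


def adjust_for_bandwidth(base_map: List[List[str]],
                         current_map: List[List[str]],
                         face_blocks: Set[Cell],
                         bandwidth: str) -> List[List[str]]:
    """Return a copy of current_map adjusted for the assumed bandwidth state.

    base_map is the analysis-based map (step S604); current_map is the map
    after the face and speaker update (step S607).
    """
    updated = [row[:] for row in current_map]
    for r, row in enumerate(updated):
        for c in range(len(row)):
            is_face = (r, c) in face_blocks
            if bandwidth == "surplus":
                # S609: every face block becomes "A"; background blocks
                # return to their analysis-based quality.
                row[c] = "A" if is_face else base_map[r][c]
            elif bandwidth == "insufficient" and not is_face:
                # S611: everything outside the face areas becomes "D".
                row[c] = "D"
    return updated
```

With `bandwidth == "normal"` the map is returned unchanged, corresponding to the path from step S610 (No) directly to step S612.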

In step S612, the image quality adjustment unit 155 adjusts the image quality of the frame image selected in step S601 pixel by pixel, according to the final image quality level map.
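This passage does not say which adjustment is applied to each pixel. Purely as an illustration, the sketch below attenuates detail by pulling each pixel of a block toward the block mean, with the retained fraction chosen by the block's image quality. The `DETAIL_KEPT` ratios, the function name `adjust_block`, and the choice of this particular smoothing are assumptions, not the adjustment used in the embodiment.

```python
from typing import Dict, List

# Hypothetical fraction of local detail retained for each image quality.
DETAIL_KEPT: Dict[str, float] = {"A": 1.0, "B": 0.75, "C": 0.5, "D": 0.25}


def adjust_block(pixels: List[List[int]], quality: str) -> List[List[int]]:
    """Return the block with each pixel pulled toward the block mean.

    Quality "A" leaves the block untouched; lower qualities keep less of
    each pixel's deviation from the mean, which removes fine detail.
    """
    keep = DETAIL_KEPT[quality]
    values = [value for row in pixels for value in row]
    mean = sum(values) / len(values)
    return [[round(mean + keep * (value - mean)) for value in row] for row in pixels]
```

Because smoother blocks tend to compress to fewer bits under codecs such as H.264, lowering the retained detail in low-quality blocks is one way such a map could reduce the encoded data size.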

The video processing unit 150 then determines whether the above video processing has been performed on all the frame images constituting the video data (step S613). If it is determined in step S613 that the video processing has not been performed on all the frame images (step S613: No), the video processing unit 150 returns the processing to step S601. If it is determined in step S613 that the video processing has been performed on all the frame images (step S613: Yes), the video processing unit 150 ends the series of processing shown in FIG. 6.

(A specific example of video processing by video processing unit 150)
FIGS. 7 and 8 are diagrams illustrating a specific example of the video processing performed by the video processing unit 150 according to an embodiment of the present invention. The frame image 700 shown in FIG. 7 is an example of one frame image subjected to the video processing by the video processing unit 150.

First, as shown in FIG. 7(a), the blocking unit 151 divides the frame image 700 into a plurality of blocks. In the example shown in FIG. 7(a), the frame image 700 is divided into 48 blocks (8 × 6 blocks).

Next, the video analysis unit 152 analyzes the high-frequency components of each of the plurality of blocks of the frame image 700. In the example shown in FIG. 7(a), each block is labeled with one of the levels "0" to "3", which is the high-frequency component level obtained as the analysis result. Here, the levels are ordered "3" > "2" > "1" > "0".

Next, the image quality determination unit 153 creates an image quality level map corresponding to the frame image 700. The image quality level map 800 shown in FIG. 7(b) was created by the image quality determination unit 153 based on the high-frequency analysis result shown in FIG. 7(a). In the image quality level map 800 of FIG. 7(b), one of the image qualities "A (highest image quality)", "B (high image quality)", "C (medium image quality)", and "D (low image quality)" is set for each block. The image qualities "A", "B", "C", and "D" correspond to the high-frequency component levels "3", "2", "1", and "0", respectively.
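The correspondence just described (level 3 to "A", 2 to "B", 1 to "C", 0 to "D") can be written as a simple lookup. The sketch below is illustrative; the names `LEVEL_TO_QUALITY` and `build_quality_map` are hypothetical, and the sample levels are invented rather than taken from FIG. 7(a).

```python
from typing import Dict, List

# Correspondence between high-frequency levels and image qualities.
LEVEL_TO_QUALITY: Dict[int, str] = {3: "A", 2: "B", 1: "C", 0: "D"}


def build_quality_map(levels: List[List[int]]) -> List[List[str]]:
    """Convert a grid of high-frequency levels into an image quality level map."""
    return [[LEVEL_TO_QUALITY[level] for level in row] for row in levels]
```

For instance, a row of levels `[0, 1, 3, 2]` becomes `["D", "C", "A", "B"]`.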

Next, the specific area detection unit 154 detects, from the frame image 700, face areas, which are areas showing a person's face. The specific area detection unit 154 also detects, from among the face areas detected in the frame image 700, a speaker area, which is an area showing the face of a person who is speaking. In the example shown in FIG. 7(c), face areas 710 and 712 are detected. Of these, the face area 710 is detected as the speaker area.

The image quality determination unit 153 then modifies the image quality level map 800 based on the detection results for the face areas 710 and 712. In the example shown in FIG. 8(a), the image quality determination unit 153 has modified the image quality level map 800 of FIG. 7(b) so that the image quality of the face area 710, which is the speaker area, is "A (highest image quality)" and the image quality of the face area 712, which is not the speaker area, is "B (high image quality)". Also in the example shown in FIG. 8(a), within the area other than the face areas 710 and 712 (hereinafter referred to as the "background area 720"), the image quality determination unit 153 has left the image quality of the area surrounding the face area 710 unchanged and has changed the image quality of the area not surrounding the face area 710 to "D (low image quality)".

Furthermore, when it is determined that the network bandwidth in use for the video conference has spare capacity, the image quality determination unit 153 modifies the image quality level map 800 so as to raise the image quality of some areas.

For example, in the example shown in FIG. 8(b), the image quality determination unit 153 has changed the image quality of the face area 712 in the image quality level map 800 from "B (high image quality)" to "A (highest image quality)".

In the example shown in FIG. 8(c), the image quality determination unit 153 has restored, in the image quality level map 800, the image quality of the part of the background area 720 that does not surround the speaker area from "D (low image quality)" to the image quality initially set as shown in FIG. 7(b).

Conversely, when it is determined that the network bandwidth in use for the video conference is insufficient, the image quality determination unit 153 changes the image quality of the background area 720 in the image quality level map 800 to "D (low image quality)", as shown in FIG. 8(d).

The image quality adjustment unit 155 then adjusts the image quality of the frame image 700 pixel by pixel, based on the final image quality level map 800 created as described above (any one of the image quality level maps 800 shown in FIG. 7(b) and FIGS. 8(a) to 8(d)).

As a result, in the frame image 700, relatively high image quality is set for the face areas 710 and 712, which attract relatively high attention from viewers, and relatively low image quality is set for the background area 720, which attracts relatively low attention from viewers.

However, within the background area 720 of the frame image 700, relatively high image quality is set, according to the high-frequency analysis result, for areas where image quality degradation is relatively conspicuous (areas with many high-frequency components, such as blinds), and relatively low image quality is set for areas where degradation is relatively inconspicuous (areas with few high-frequency components, such as walls and displays). Image quality degradation in the background area 720 of the frame image 700 is therefore hard to notice.

Furthermore, in the background area 720 of the frame image 700, the image quality changes stepwise in the spatial direction, block by block. The difference in image quality at the boundary between an area with relatively high image quality and an area with relatively low image quality within the background area 720 is therefore hard to notice.

Therefore, the IWB 100 of the present embodiment can reduce the amount of video data while making the difference in image quality at the boundary between a low image quality area and a high image quality area less noticeable.

Preferred embodiments of the present invention have been described in detail above. However, the present invention is not limited to these embodiments, and various modifications and changes are possible within the scope of the gist of the present invention as set forth in the claims.

For example, although the above embodiment uses the IWB 100 (electronic blackboard) as an example of the "video processing device" and the "communication terminal", the present invention is not limited to this. The functions of the IWB 100 described in the above embodiment may be realized by another information processing device equipped with an imaging device (for example, a smartphone, a tablet terminal, or a notebook computer), or by another information processing device not equipped with an imaging device (for example, a personal computer).

Also, although the above embodiment describes an example in which the present invention is applied to a video conference system, the present invention is not limited to this. The present invention is applicable to any use whose purpose is to reduce the amount of data by lowering the image quality of part of the video data. The present invention is also applicable to an information processing device that does not encode or decode video data.

Also, although the above embodiment uses a face detection area as an example of the "specific area", the present invention is not limited to this. The "specific area" may be any area showing a subject for which relatively high image quality is preferable (for example, a document showing text or images, a whiteboard, or a person captured by a surveillance camera).

In the above embodiment, the various setting values used in each process (for example, the type of subject to be detected as the specific area, the block size and number of blocks used when dividing a frame image, the number of levels of the high-frequency analysis result, the number of image quality levels, and the adjustment items and adjustment amounts used for image quality adjustment) may be preset to suitable values, or may be set freely by a user from an information processing device equipped with a user interface (for example, a personal computer).

Reference Signs List
10 Video conference system
12 Conference server (server device)
14 Conference reservation server
16 Network
100 IWB (video processing device, imaging device, communication terminal)
101 Camera (imaging unit)
102 Touch panel display (display device)
103 Microphone
104 Speaker
108 Communication I/F (communication unit)
120 Main control unit
122 Video acquisition unit
128 Encoding unit
130 Transmission unit
132 Reception unit
134 Decoding unit
136 Display control unit
138 Audio acquisition unit
140 Audio processing unit
142 Audio output unit
150 Video processing unit
151 Blocking unit
152 Video analysis unit
153 Image quality determination unit
154 Specific area detection unit
155 Image quality adjustment unit

JP 2017-163228 A

Claims

A video processing device comprising:
a video acquisition unit that acquires a video;
a video analysis unit that analyzes high-frequency components for each area in the video acquired by the video acquisition unit; and
an image quality adjustment unit that performs image quality adjustment, according to the analysis result of the video analysis unit, such that at least part of the areas in the video has a higher image quality as the area has more high-frequency components.

The video processing device according to claim 1, further comprising a blocking unit that divides the video into a plurality of blocks, wherein
the video analysis unit analyzes the high-frequency components for each of the blocks in the video, and
the image quality adjustment unit performs the image quality adjustment for each of the blocks in the video.

The video processing device according to claim 1, further comprising a specific area detection unit that detects a specific area, which is an area in the video showing a specific subject, wherein
the image quality adjustment unit performs the image quality adjustment such that the image quality of the specific area is higher than the image quality of the other area, which is the area other than the specific area.

The video processing device according to claim 3, wherein the image quality adjustment unit performs the image quality adjustment such that, within the other area, the image quality of an area surrounding the specific area is the image quality according to the analysis result, and the image quality of an area not surrounding the specific area is lower than the image quality according to the analysis result.

The video processing device according to claim 3, further comprising:
an encoding unit that encodes the video after the image quality adjustment by the image quality adjustment unit; and
a communication unit that transmits the video encoded by the encoding unit to an external device.

The video processing device according to claim 5, wherein the image quality adjustment unit sets the image quality of the other area to the lowest image quality when the communication resources used for the transmission are insufficient.

The video processing device according to claim 5, wherein the image quality adjustment unit sets the image quality of the specific area to the highest image quality when the communication resources used for the transmission have spare capacity.

The video processing device according to claim 5, wherein the image quality adjustment unit raises the image quality of the other area when the communication resources used for the transmission have spare capacity.

The video processing device according to any one of claims 3 to 8, wherein the specific area detection unit detects, as the specific area, an area in the video showing a person's face.

The video processing device according to claim 9, wherein, when a plurality of the specific areas are detected by the specific area detection unit, the image quality adjustment unit makes the image quality of a specific area showing the face of a person who is not speaking lower than the image quality of a specific area showing the face of a person who is speaking.

A video conference system comprising:
a plurality of communication terminals; and
a server device that performs various controls related to a video conference held by the plurality of communication terminals, wherein
each of the plurality of communication terminals includes:
an imaging unit that captures a video;
a video analysis unit that analyzes high-frequency components for each area in the video captured by the imaging unit;
an image quality adjustment unit that performs image quality adjustment, according to the analysis result of the video analysis unit, such that at least part of the areas in the video has a higher image quality as the area has more high-frequency components; and
a communication unit that transmits the video after the image quality adjustment by the image quality adjustment unit to an external device.

A video processing method comprising:
a video acquisition step of acquiring a video;
a video analysis step of analyzing high-frequency components for each area in the video acquired in the video acquisition step; and
an image quality adjustment step of performing image quality adjustment, according to the analysis result of the video analysis step, such that at least part of the areas in the video has a higher image quality as the area has more high-frequency components.

A program for causing a computer to function as:
a video acquisition unit that acquires a video;
a video analysis unit that analyzes high-frequency components for each area in the video acquired by the video acquisition unit; and
an image quality adjustment unit that performs image quality adjustment, according to the analysis result of the video analysis unit, such that at least part of the areas in the video has a higher image quality as the area has more high-frequency components.