JP2011234033A - Monitoring camera and monitor system

Info

Publication number: JP2011234033A
Application number: JP2010101219A
Authority: JP
Inventors: Kazuyuki Iguma; 一行猪熊
Original assignee: Panasonic Corp
Current assignee: Panasonic Corp
Priority date: 2010-04-26
Filing date: 2010-04-26
Publication date: 2011-11-17
Also published as: WO2011135776A1

Abstract

PROBLEM TO BE SOLVED: To improve the compressibility of the entire image while sufficiently securing the recognition performance of an object to be monitored. SOLUTION: A monitor system comprises a monitoring camera (1) and a monitoring terminal (2). The monitoring camera (1) comprises: an imaging unit (11) for imaging the object to be monitored; an image coding unit (12) which codes the image outputted from the imaging unit (11) by irreversible compression; an image decoding unit (13) which decodes the image coded by the image coding unit (12); a feature amount extraction unit (14) which receives a first image outputted from the imaging unit (11) and a second image outputted from the image decoding unit (13) and extracts the feature amount of each image; a feature variation detection unit (15) which detects a varied portion between a feature amount extracted from the first image and that extracted from the second image; and a coding control unit (16) which changes the control parameter of the image coding unit (12) on the basis of the varied portion of the detected feature amount.

Description

The present invention relates to a monitoring camera and to a monitoring system in which a monitoring camera and a terminal are connected via a network or the like, and more particularly to a monitoring system that automates monitoring using person recognition technology or the like.

In recent years, monitoring systems have been developed that transmit video from monitoring cameras over an IP network or the like and receive it at a monitoring terminal in order to monitor a wide area efficiently. FIG. 6 shows the configuration of a conventional monitoring system. The monitoring system includes a monitoring camera 100 and a monitoring terminal 200. The monitoring camera 100 includes an imaging unit 104 composed of a lens 101, an image sensor 102, and a camera signal processing unit 103, together with a video encoding unit 105 and a video transmission unit 106. The monitoring terminal 200 includes a video receiving unit 201, a video recording unit 202, a video decoding unit 203, and a video display unit 204.

The lens 101 forms an optical image of the monitoring target on the image sensor 102. The image sensor 102 converts the optical image into an electrical signal. The camera signal processing unit 103 converts the signal from the image sensor 102 into digital luminance and color data. The video encoding unit 105 compresses and encodes the video. Because network bandwidth is limited, lossy compression is employed to raise the compression rate. The video transmission unit 106 transmits the encoded video to the monitoring terminal 200. For example, when transmitting over an IP network, the video transmission unit 106 performs transmission control such as IP header processing and retransmission and congestion control.

In the monitoring terminal 200, the video receiving unit 201 receives the encoded video from the monitoring camera 100. The video recording unit 202 records the received video. The video decoding unit 203 decodes the encoded video and reproduces the video data. However, since the video transmitted from the monitoring camera 100 is lossily compressed, the video on the monitoring terminal 200 side is never exactly identical to the video acquired by the monitoring camera 100, and some degradation always occurs. The video display unit 204 displays the video from the monitoring camera 100.

The key requirement in video surveillance is not visual appeal, which matters for consumer cameras, but that no important information is lost. For example, a person's face is very important for identifying a suspicious person. In addition, network bandwidth is usually limited and recording is continuous, so video compression with the highest possible compression rate is required. Therefore, the monitoring camera 100 is provided with a region of interest detection unit 107 that detects the region of a person image in the video as a region of interest, and the video encoding unit 105 lowers the compression rate of the region of interest relative to other regions, thereby preventing loss of important information while improving image quality (see, for example, Patent Document 1).

Patent Document 1: JP 2001-145101 A

With the above prior art, the image quality of the region of interest can be improved relative to other regions, but it is difficult to control it against an absolute measure of how much improvement is needed. Consequently, when network bandwidth becomes tighter and even the image quality of the region of interest must be lowered to some extent, it may be impossible to judge how far it can be lowered. For the same reason, it is difficult to improve cost performance in a logical and quantitative manner by reducing the network bandwidth and the capacity of the video recording unit while still ensuring sufficient performance.

In view of this problem, an object of the present invention is to raise the compression rate of the entire video in a monitoring system while sufficiently ensuring the recognition performance for the monitoring target.

To solve the above problem, the present invention takes the following measures. That is, the monitoring camera includes: an imaging unit that images a monitoring target; a video encoding unit that encodes the video output from the imaging unit by lossy compression; a video decoding unit that decodes the video encoded by the video encoding unit; a feature amount extraction unit that receives a first video output from the imaging unit and a second video output from the video decoding unit and extracts a feature amount from each video; a feature change detection unit that detects the change between the feature amount extracted from the first video and the feature amount extracted from the second video; and an encoding control unit that changes a control parameter of the video encoding unit based on the detected change in the feature amount. The monitoring system includes the monitoring camera and a monitoring terminal connected to the monitoring camera through a communication line.

With this configuration, the control parameter for video encoding is changed based on the change in the feature amounts needed to identify and recognize the video, which is the ultimate purpose of video surveillance; in other words, based on the degree of degradation. The entire video can therefore be compressed and encoded appropriately against an absolute measure.

Preferably, the monitoring terminal includes a monitoring target recognition unit that recognizes the monitoring target from the video received from the monitoring camera, and a recognition difficulty evaluation unit that evaluates the recognition difficulty of the video received from the monitoring camera, and transmits the recognition difficulty to the monitoring camera. The encoding control unit in the monitoring camera changes the control parameter of the video encoding unit taking into account the recognition difficulty transmitted from the monitoring terminal. With this configuration, the control parameter for video encoding is changed adaptively based on the recognition difficulty of the monitoring target fed back from the monitoring terminal.

Also preferably, the feature amount extraction unit outputs the extracted feature amounts as a feature vector, and the feature change detection unit calculates the difference vector between the feature vector of the first video and the feature vector of the second video, and outputs the difference vector with each element weighted as a feature difference vector. Weighting each element of the difference vector in this way makes it possible to detect a change in the feature amounts that accurately reflects the recognition performance.

More preferably, the monitoring terminal includes a monitoring target recognition unit that recognizes the monitoring target from the video received from the monitoring camera, and transmits the algorithm used for that recognition to the monitoring camera. The feature change detection unit in the monitoring camera weights each element of the feature difference vector taking into account the recognition algorithm transmitted from the monitoring terminal. With this configuration, the weighting of the feature amounts in the monitoring camera is controlled adaptively based on the recognition algorithm fed back from the monitoring terminal.

According to the present invention, the compression rate of the entire video can be raised in a monitoring system while sufficiently ensuring the recognition performance for the monitoring target. This improves the cost performance of the entire monitoring system through efficient use of the network bandwidth and the video recording unit.

FIG. 1 is a configuration diagram of a monitoring system according to an embodiment of the present invention.
FIG. 2 is a configuration diagram of the feature amount extraction unit.
FIG. 3 is a configuration diagram of the feature change detection unit.
FIG. 4 is a configuration diagram of the encoding control unit.
FIG. 5 is a configuration diagram of the main part of the monitoring terminal.
FIG. 6 is a configuration diagram of a conventional monitoring system.

FIG. 1 shows the configuration of a monitoring system according to an embodiment of the present invention. The monitoring system according to this embodiment consists of a monitoring camera 1 and a monitoring terminal 2. The monitoring camera 1 includes an imaging unit 11 composed of a lens 111, an image sensor 112, and a camera signal processing unit 113, together with a video encoding unit 12, a video decoding unit 13, a feature amount extraction unit 14, a feature change detection unit 15, an encoding control unit 16, a video transmission unit 17, a recognition difficulty receiving unit 18, and a recognition algorithm receiving unit 19. The monitoring terminal 2 includes a video receiving unit 21, a video recording unit 22, a video decoding unit 23, a video display unit 24, a monitoring target recognition unit 25, a recognition difficulty evaluation unit 26, a recognition difficulty transmission unit 27, and a recognition algorithm transmission unit 28.

The lens 111 forms an optical image of the monitoring target on the image sensor 112. The image sensor 112 converts the optical image into an electrical signal. The camera signal processing unit 113 converts the signal from the image sensor 112 into digital luminance and color data. The video encoding unit 12 compresses and encodes the video. The video decoding unit 13 decodes the encoded video. The feature amount extraction unit 14 extracts from the video the features of the monitoring target that are needed for the recognition processing performed by the monitoring terminal 2. The feature change detection unit 15 detects the change between the feature amount of the video output from the imaging unit 11 and the feature amount of the video output from the video decoding unit 13; that is, it detects the change in the feature amount before and after encoding. The encoding control unit 16 changes the control parameter of the video encoding unit 12 based on the detection result of the feature change detection unit 15. The video transmission unit 17 transmits the video encoded by the video encoding unit 12 to the monitoring terminal 2.

In video compression schemes such as H.264 and MPEG, inter-frame predictive encoding refers to images that have already been decoded, so the video encoding unit 12 and the video decoding unit 13 are in practice integrated; here they are shown separately for convenience of explanation. Therefore, showing the video encoding unit 12 and the video decoding unit 13 as separate blocks does not entail any particular additional cost.

The video receiving unit 21 receives the encoded video from the monitoring camera 1. The video recording unit 22 records the received video. The video decoding unit 23 decodes the received video or the video recorded in the video recording unit 22. The video display unit 24 displays the decoded video. The monitoring target recognition unit 25 recognizes the monitoring target in the video, for example by identifying an individual person. The recognition difficulty evaluation unit 26 evaluates how difficult the recognition processing in the monitoring target recognition unit 25 is. The recognition difficulty transmission unit 27 transmits the recognition difficulty output from the recognition difficulty evaluation unit 26 to the monitoring camera 1. The recognition algorithm transmission unit 28 transmits the algorithm of the recognition processing performed by the monitoring terminal 2 to the monitoring camera 1.

The recognition difficulty receiving unit 18 receives the recognition difficulty. The encoding control unit 16 takes the recognition difficulty as input and, when the recognition difficulty is high, controls the video encoding unit 12 so that the feature amount change detected by the feature change detection unit 15 becomes smaller. The recognition algorithm receiving unit 19 receives the recognition algorithm. The feature change detection unit 15 takes the recognition algorithm as input and changes the parameters used for detecting the feature amount change to match that algorithm.

Next, each part of the monitoring camera 1 is described in detail. FIG. 2 shows the configuration of the feature amount extraction unit 14. The feature amount extraction unit 14 consists of a selector 141, a face detection unit 142, and a face feature extraction unit 143. The face detection unit 142 consists of a plurality of specific face direction detection units 144 and a combiner 145. Each specific face direction detection unit 144 consists of a plurality of weak classifiers 146. The face feature extraction unit 143 consists of a face part position detection unit 147 and a face part shape detection unit 148. The face part shape detection unit 148 consists of a plurality of Gabor filters 149.

The selector 141 selects either the video from the imaging unit 11 or the video from the video decoding unit 13. The face detection unit 142 detects human faces in the selected video. In the face detection unit 142, weak classifiers 146, each of which judges face-likeness from the light and dark pattern of the video, are combined in a cascade to form a so-called boosting classifier. The face detection unit 142 also contains specific face direction detection units 144, each trained separately for a specific face direction, both to distinguish face directions and to optimize detection for each direction.
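The cascade of weak classifiers described above can be sketched roughly as follows. This is a minimal illustration and not the patent's implementation: the rectangle-contrast feature, the thresholds, and the names `make_weak_classifier` and `cascade_accepts` are all assumptions chosen for the example, and a real detector would learn its stages by boosting on training data.

```python
from typing import Callable, List, Sequence, Tuple

Patch = Sequence[Sequence[float]]   # grayscale patch as rows of intensities
Region = Tuple[range, range]        # (row range, column range)


def region_mean(patch: Patch, region: Region) -> float:
    """Mean intensity of the patch inside the given region."""
    rows, cols = region
    values = [patch[r][c] for r in rows for c in cols]
    return sum(values) / len(values)


def make_weak_classifier(dark: Region, bright: Region,
                         threshold: float) -> Callable[[Patch], bool]:
    """Hypothetical weak classifier: passes when the 'bright' region is
    brighter than the 'dark' region by at least `threshold`."""
    def classify(patch: Patch) -> bool:
        return region_mean(patch, bright) - region_mean(patch, dark) >= threshold
    return classify


def cascade_accepts(patch: Patch, stages: List[Callable[[Patch], bool]]) -> bool:
    """A patch is accepted only if every stage passes; all() stops at the
    first failing stage, so most non-face patches are rejected early."""
    return all(stage(patch) for stage in stages)


# Illustrative stages: the eye band (top rows) is darker than the cheek band.
stages = [
    make_weak_classifier(dark=(range(0, 2), range(0, 4)),
                         bright=(range(2, 4), range(0, 4)), threshold=20.0),
    make_weak_classifier(dark=(range(0, 2), range(1, 3)),
                         bright=(range(2, 4), range(1, 3)), threshold=50.0),
]
face_like = [[30.0] * 4, [30.0] * 4, [200.0] * 4, [200.0] * 4]
flat = [[100.0] * 4 for _ in range(4)]
```

In this toy setup `cascade_accepts(face_like, stages)` passes both stages, while the uniform patch `flat` is rejected at the first stage.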

The face-likeness judgments from the specific face direction detection units 144 are combined by the combiner 145, and a final decision is made as to whether the region is a human face. In a region judged to be a human face by the face detection unit 142, the face feature extraction unit 143 extracts the features of the face. First, the face part position detection unit 147 detects the positions of the eyes, nose, mouth, and other face parts that are important in characterizing a human face. Here, each face part position is searched for using weak classifiers 146 that, like those in the face detection unit 142, classify from light and dark patterns but have been trained specifically for face parts. For each face part detected by the face part position detection unit 147, the face part shape detection unit 148 extracts its shape information. The face part shape detection unit 148 includes a plurality of Gabor filters 149 for detecting edge information, including orientation, that serves as the shape information of the face part. The face part position information and the face part shape information are sent to the feature change detection unit 15 in the following stage.
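To illustrate how a bank of Gabor filters yields orientation-dependent edge information, the sketch below builds the real part of a standard Gabor kernel at several orientations and applies each one to a patch. The parameter values, the odd kernel size, and the function names are assumptions for illustration only; the patent does not specify the filter parameters.

```python
import math
from typing import List, Sequence

Kernel = List[List[float]]


def gabor_kernel(size: int, wavelength: float, theta: float,
                 sigma: float, gamma: float = 0.5, psi: float = 0.0) -> Kernel:
    """Real part of a Gabor kernel; `size` is assumed to be odd."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates so the carrier wave runs along angle theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            carrier = math.cos(2 * math.pi * xr / wavelength + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


def gabor_bank(size: int, wavelength: float, sigma: float,
               n_orientations: int) -> List[Kernel]:
    """Kernels at n evenly spaced orientations in [0, pi)."""
    return [gabor_kernel(size, wavelength, math.pi * i / n_orientations, sigma)
            for i in range(n_orientations)]


def filter_response(patch: Sequence[Sequence[float]], kernel: Kernel) -> float:
    """Response of one kernel at the patch centre (patch and kernel same size)."""
    return sum(p * k for prow, krow in zip(patch, kernel)
               for p, k in zip(prow, krow))


bank = gabor_bank(size=7, wavelength=4.0, sigma=2.0, n_orientations=4)
# Vertical stripes: intensity varies along x only.
stripes = [[math.cos(2 * math.pi * x / 4.0) for x in range(-3, 4)]
           for _ in range(7)]
responses = [filter_response(stripes, k) for k in bank]
```

For the vertically striped patch, the kernel at orientation 0 responds much more strongly than the one at orientation π/2, which is the kind of orientation-selective edge measure the shape feature relies on.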

FIG. 3 shows the configuration of the feature change detection unit 15. Since there are multiple feature amounts, they are grouped together as a feature vector 151. The feature change detection unit 15 consists of a feature vector memory 152, a feature difference vector calculation unit 153, a feature difference vector space conversion unit 154, and a feature difference vector space determination unit 155. The feature vector memory 152 includes a memory 152a that stores the feature vector of the video output from the imaging unit 11 and a memory 152b that stores the feature vector of the video output from the video decoding unit 13. The feature difference vector calculation unit 153 calculates the feature difference vector, which is the error between the feature vector of the video output from the imaging unit 11 and the feature vector of the video output from the video decoding unit 13. The feature difference vector space conversion unit 154 converts the space of the feature difference vector, that is, the difference values of the individual feature amounts, according to a predetermined rule. This amounts to applying a different weight to each feature amount, which is needed because the weighting of the feature amounts differs depending on the recognition algorithm used by the monitoring terminal 2. To perform this processing accurately, the feature difference vector space determination unit 155 determines the conversion rule based on the recognition algorithm received from the recognition algorithm receiving unit 19.
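As a rough sketch of this computation, the weighted feature difference vector can be written as an element-wise difference followed by per-feature weighting. The function name and the example weights are assumptions; in the system described, the weights would follow from the recognition algorithm sent by the monitoring terminal.

```python
from typing import List, Sequence


def feature_difference_vector(original: Sequence[float],
                              decoded: Sequence[float],
                              weights: Sequence[float]) -> List[float]:
    """Element-wise difference between the feature vector of the decoded
    video and that of the original video, with a per-feature weight applied."""
    if not (len(original) == len(decoded) == len(weights)):
        raise ValueError("feature vectors and weights must have equal length")
    return [w * (d - o) for o, d, w in zip(original, decoded, weights)]


# Hypothetical values: three features, the first weighted most heavily.
diff = feature_difference_vector(
    original=[1.0, 2.0, 3.0],
    decoded=[1.5, 2.0, 2.0],
    weights=[2.0, 1.0, 0.5],
)
```

A feature that the recognition algorithm relies on heavily gets a large weight, so even a small change in it after encoding contributes strongly to the difference vector.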

FIG. 4 shows the configuration of the encoding control unit 16. The encoding control unit 16 consists of a feature difference vector magnitude evaluation value calculation unit 161, a feature difference vector threshold comparison unit 162, a feature difference vector threshold determination unit 163, a quantization parameter control unit 164, a face part position difference evaluation value calculation unit 165, and a prediction mode control unit 166.

The feature difference vector magnitude evaluation value calculation unit 161 calculates an overall index of the magnitude of the feature difference vector output from the feature change detection unit 15, for example the absolute value (norm) of the vector. The differences in the feature amounts are thereby condensed into a single evaluation value. The feature difference vector threshold comparison unit 162 compares this evaluation value with the threshold that the feature difference vector must stay within to ensure the desired recognition performance at the monitoring terminal 2. The feature difference vector threshold determination unit 163 determines that threshold.

To ensure the desired recognition performance more accurately, the threshold for the feature difference vector is updated based on the recognition difficulty sent from the recognition difficulty receiving unit 18. For example, when the recognition difficulty is high, the threshold is lowered. The quantization parameter control unit 164 controls the quantization parameter of the video encoding unit 12 based on the output of the feature difference vector threshold comparison unit 162. Coarser quantization increases quantization noise and therefore enlarges the feature difference vector. When the threshold is exceeded, the desired recognition performance cannot be ensured, so control is applied in the direction of finer quantization.
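One step of this control loop might look like the following sketch. The source only states that quantization is made finer when the threshold is exceeded and that the threshold is lowered when recognition difficulty is high; the linear threshold scaling, the difficulty range of 0 to 1, the step size, and the choice to coarsen quantization when the magnitude is within the threshold are assumptions made here to complete the example. The bounds 0 to 51 follow the H.264 quantization parameter range for 8-bit video.

```python
import math
from typing import Sequence


def magnitude(diff: Sequence[float]) -> float:
    """Euclidean norm of the feature difference vector."""
    return math.sqrt(sum(x * x for x in diff))


def threshold_for(base_threshold: float, difficulty: float) -> float:
    """Assumed rule: difficulty in [0, 1]; a higher difficulty lowers the
    threshold, down to half of the base value."""
    return base_threshold * (1.0 - 0.5 * difficulty)


def next_qp(qp: int, diff: Sequence[float], threshold: float,
            qp_min: int = 0, qp_max: int = 51, step: int = 1) -> int:
    """Return the quantization parameter for the next frame."""
    if magnitude(diff) > threshold:
        # Features degraded too much: quantize more finely (lower QP).
        return max(qp_min, qp - step)
    # Assumption: otherwise spend fewer bits by quantizing more coarsely.
    return min(qp_max, qp + step)
```

For instance, with a difference vector of magnitude 5 and a threshold of 4, `next_qp(30, [3.0, 4.0], 4.0)` lowers the parameter to 29, whereas with a threshold of 6 it is raised to 31.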

Even when quantization noise is reduced, the feature difference vector sometimes fails to fall within the threshold; the main cause is displacement of the face part positions due to inter-frame prediction error. To deal with this, the face part position difference evaluation value calculation unit 165 evaluates the error in the face part positions and sends that information to the prediction mode control unit 166. Based on the face part position error information from the face part position difference evaluation value calculation unit 165, the prediction mode control unit 166 takes measures such as prohibiting inter-frame prediction when the error is large. Finer quantization and the prohibition of inter-frame prediction increase the code amount of the face region, but this is offset by reducing the code amount of regions other than the face.
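The prediction mode decision can be sketched as a comparison of face part positions before and after encoding. The mean Euclidean distance used as the error measure, the pixel threshold, and the function names are assumptions for illustration; the source does not define how the position error is evaluated.

```python
import math
from typing import Dict, Tuple

Positions = Dict[str, Tuple[float, float]]   # face part name -> (x, y) in pixels


def part_position_error(original: Positions, decoded: Positions) -> float:
    """Mean Euclidean distance between corresponding face part positions."""
    distances = [math.dist(original[name], decoded[name]) for name in original]
    return sum(distances) / len(distances)


def allow_inter_prediction(original: Positions, decoded: Positions,
                           max_error_px: float) -> bool:
    """Inter-frame prediction stays enabled only while the face parts have
    not shifted by more than the (assumed) tolerance."""
    return part_position_error(original, decoded) <= max_error_px


# Hypothetical positions: the left eye has shifted by 5 pixels after encoding.
before = {"left_eye": (10.0, 10.0), "nose": (15.0, 20.0)}
after = {"left_eye": (13.0, 14.0), "nose": (15.0, 20.0)}
```

Here the mean displacement is 2.5 pixels, so a tolerance of 2 pixels would switch the face region to intra-frame coding, while a tolerance of 3 pixels would leave inter-frame prediction enabled.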

Note that the video encoding unit 12 employs an MPEG-family video coding scheme such as H.264; since this is existing technology, only a brief explanation is given. MPEG-family video coding combines inter-frame predictive coding and intra-frame coding. In intra-frame coding, the image is converted into frequency components by an orthogonal transform, and the portions that are visually inconspicuous are then quantized. As a result, the original information cannot be reproduced exactly, which makes the compression lossy. Further compression is then applied by entropy coding, but this stage is lossless. Inter-frame predictive coding requires a reference image as the basis for prediction, which is obtained by local decoding. Because entropy coding is lossless, the reference image is obtained by taking the data as it stands after quantization and decoding it through the reverse steps: inverse quantization, inverse orthogonal transform, and inter-frame decoding. In MPEG-family video coding, inverse quantization, inverse orthogonal transform, and inter-frame decoding are part of the video encoding unit 12, but for convenience of explanation the video decoding unit 13 is deliberately shown as a separate block.
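The reason quantization makes the scheme lossy can be shown with a small round trip. The sketch below uses plain uniform scalar quantization on a handful of made-up transform coefficients; the orthogonal transform itself and the actual H.264 quantizer design are omitted, so this is an illustration of the principle rather than of the codec.

```python
from typing import List, Sequence


def quantize(coeffs: Sequence[float], step: float) -> List[int]:
    """Map each transform coefficient to the nearest multiple of `step`."""
    return [round(c / step) for c in coeffs]


def dequantize(levels: Sequence[int], step: float) -> List[float]:
    """Inverse quantization: scale the integer levels back up."""
    return [level * step for level in levels]


def reconstruction_error(coeffs: Sequence[float], step: float) -> float:
    """Largest absolute difference after a quantize/dequantize round trip."""
    restored = dequantize(quantize(coeffs, step), step)
    return max(abs(c - r) for c, r in zip(coeffs, restored))


# Hypothetical coefficients: one large low-frequency term, small high-frequency terms.
coeffs = [52.0, -7.0, 3.0, 1.0]
fine = quantize(coeffs, 4.0)      # most coefficients survive
coarse = quantize(coeffs, 16.0)   # small coefficients collapse to zero
```

With the coarser step the small coefficients all become zero, which shortens the entropy-coded output but cannot be undone by inverse quantization; this is the degradation that the feature change detection unit 15 measures in terms of features.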

Next, the main part of the monitoring terminal 2 is described in detail. FIG. 5 shows the configuration of the main part of the monitoring terminal 2. The monitoring target recognition unit 25 consists of a face detection unit 251, a face feature extraction unit 252, a feature space conversion unit 253, an individual difference information extraction unit 254, a personal feature information database 255, and a personal identification unit 256. The recognition difficulty evaluation unit 26 consists of a disturbance noise extraction unit 261 and an individual difference information versus disturbance noise comparison evaluation unit 262. The face detection unit 251 and the face feature extraction unit 252 are basically the same as the face detection unit 142 and the face feature extraction unit 143 in the feature amount extraction unit 14 of the monitoring camera 1 (see FIG. 2). To make it easier for the individual difference information extraction unit 254 in the next stage to extract individual difference information, the feature space conversion unit 253 converts the feature space so as to lower the weight of feature amounts that are easily affected by disturbance noise and raise the weight of feature amounts that reflect individual differences more accurately. This information is transmitted to the monitoring camera 1 by the recognition algorithm transmission unit 28 and used for the conversion of the feature difference vector space, that is, for weighting the feature amount differences.

The individual difference information extraction unit 254 generates, from the feature amounts, an evaluation amount that accurately reflects individual differences. The personal feature information database 255 is a database that stores the facial feature amounts of individuals. The personal identification unit 256 compares the feature amounts of a specific individual in the personal feature information database 255 with the features of the face in the video extracted by the individual difference information extraction unit 254, and determines whether the face in the video is that of the specific individual.

The disturbance noise extraction unit 261 in the recognition difficulty evaluation unit 26 extracts, from among the feature amounts, those that are easily affected by disturbances such as illumination. The individual difference information versus disturbance noise comparison evaluation unit 262 compares the magnitude of the evaluation amount that accurately reflects individual differences with the magnitude of the feature amounts that are easily affected by disturbances, and generates an evaluation amount that reflects the difficulty of recognition. This evaluation amount is transmitted to the monitoring camera 1 through the recognition difficulty transmission unit 27 and used to make the encoding control in the encoding control unit 16 more appropriate.
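One plausible way to turn this comparison into a single number is a noise-to-total ratio, sketched below. The source does not give a formula, so the sum of absolute values as the magnitude, the 0 to 1 range, and the convention that an empty input counts as maximally difficult are all assumptions.

```python
from typing import Sequence


def recognition_difficulty(individual_difference: Sequence[float],
                           disturbance_noise: Sequence[float]) -> float:
    """Assumed measure in [0, 1]: the share of disturbance-sensitive feature
    magnitude relative to the total of disturbance and individual-difference
    magnitude. Values near 1 mean recognition is hard."""
    signal = sum(abs(x) for x in individual_difference)
    noise = sum(abs(x) for x in disturbance_noise)
    if signal + noise == 0:
        # Nothing to distinguish individuals by: treat as maximally difficult.
        return 1.0
    return noise / (signal + noise)


easy = recognition_difficulty([3.0, 1.0], [1.0])   # strong individual cues
hard = recognition_difficulty([1.0], [3.0])        # disturbance dominates
```

A value like `hard` would be sent back to the monitoring camera, where a higher difficulty leads to a stricter limit on feature degradation during encoding.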

The present invention is not limited to this embodiment, and various embodiments are possible without departing from the gist of the invention. For example, although this embodiment is aimed at recognizing human faces, the same approach can be applied to recognition techniques for targets other than faces to achieve both recognition performance and efficient use of network bandwidth.

The monitoring system according to the present invention can raise the compression rate of the entire video while sufficiently ensuring the recognition performance for the monitoring target, and is therefore useful as a monitoring system that requires efficient use of network bandwidth and of the video recording unit.

DESCRIPTION OF SYMBOLS
1 Surveillance camera
11 Imaging unit
12 Video encoding unit
13 Video decoding unit
14 Feature amount extraction unit
142 Face detection unit
143 Face feature extraction unit
15 Feature change detection unit
16 Encoding control unit
2 Monitoring terminal
25 Monitoring target recognition unit
26 Recognition difficulty level evaluation unit

Claims

A surveillance camera comprising:
an imaging unit that images a monitoring target;
a video encoding unit that encodes the video output from the imaging unit by lossy compression;
a video decoding unit that decodes the video encoded by the video encoding unit;
a feature amount extraction unit that receives a first video output from the imaging unit and a second video output from the video decoding unit, and extracts a feature amount from each video;
a feature change detection unit that detects the amount of change between the feature amount extracted from the first video and the feature amount extracted from the second video; and
an encoding control unit that changes a control parameter of the video encoding unit based on the detected amount of change in the feature amount.

The surveillance camera of claim 1, wherein
the feature amount extraction unit outputs the extracted feature amount as a feature vector, and
the feature change detection unit calculates a difference vector between the feature vector of the first video and the feature vector of the second video, weights each element of the difference vector, and outputs the result as a feature difference vector.

The surveillance camera of claim 2, wherein
the encoding control unit converts the feature difference vector output from the feature change detection unit into an evaluation index, and changes the control parameter of the video encoding unit so that the evaluation index falls within a threshold value.

The surveillance camera of claim 1, wherein
the encoding control unit controls whether inter-frame prediction is used and the quantization parameter of the video encoding unit.

The surveillance camera of claim 1, wherein
the feature amount extraction unit includes:
a face detection unit that detects a human face in the input video; and
a face feature extraction unit that extracts density gradients of facial components such as the eyes, nose, and mouth in the detected human face.

The surveillance camera of claim 1, wherein
the video encoding unit, the video decoding unit, the feature amount extraction unit, the feature change detection unit, and the encoding control unit are configured as one or more semiconductor chips.

A monitoring system comprising:
the surveillance camera according to claim 1; and
a monitoring terminal connected to the surveillance camera through a communication line.

The monitoring system of claim 7, wherein
the monitoring terminal includes:
a monitoring target recognition unit that recognizes the monitoring target from the video received from the surveillance camera; and
a recognition difficulty level evaluation unit that evaluates the recognition difficulty level of the video received from the surveillance camera,
the monitoring terminal transmits the recognition difficulty level to the surveillance camera, and
the encoding control unit in the surveillance camera changes the control parameter of the video encoding unit taking into account the recognition difficulty level transmitted from the monitoring terminal.

A monitoring system comprising:
the surveillance camera according to claim 2; and
a monitoring terminal connected to the surveillance camera through a communication line, wherein
the monitoring terminal includes a monitoring target recognition unit that recognizes the monitoring target from the video received from the surveillance camera, and transmits an algorithm related to the recognition to the surveillance camera, and
the feature change detection unit in the surveillance camera weights each element of the feature difference vector taking into account the recognition algorithm transmitted from the monitoring terminal.