JP2019201344A

JP2019201344A - Image processing apparatus, image processing method, and program

Info

Publication number: JP2019201344A
Application number: JP2018095389A
Authority: JP
Inventors: 一樹客野; Kazuki Kakuno
Original assignee: Axell Corp
Current assignee: Axell Corp
Priority date: 2018-05-17
Filing date: 2018-05-17
Publication date: 2019-11-21
Anticipated expiration: 2038-05-17
Also published as: JP6997449B2

Abstract

To provide an image processing apparatus, an image processing method, and the like capable of generating an image in which an object to be monitored is easy to monitor. SOLUTION: An image processing apparatus 2 includes: an image acquisition unit 7 for acquiring an image; a face image extraction unit 9 for extracting a face image of a person 100 included in the acquired image acquired by the image acquisition unit 7; a monitoring image generation unit 10 for generating a monitoring image in which one or more face images extracted by the face image extraction unit 9 are arranged; and a determination and rearrangement unit 15 for arranging, when face images of the same person are included in a plurality of acquired images, the face image of that person at the same position in each monitoring image generated using the respective acquired images. SELECTED DRAWING: Figure 1

Description

The present invention relates to a technique for processing images for monitoring.

In recent years, surveillance cameras used in remote monitoring systems and the like have become widespread. Such a camera must reduce the data volume of captured images (including video and moving images) to lighten the network load, while still providing images with a resolution sufficient to identify an object to be monitored (hereinafter simply the "object"), such as the face of a person 100 present in a predetermined space. For this reason, many surveillance cameras reduce the captured image to the resolution needed for image recognition, encode the reduced image with a codec such as H.264 or H.265, and then send it to an image processing apparatus. The image processing apparatus performs various kinds of processing to generate an image whose data volume is reduced and in which the object can still be identified.

Conventionally, a technique is known in which such an image processing apparatus separates, within the image information of an input image, the image information of a monitoring target region that is important for monitoring from the image information of a non-monitoring target region that is not, and transmits the image information with the bit rate of the non-monitoring target region set lower than that of the monitoring target region (see, for example, Patent Document 1).

A technique is also known in which an image processing apparatus is provided with an analysis device that includes first acquisition means for acquiring an image from a camera via a network and analysis means for performing analysis processing based on the image acquired by the first acquisition means, and that further includes second acquisition means for acquiring an additional image from the camera via the network according to the result of the analysis processing, and re-analysis means for performing analysis processing based on the image acquired by the second acquisition means (see, for example, Patent Document 2).

A technique is also known in which, in an image processing apparatus, an object recognition unit recognizes objects from video data of a predetermined duration generated by a camera; a recognition result analysis unit analyzes the recognition result and obtains, from the frame images received by a video reception unit, the minimum size and the moving speed of the faces recognized in the video; and a video processing control parameter determination unit determines lower limits for the frame rate and the resolution from that minimum face size and moving speed (see, for example, Patent Document 3).

Patent Document 1: JP 2013-70187 A
Patent Document 2: JP 2017-212680 A
Patent Document 3: JP 2010-263581 A

The space captured by a surveillance camera often contains a plurality of objects. When the objects are, for example, people's faces, their positional relationship, size, orientation, and so on change over time as the people move. As a result, in the inventions described in Patent Documents 1 to 3, the time required for processing such as detecting the position of each person's face and extracting each face image tends to vary from frame to frame, and the face images then tend to be arranged in the order in which face recognition completes. Consequently, in the images processed by those inventions, an object tends to move around within the image as time passes, which makes the object to be monitored difficult to monitor.

The present invention has been made in view of the above problem, and its object is to provide an image processing apparatus, an image processing method, and the like that can generate an image in which an object to be monitored is easy to monitor.

To solve this problem, the invention according to claim 1 is an image processing apparatus comprising: image acquisition means for acquiring an image; object image extraction means for extracting an object image of an object included in the acquired image acquired by the image acquisition means; monitoring image generation means for generating a monitoring image in which one or more object images extracted by the object image extraction means are arranged; and same-image determination and arrangement means for arranging, when object images of the same object are included in a plurality of acquired images, the object image of that same object at the same position in each of the monitoring images generated using the respective acquired images.

The invention according to claim 2 is characterized in that, in addition to the configuration of claim 1, the same-image determination and arrangement means extracts, from the acquired image, feature amount information used for identifying the object, and uses the extracted feature amount information to determine whether a plurality of object images are the same image.

The invention according to claim 3 is characterized in that, in addition to the configuration of claim 1 or 2, the same-image determination and arrangement means uses a neural network to perform at least one of the extraction of the feature amount information and the determination of whether the plurality of object images are the same image.

The invention according to claim 4 is characterized in that, in addition to the configuration of any one of claims 1 to 3, the apparatus comprises region image extraction means for extracting a region image including the object image, and the monitoring image generation means generates a monitoring image in which one or more region images extracted by the region image extraction means are arranged.

The invention according to claim 5 is characterized in that, in addition to the configuration of any one of claims 1 to 4, the apparatus comprises image reduction means for reducing at least one of the acquired image and the monitoring image by reducing the number of pixels of the image, and the image reduction means reduces the acquired image, or both the acquired image and the monitoring image, such that the region including the object image in the acquired image is reduced at a higher reduction ratio than the region including the object in the monitoring image.

The invention according to claim 6 is characterized in that, in addition to the configuration of any one of claims 1 to 5, the apparatus comprises transmission means for transmitting at least one of the acquired image and the monitoring image, the transmission means including first stream transmission means for transmitting the acquired image and second stream transmission means for transmitting the monitoring image.

The invention according to claim 7 is characterized in that, in addition to the configuration of any one of claims 1 to 5, the monitoring image generation means generates an arrangement image in which the acquired image and the monitoring image including the object image are placed side by side, and the apparatus comprises transmission means for transmitting at least one of the acquired image, the monitoring image, and the arrangement image.

The invention according to claim 8 is characterized in that, in addition to the configuration of any one of claims 1 to 5, the monitoring image generation means generates a superimposed image in which the monitoring image including the object image is superimposed on the acquired image, and the apparatus comprises transmission means for transmitting at least one of the acquired image, the monitoring image, and the superimposed image.

The invention according to claim 9 is an image processing method executed by a computer, comprising: a process of acquiring an image; a process of extracting an object image of an object included in the acquired image obtained by the process of acquiring the image; a process of generating a monitoring image in which one or more object images extracted by the extracting process are arranged; and a process of arranging, when object images of the same object are included in a plurality of acquired images, the object image of that same object at the same position in each of the monitoring images generated using the respective acquired images.

The invention according to claim 10 is an image processing program that causes a computer to execute: a process of acquiring an image; a process of extracting an object image of an object included in the acquired image obtained by the process of acquiring the image; a process of generating a monitoring image in which one or more object images extracted by the extracting process are arranged; and a process of arranging, when object images of the same object are included in a plurality of acquired images, the object image of that same object at the same position in each of the monitoring images generated using the respective acquired images.

According to the inventions of claims 1, 9, and 10, a monitoring image is generated in which the object images extracted from the acquired image are arranged, so the monitoring image displays the objects to be monitored that appear in the acquired image. Because the position of an object in the monitoring image can be made independent of, for example, its position in the acquired image, a monitoring image can be generated that suppresses the influence of changes over time in the position and the like of the object in the acquired image. This makes it possible to generate an image in which the object to be monitored is easy to monitor.

Further, according to the inventions of claims 1, 9, and 10, when the same image exists in a plurality of acquired images, the object image of the same object is arranged at the same position in the monitoring images generated using the respective acquired images. The object image of the same object can thus be placed at the same position in a plurality of monitoring images regardless of the movement of the object, so that even if objects move or the positional relationship among a plurality of objects is swapped over time, the same object continues to be displayed at the same position in the monitoring image. This prevents the display position of the same object from changing frequently in the monitoring image as time passes, and provides a monitoring image in which the object to be monitored is easy to monitor.

According to the invention of claim 2, feature amount information extracted from the acquired image is used to determine whether a plurality of object images are the same image, so whether objects are the same can be determined with high accuracy, and the same object can be placed at the same position in the monitoring image with high accuracy. This provides a monitoring image in which the object to be monitored is even easier to monitor.

According to the invention of claim 3, a neural network is used for at least one of the extraction of the feature amount information and the determination of whether a plurality of object images are the same image. The determination, based on the feature amount information, of whether images are the same can therefore be made highly accurate through training of the neural network, and the same object can be placed at the same position in the monitoring image with high accuracy. This provides a monitoring image in which the object to be monitored is even easier to monitor.

According to the invention of claim 4, a region image including the object image can be extracted and used for the monitoring image, so the object images used for monitoring can be extracted by a simple procedure and processing efficiency is improved.

According to the invention of claim 5, the image reduction means reduces the acquired image, or both the acquired image and the monitoring image, such that the region including the object image in the acquired image is reduced at a higher reduction ratio than the region including the object in the monitoring image. The data volume of the overall image captured by the imaging means is thereby reduced, while the image needed for monitoring the object is generated as high-resolution image data. This reduces the total amount of image data and lightens the network load, while producing an image with a resolution at which the object to be monitored can be readily identified.

According to the invention of claim 6, the first stream transmission means transmits the acquired image and the second stream transmission means transmits the monitoring image. The acquired image and the monitoring image can therefore be transmitted in real time over stream-type data communication in a good communication state, while the communication state of one image is kept from affecting that of the other.

According to the invention of claim 7, an arrangement image is generated in which the acquired image and the monitoring image including the object image are placed side by side, and at least one of the acquired image, the monitoring image, and the arrangement image is transmitted. The overall image captured by the imaging means and the image needed for monitoring the object can thus be transmitted together without being separated. This provides a monitoring image in which the overall image captured by the imaging means and the monitored objects contained in it can easily be compared.

According to the invention of claim 8, a superimposed image is generated in which the monitoring image including the object image is superimposed on the acquired image, and at least one of the acquired image, the monitoring image, and the superimposed image is transmitted. The overall image captured by the imaging means and the image needed for monitoring the object can thus be transmitted in an arrangement that makes them easy to compare. This provides a monitoring image in which the overall image captured by the imaging means and the monitored objects contained in it can easily be compared.

FIG. 1 is a functional block diagram showing the image processing apparatus of this embodiment.
FIG. 2(a) is a schematic diagram showing a state in which region images extracted from an acquired image at a specific point in time are arranged in a monitoring image in the image processing apparatus, and FIG. 2(b) is a schematic diagram showing a state in which region images extracted from an acquired image at a point in time later than (a) are arranged in a monitoring image.
FIG. 3 is a flowchart showing the processing procedure in the image processing apparatus.
FIG. 4 is a diagram schematically showing one example of how, in the image processing apparatus, region images are extracted from an acquired image, a monitoring image is generated based on the extracted region images, and the generated image is transmitted to the monitoring device by stream-type data communication.
FIG. 5 is a diagram schematically showing another example of how, in the image processing apparatus, region images are extracted from an acquired image, a monitoring image is generated based on the extracted region images, and the generated image is transmitted to the monitoring device by stream-type data communication.
FIG. 6 is a diagram schematically showing yet another example of how, in the image processing apparatus, region images are extracted from an acquired image, a monitoring image is generated based on the extracted region images, and the generated image is transmitted to the monitoring device by stream-type data communication.

FIGS. 1 to 6 show an embodiment of the present invention.

[Basic configuration]
FIG. 1 is a functional block diagram showing the overall structure of an image processing system 1A according to this embodiment.

[Image processing system]
The image processing system 1A shown in FIG. 1 includes a surveillance camera 1 as "imaging means", an image processing apparatus 2 according to this embodiment, and a monitoring device 3 as a "terminal". The surveillance camera 1 and the image processing apparatus 2, and the image processing apparatus 2 and the monitoring device 3, are connected by networks 4 and 5.

[Surveillance camera]
The surveillance camera 1 is a camera used for monitoring a remote location. It includes an image sensor 6 such as a CCD image sensor or a CMOS image sensor, and is installed in a predetermined space to be monitored, for example a predetermined room or a predetermined outdoor area. The image sensor 6 captures the scene imaged through a lens (not shown) of the surveillance camera 1 as a moving image with a predetermined number of pixels, for example 7680 (horizontal) × 4320 (vertical) pixels per frame (a so-called 8K image), 4096 (horizontal) × 2048 (vertical) pixels per frame (a so-called 4K image), or 1920 × 1080 pixels per frame (a so-called 2K image), at a predetermined frame rate, for example 30 frames per second. The pixel dimensions of the image and the number of frames per unit time captured by the surveillance camera 1 may, however, be chosen freely. In the following description of this embodiment, the surveillance camera 1 is assumed to capture 8K video.
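To give a sense of scale for the frame sizes listed above, the following minimal sketch computes the pixel count per frame and per second for each example. The dictionary and function names are illustrative assumptions, and the figures count raw pixels only; they say nothing about the encoded bit rate.

```python
# Pixel counts for the example frame sizes named in the text.
# Names such as FRAME_SIZES are illustrative, not from the source.
FRAME_SIZES = {
    "8K": (7680, 4320),
    "4K": (4096, 2048),  # the 4K size as given in the text
    "2K": (1920, 1080),
}
FPS = 30  # example frame rate from the text


def pixels_per_frame(width: int, height: int) -> int:
    """Number of pixels in one frame."""
    return width * height


def pixels_per_second(width: int, height: int, fps: int = FPS) -> int:
    """Number of pixels the sensor delivers per second at the given frame rate."""
    return pixels_per_frame(width, height) * fps


if __name__ == "__main__":
    for name, (w, h) in FRAME_SIZES.items():
        print(name, pixels_per_frame(w, h), pixels_per_second(w, h))
```

An 8K frame holds 16 times as many pixels as a 2K frame, which is why sending every frame at full size would weigh heavily on the network.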

Images captured by the surveillance camera 1 are transmitted to the image processing apparatus 2 as image data.

[Image processing apparatus]
The image processing apparatus 2 has a function of generating a monitoring image (described later) from the images captured by the surveillance camera 1 and sending it to the monitoring device 3. The image processing apparatus 2 may, for example, be built into the surveillance camera 1 or be provided separately from it. It performs predetermined image processing by means such as processing on a CPU or processing in hardware logic.

As shown in the functional block diagram of FIG. 1, the image processing apparatus 2 includes an image acquisition unit 7 as "image acquisition means", a face image detection unit 8 as "object image extraction means" and "object image detection means", a face image extraction unit 9 as "object image extraction means" and "region image extraction means", a monitoring image generation unit 10 as "monitoring image generation means", an image reduction unit 11 as "image compression means", an encoder 12, and a transmission unit 13 as "transmission means". The monitoring image generation unit 10 in turn includes a feature amount extraction unit 14 as "feature amount information extraction means" and a determination and rearrangement unit 15 as "same-image determination and arrangement means". Some or all of these components may be implemented as hardware, or as a program that functions when executed by a computer. The program may be stored on a computer-readable non-transitory recording medium, for example an SD memory card, an FD (floppy disk), a CD (compact disc), a DVD (digital versatile disc), a BD (Blu-ray (registered trademark) Disc), or a flash memory.

The image acquisition unit 7 acquires the images captured and transmitted by the surveillance camera 1. Within the image processing apparatus 2, an image acquired by the image acquisition unit 7 is handled as an acquired image (hereinafter simply the "acquired image"). An acquired image is generated for each frame sent from the surveillance camera 1.

The face image detection unit 8 detects, in the acquired image, an object image, that is, an image in which a predetermined object has been captured.

The face image detection unit 8 functions as an object detector using a neural network algorithm for object detection, such as YOLO (You Only Look Once) or SSD (Single Shot MultiBox Detector), and detects the objects in the acquired image. The face image detection unit 8 may instead use an algorithm that does not rely on a neural network, such as one provided by OpenCV.

In this embodiment, as shown in FIG. 1, the "object" is the face 101 of a person 100, and the "object image" is a face image in which the face 101 of the person 100 has been captured. The face image detection unit 8 detects face images in the acquired image by extracting feature amounts from the acquired image using a neural network. In this embodiment, detection by the face image detection unit 8 determines the face position of the person 100 in the acquired image, that is, where the face 101 of the person 100 appears. Any form of detection that finds the face images in the acquired image may be used, however; for example, information such as which person's face 101 appears in the acquired image, and in which orientation, may also be obtained.

The face image extraction unit 9 extracts a region image, that is, a region containing a face image detected by the face image detection unit 8 (hereinafter simply the "region image").
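One way to picture region extraction is as cropping a rectangle around each detected face. The sketch below is only an illustration under stated assumptions: the `(left, top, width, height)` box format, the `margin` parameter, and the list-of-rows frame representation are hypothetical choices, not details given in the text.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, width, height) in pixels; assumed format


def expand_box(face: Box, margin: float, frame_w: int, frame_h: int) -> Box:
    """Grow a detected face box by `margin` (a fraction of its size) on every
    side and clamp the result so it stays inside the frame."""
    left, top, w, h = face
    dx, dy = int(w * margin), int(h * margin)
    x0 = max(0, left - dx)
    y0 = max(0, top - dy)
    x1 = min(frame_w, left + w + dx)
    y1 = min(frame_h, top + h + dy)
    return (x0, y0, x1 - x0, y1 - y0)


def crop(frame: List[List[int]], box: Box) -> List[List[int]]:
    """Cut the region image out of a frame stored as rows of pixel values."""
    left, top, w, h = box
    return [row[left:left + w] for row in frame[top:top + h]]
```

A caller would typically pass each detected face box through `expand_box` and then `crop`, so that the region image contains the face plus some surrounding context.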

The monitoring image generation unit 10 generates a monitoring image in which one or more region images extracted by the face image extraction unit 9 are arranged in predetermined arrangement regions (hereinafter simply the "monitoring image"; details are described later).
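As a rough illustration of "predetermined arrangement regions", the sketch below treats the monitoring image as a grid of equally sized tiles and computes where each tile goes. The grid layout, the tile size, and the function names are assumptions for illustration; the text does not specify how the arrangement regions are laid out.

```python
import math
from typing import Tuple


def slot_origin(slot: int, columns: int, tile_w: int, tile_h: int) -> Tuple[int, int]:
    """Top-left pixel of arrangement region number `slot` in a row-major grid."""
    row, col = divmod(slot, columns)
    return (col * tile_w, row * tile_h)


def canvas_size(num_slots: int, columns: int, tile_w: int, tile_h: int) -> Tuple[int, int]:
    """Width and height of a monitoring image that holds `num_slots` tiles."""
    rows = math.ceil(num_slots / columns)
    return (columns * tile_w, rows * tile_h)
```

With a fixed grid like this, keeping a face at the same position reduces to always drawing it in the same slot number.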

The image reduction unit 11 reduces the acquired image by reducing its number of pixels. In this embodiment, of the acquired image and the region images, the image reduction unit 11 reduces only the acquired image, although both the acquired image and the region images may be reduced.
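The effect of reducing only the acquired image can be sketched as follows: the overview frame loses pixels, while a region image cropped from the unreduced frame keeps them. The block-averaging filter, the integer factor, and the 400-pixel face width used in the note below are assumptions for illustration; the text does not specify the reduction method or ratio.

```python
from typing import List, Tuple


def downscale(image: List[List[int]], factor: int) -> List[List[int]]:
    """Reduce the pixel count by averaging factor x factor blocks.
    Rows and columns that do not fill a whole block are dropped."""
    h, w = len(image), len(image[0])
    out = []
    for y in range(0, h - h % factor, factor):
        row = []
        for x in range(0, w - w % factor, factor):
            block = [image[y + j][x + i] for j in range(factor) for i in range(factor)]
            row.append(sum(block) // len(block))
        out.append(row)
    return out


def reduced_size(width: int, height: int, factor: int) -> Tuple[int, int]:
    """Frame size after reduction by an integer factor in each dimension."""
    return (width // factor, height // factor)
```

For example, reducing an 8K frame by a factor of 4 gives a 1920 × 1080 overview; a face that is 400 pixels wide in the acquired image would be about 100 pixels wide in that overview, but still 400 pixels wide in an unreduced region image.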

The encoder 12 encodes the acquired image and the monitoring image. An image compression scheme such as H.264 or H.265 is used for the encoding.

The transmission unit 13 transmits the encoded acquired image and monitoring image to the monitoring device 3. In this embodiment, the transmission unit 13 performs stream-type data communication of images encoded with a predetermined scheme such as H.264 or H.265. The transmission unit 13 can handle multiple stream channels: it places the acquired image and the monitoring image in two separate streams and transmits each to the monitoring device 3 over its own channel. Alternatively, the transmission unit 13 may use a single stream channel and carry the acquired image and the monitoring image in the same data stream.

The feature extraction unit 14 extracts, from the acquired image, feature information, which is information on the features needed to identify the face 101 of the person 100. This feature information is described later.

In this embodiment, the feature extraction unit 14 has a feature detection function based on a neural network for feature detection, such as VGG16 or VGG-Face, and detects feature information from the face images in the acquired image.

When a plurality of acquired images captured at different times contain face images of the face 101 of the same person 100 (hereinafter simply "same image"), the determination and rearrangement unit 15 places the same image at the same position (hereinafter simply "same position") in each of the monitoring images generated from those acquired images. In this embodiment, the determination and rearrangement unit 15 determines whether images are the same image by comparing the features extracted by the feature extraction unit 14.

[Monitoring device]
The monitoring device 3 is a terminal, such as a personal computer, smartphone, or tablet, that can exchange data and signals with the image processing apparatus 2 via the network 5. The monitoring device 3 has an operation unit 31, such as a mouse and keyboard, and a display unit 32, such as an LCD. The operation unit 31 and the display unit 32 may be integrated, as in a touch-panel display.

The monitoring device 3 in this embodiment decodes the images sent from the image processing apparatus 2 and displays the monitoring image on the display unit 32, so that an observer can carry out monitoring by viewing it. Alternatively, the monitoring device 3 may be provided with a person identification function based on image recognition using a neural network or the like, so that the monitoring device 3 automatically identifies and monitors the persons 100 appearing in images captured by the surveillance camera 1.

[Network]
The network 4 is a communication medium that provides a wired connection between the surveillance camera 1 and the image processing apparatus 2. The network 5 is a communication medium that provides a wired or wireless connection between the image processing apparatus 2 and the monitoring device 3. The networks 4 and 5 may use any communication format or medium, provided that they can carry image data and the signals that control the image processing system 1A.

[Extraction of region images]
In this embodiment, the feature information that the feature extraction unit 14 extracts from a region image is obtained as a vector describing the image (that is, information representing characteristics of the image that is more abstract than, for example, the positions and sizes of parts of the face 101 such as the eyes and nose). The feature information is used to identify the face 101 of a person 100; specifically, it is used to determine whether the faces 101 contained in multiple region images belong to the same person. The feature extraction unit 14 inputs the acquired image to the feature detection function of its neural network and extracts, from each region image, the feature information used to identify the face 101 of the person 100. The determination and rearrangement unit 15 then uses this feature information to determine whether the images contained in the respective region images are the same image, that is, whether they show the face 101 of the same person.

Specifically, the feature extraction unit 14 may use a neural network that captures image characteristics well, for example one yielding the 4096-dimensional feature vector of VGG16, as the basis for determining whether the faces contained in region images are the face 101 of the same person. When neural network features are used, for example, a 4096-dimensional feature vector is computed for the face image of the previous frame (the region image extracted from that frame's acquired image) and for the face image of the current frame (the region image extracted from that frame's acquired image), and the inner product of the two vectors, or the distance between them, is examined to determine whether the compared face images show the same person.
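The vector comparison described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation of the embodiment: feature vectors are taken to be plain lists of floats, the inner product is normalized into a cosine similarity, and the function names and the threshold of 0.8 are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Inner product of two feature vectors, normalized by their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector carries no direction to compare
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Vector distance, the alternative measure mentioned in the text."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def is_same_person(prev_vec, curr_vec, threshold=0.8):
    """Hypothetical decision rule: same person if similarity >= threshold."""
    return cosine_similarity(prev_vec, curr_vec) >= threshold
```

In practice the vectors would be 4096-dimensional and the threshold would have to be tuned on real data; a distance-based rule would compare `euclidean_distance` against an upper bound instead.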

Note that the determination and rearrangement unit 15 may instead determine whether face images show the same person from the squared error between the images, without using the feature information extracted by the feature extraction unit 14.
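A squared-error comparison of this kind might look like the sketch below. It assumes, purely for illustration, that both face images have already been resized to the same dimensions and are represented as nested lists of grayscale values; the function name and the averaging over pixels are choices made here, not details given in the text.

```python
def mean_squared_error(img_a, img_b):
    """Average of the squared per-pixel differences of two equal-size images."""
    if len(img_a) != len(img_b) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(img_a, img_b)
    ):
        raise ValueError("images must have the same dimensions")
    total = 0
    count = 0
    for row_a, row_b in zip(img_a, img_b):
        for pa, pb in zip(row_a, row_b):
            total += (pa - pb) ** 2
            count += 1
    if count == 0:
        raise ValueError("images must not be empty")
    return total / count
```

A small error would suggest the same face; the cut-off value is application specific and is not stated in the source.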

When a region image extracted from a given acquired image and a region image from the next acquired image are likely to be the same image, that is, to contain the face image of the same person, the determination and rearrangement unit 15 displays the same image in the monitoring image generated from the later region images at the same location that it occupied in the monitoring image generated from the earlier region images.

[Arrangement of region images in the monitoring image]
FIG. 2 schematically shows the principle by which the image processing apparatus 2 of the image processing system 1A of this embodiment arranges region images in the monitoring image.

As shown in FIG. 2(a), suppose that the acquired image 21_1 at a given point in time shows the face 101_1 of person 100_1, the face 101_2 of person 100_2, and the face 101_3 of person 100_3, and that the region image 23_1 of face 101_1, the region image 23_2 of face 101_2, and the region image 23_3 of face 101_3 are generated from this acquired image 21 to form the monitoring image 22. If the persons 100 move freely through the space, the face 101 of each person 100 in each frame of the video captured by the surveillance camera 1 shifts in position and changes in size over time. FIG. 2(b) schematically shows the state in which the acquired image 21_2 has been acquired after the acquired image 21_1 of FIG. 2(a). Here, persons 100_1 and 100_3 have swapped their left-right positions, and person 100_4 appears in place of person 100_2. When persons 100 are filmed in a real space, the relative positions of their faces 101 in the acquired image 21 frequently change over time in this way, and persons 100 frequently appear or leave from one frame to the next.

In such a situation, the order in which the face image detection unit 8 detects faces is not constant, owing to the characteristics of the neural network. Consequently, if the region images 23 were placed in the arrangement area of the monitoring image 22 in the order in which the face images were detected, the display order of the faces 101 would change rapidly from one monitoring image 22 to the next, and the face images being monitored would become harder to follow for an observer watching the display unit 32 of the monitoring device 3.

Therefore, in this embodiment, when feature extraction by the feature extraction unit 14 shows that the same image (an image of the same person) is present in region images 23 from multiple acquired images 21, for example in a region image 23 extracted from a given acquired image 21 and in a region image 23 extracted from the acquired image 21 generated immediately after it, the determination and rearrangement unit 15 places that same image at the same position in each of the monitoring images 22.

For example, in the monitoring image 22 of FIG. 2(b), as in the monitoring image 22 of FIG. 2(a), the face 101_1 of person 100_1 remains displayed in the leftmost region image 23_1 and the face 101_3 of person 100_3 remains displayed in the rightmost region image 23_3. In contrast, the middle region image 23_4 of the monitoring image 22 in FIG. 2(b) shows the face 101_4 of the newly appearing person 100_4 in place of the face 101_2 of person 100_2, who disappeared from the image in the interval between the acquired image 21_1 of FIG. 2(a) and the acquired image 21_2 of FIG. 2(b). This configuration prevents the position of the same person's face 101 (for example, the face 101_1 of person 100_1 and the face 101_3 of person 100_3 in FIG. 2(a) and (b)) on the monitoring images 22 (for example, the monitoring image 22_1 of FIG. 2(a) and the monitoring image 22_2 of FIG. 2(b)) from changing rapidly over time.

In the following description, the presence of a same image is determined between acquired images 21 generated from adjacent frames of the video captured by the surveillance camera 1 (that is, a given acquired image 21 and the acquired image 21 acquired immediately after it). However, the determination is not limited to this and may be made between any acquired images 21, provided that it is possible to determine whether images are the same image. For example, images may be treated as the same image when the face 101 of the same person appears in all of three or more adjacent frames, or when it appears in acquired images 21 taken at a fixed interval, for example every 10 frames.

Instead of displaying the same image at the same position of the monitoring image 22, the determination and rearrangement unit 15 of this embodiment may assign an ID to each person 100 and generate the monitoring image 22 by rearranging the region images 23 so that the region image 23 of each person 100 is displayed at a display position set in advance for that person.
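The ID-based variant can be illustrated with the short sketch below. The data layout is an assumption made for this example: each detection is a `(person_id, region_image)` pair, and `position_of_id` is a hypothetical table mapping each ID to its preset slot index in the monitoring image.

```python
def layout_by_person_id(detections, position_of_id, num_slots):
    """Place each person's region image in the slot preset for that person.

    detections: iterable of (person_id, region_image) pairs, in any order.
    position_of_id: dict mapping person_id -> slot index (hypothetical table).
    Returns a list of length num_slots; unused slots hold None.
    """
    slots = [None] * num_slots
    for person_id, region_image in detections:
        slot = position_of_id.get(person_id)
        # Persons without a preset position (or with an invalid one) are skipped.
        if slot is not None and 0 <= slot < num_slots:
            slots[slot] = region_image
    return slots
```

Because the slot depends only on the ID, the detection order in a given frame has no effect on where a person appears.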

[Processing procedure]
FIG. 3 is a flowchart showing the processing procedure of the image processing system 1A of this embodiment. The procedure is described below with reference to this flowchart.

First, when an observer who performs remote monitoring with the image processing system 1A (hereinafter simply "observer") starts the image processing system 1A, the surveillance camera 1 films a predetermined space as video (step S1). The surveillance camera 1 acquires high-resolution images (here, 8K images) as video. The images captured by the surveillance camera 1 are sent to the image processing apparatus 2 via the network 4.

In the image processing apparatus 2, the image acquisition unit 7 acquires the image sent from the surveillance camera 1 (step S2). The acquired image 21 is passed to the face image detection unit 8, which first reduces it to a predetermined format (for example, by decreasing the number of pixels to turn the 8K acquired image 21 into a lower-resolution acquired image 21). The face image detection unit 8 then inputs the acquired image 21 to the face detection function of its neural network and detects the face positions contained in the acquired image 21 (step S3).

FIG. 4 schematically shows how face positions are detected in step S3. Specifically, the face image detection unit 8 obtains four variables, x, y, w, and h, as the face position information for the acquired image 21. Here, for example, (x, y) is the upper-left coordinate of the image (the coordinate position P(x, y) of a specific pixel of the original acquired image 21_1, which becomes the upper-left vertex of the region image 23_1 to be extracted), w is the width of the image (the horizontal size of the region image 23_1 to be extracted), and h is the height of the image (the vertical size of the region image 23_1 to be extracted).

Next, the face image extraction unit 9 crops the face portions detected by the face image detection unit 8 and extracts the region images 23 (step S4).

FIG. 4 schematically shows how the region images 23 are extracted from the acquired image 21 and how the monitoring image 22 is generated from the extracted region images 23. As shown in the figure, the region image 23_1, for example, is a rectangular image of width w and height h whose upper-left vertex lies at the coordinates (x, y) of the original acquired image 21, covering the vicinity of the face 101_1 of person 100_1 in that image. The region image 23_1 has fewer pixels than the original acquired image 21; here it is a 2K image.
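Cropping a region image from the four variables x, y, w, and h can be sketched as follows. The representation of an image as a list of pixel rows and the function name `crop_region` are assumptions for illustration; a real system would operate on decoded frame buffers.

```python
def crop_region(image, x, y, w, h):
    """Cut out the w-by-h rectangle whose upper-left vertex is at (x, y).

    image is a list of rows; image[row][col] is one pixel value.
    """
    height = len(image)
    width = len(image[0]) if image else 0
    if w <= 0 or h <= 0 or x < 0 or y < 0 or x + w > width or y + h > height:
        raise ValueError("region lies outside the image")
    # Rows y .. y+h-1, and within each row columns x .. x+w-1.
    return [row[x:x + w] for row in image[y:y + h]]
```

Because the crop copies pixels without resampling, the region image keeps the pixel density of the original acquired image.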

In this embodiment, the image reduction unit 11 decreases the number of pixels of the original acquired image 21 (for example, reducing an 8K image to a 2K acquired image 21) before the encoder 12 encodes it and it is transmitted to the monitoring device 3. The region image 23, in contrast, is a 2K image cut directly out of the original acquired image 21. As a result, even though the acquired image 21 is reduced to 2K, the portions showing the faces 101 of the persons 100 are encoded by the encoder 12 and transmitted to the monitoring device 3 at what is effectively the image quality of the original 8K image. In other words, even if the encoder 12 has limited processing capacity or the network 5 has limited bandwidth, this embodiment avoids problems such as encoding delays and communication delays while still making use of images from the high-resolution surveillance camera 1 for monitoring.

Next, the monitoring image generation unit 10 generates the monitoring image 22 from the one or more region images 23 extracted by the face image extraction unit 9 (step S5).

To describe step S5 in detail: first, the feature extraction unit 14 reduces each region image 23 to a predetermined format (for example, by decreasing the number of pixels to turn the 2K region image 23 into a lower-resolution region image 23). The feature extraction unit 14 then extracts feature information from the region image 23 (step S51).

The determination and rearrangement unit 15 compares the feature information extracted by the feature extraction unit 14 for each region image 23. When the feature information of face images obtained from adjacent acquired images 21 (for example, the acquired image 21_1 of FIG. 2(a), generated from a given frame captured by the surveillance camera 1, and the acquired image 21_2 of FIG. 2(b), generated from the next frame) is identical or falls within a predetermined similarity range, the determination and rearrangement unit 15 determines that those face images show the same person, that is, that they are the same image.

When region images 23 extracted from adjacent acquired images 21 are the same image, the determination and rearrangement unit 15 rearranges the region images 23 generated from the respective acquired images 21 so that the same image is placed at the same position in the monitoring image 22 (step S52). Region images 23 that are not a same image are placed, by predetermined processing of the determination and rearrangement unit 15, in arrangement areas of the monitoring image 22 that are not occupied by a same image.
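One way to picture the rearrangement of step S52 is the sketch below. It is a simplified illustration, not the processing defined by the embodiment: the greedy first-match strategy, the handling of surplus faces, and the names `assign_slots` and `is_same` are all assumptions made here.

```python
def assign_slots(prev_slots, curr_features, is_same):
    """Keep matched faces in their previous slot; fill free slots with new faces.

    prev_slots: per slot, the feature of the face shown in the previous
        monitoring image, or None if the slot was empty.
    curr_features: features of the faces detected in the current frame,
        in detection order (which may differ from frame to frame).
    is_same: predicate deciding whether two features show the same person.
    Returns, per slot, an index into curr_features or None.
    """
    new_slots = [None] * len(prev_slots)
    unmatched = list(range(len(curr_features)))
    # Pass 1: a face seen before stays in the slot it occupied previously.
    for slot, prev in enumerate(prev_slots):
        if prev is None:
            continue
        for idx in unmatched:
            if is_same(prev, curr_features[idx]):
                new_slots[slot] = idx
                unmatched.remove(idx)
                break
    # Pass 2: newly appearing faces take the slots that are still free.
    free = [s for s, idx in enumerate(new_slots) if idx is None]
    for slot, idx in zip(free, unmatched):
        new_slots[slot] = idx
    return new_slots
```

With the situation of FIG. 2, previous slots holding persons 1, 2, 3 and a current detection order of persons 3, 4, 1 yield `[2, 1, 0]`: persons 1 and 3 keep the left and right slots, and person 4 takes the middle slot vacated by person 2. In this sketch, faces left over once every slot is filled are simply not placed.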

Through the processing of step S5 described above, the monitoring image generation unit 10 generates the monitoring image 22. Each region image 23 placed in the monitoring image 22 has the pixel count of a 2K image.

The image reduction unit 11 performs reduction processing that decreases the number of pixels of the acquired image 21 obtained by the image acquisition unit 7 from the surveillance camera 1 (step S6). This runs as a separate processing path from the processing by the face image detection unit 8, the face image extraction unit 9, and the monitoring image generation unit 10 described above. In this embodiment, the image reduction unit 11 generates an acquired image 21 in which the pixel count of the original 8K acquired image 21 is reduced to that of a 2K image. Because this reduces the amount of image data, encoding can be performed properly even when the encoder 12 lacks the capacity to process images at the pixel count captured by the surveillance camera 1, and excessive load on the bandwidth of the network 5 when the image data is sent can be avoided.
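The reduction of step S6 can be illustrated by a simple block-averaging downscale. The text does not state the pixel dimensions or the resampling method, so the following are assumptions: if 8K is taken as 7680 x 4320 and 2K as 1920 x 1080, the factor is 4 per axis and the pixel count drops by a factor of 16 (33,177,600 to 2,073,600). The function name and the averaging filter are illustrative only.

```python
def downscale(image, factor):
    """Shrink a grayscale image by averaging each factor-by-factor block."""
    if not image or not image[0]:
        raise ValueError("image must not be empty")
    height, width = len(image), len(image[0])
    if factor <= 0 or height % factor or width % factor:
        raise ValueError("image size must be a multiple of the factor")
    result = []
    for by in range(0, height, factor):
        out_row = []
        for bx in range(0, width, factor):
            block = [
                image[yy][xx]
                for yy in range(by, by + factor)
                for xx in range(bx, bx + factor)
            ]
            out_row.append(sum(block) / len(block))
        result.append(out_row)
    return result
```

Calling `downscale(frame, 4)` on a frame of the assumed 8K size would produce an image of the assumed 2K size.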

The encoder 12 encodes the monitoring image 22 generated in step S5 and the acquired image 21 generated in step S6 (step S7).

The transmission unit 13 then transmits the encoded acquired image 21 and monitoring image 22 to the monitoring device 3 (step S8). As shown in FIG. 4, the transmission unit 13 may send the images to the monitoring device 3 over two data streams: an acquired-image stream 24 serving as the "first stream transmission means" and a monitoring-image stream 25 serving as the "second stream transmission means".

Alternatively, the transmission unit 13 may generate a single image that combines the acquired image 21 and the monitoring image 22 and send it to the monitoring device 3 over one data stream 28. For example, as in the alternative shown in FIG. 5, the transmission unit 13 may generate an arrangement image 26 with the acquired image 21 in the upper half and the monitoring image 22 in the lower half and send it over the data stream 28. As in the further alternative shown in FIG. 6, the transmission unit 13 may generate a superimposed image 27 in which the monitoring image 22 is overlaid on the acquired image 21 in the manner of a window and send it over the data stream 28. In addition, by remote operation from the monitoring device 3 or the like, the image that the image processing apparatus 2 sends to the monitoring device 3 may be a zoomed view of a specific monitoring image 22 (for example, a specific region image 23 showing the face 101 of a specific person 100), or transmission may be switchable between the acquired image 21 and the monitoring image 22.
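The two combined layouts, the stacked arrangement of FIG. 5 and the window-style overlay of FIG. 6, can be sketched as below. Images are again assumed to be lists of pixel rows, and the function names and the overlay position parameters are hypothetical.

```python
def stack_vertically(top, bottom):
    """Arrangement image: `top` above `bottom`; widths must match."""
    if not top or not bottom or len(top[0]) != len(bottom[0]):
        raise ValueError("images must be non-empty and equally wide")
    return [list(row) for row in top] + [list(row) for row in bottom]


def overlay(base, inset, x, y):
    """Superimposed image: paste `inset` onto a copy of `base` at (x, y)."""
    if not base or not inset:
        raise ValueError("images must be non-empty")
    if x < 0 or y < 0 or y + len(inset) > len(base) or x + len(inset[0]) > len(base[0]):
        raise ValueError("inset does not fit inside the base image")
    result = [list(row) for row in base]  # leave the base image untouched
    for dy, row in enumerate(inset):
        for dx, pixel in enumerate(row):
            result[y + dy][x + dx] = pixel
    return result
```

Either result is a single image, so it can be encoded and sent over one data stream.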

As described above, in this embodiment a monitoring image 22 is generated in which the face images extracted from the acquired image 21 are arranged, so the monitoring image 22 displays the face images that are the monitoring targets in the acquired image 21. Because the positions of the face images in the monitoring image 22 can be made independent of, for example, the positions of the faces 101 of the persons 100 in the acquired image 21, a monitoring image 22 can be generated that is insulated from changes over time in those positions. This makes it possible to generate images in which the faces 101 of the monitored persons 100 are easy to monitor.

In this embodiment, when multiple acquired images 21 contain images of the face 101 of the same person, the face image of that person is placed at the same position in each monitoring image 22 generated from those acquired images 21. The same person's face image can therefore be kept at the same position across monitoring images 22 regardless of the person's movement: even if the face 101 of a person 100 moves or the relative positions of the faces 101 of several persons 100 change over time, the same person's face image continues to be displayed at the same position of the monitoring image 22. This prevents the display position of a given person's face 101 in the monitoring image 22 from changing frequently over time and provides a monitoring image 22 in which the faces 101 of the monitored persons 100 are easy to monitor.

In this embodiment, whether multiple face images show the same person is determined using the feature information extracted from the acquired image 21. This allows the identity of the photographed persons 100 to be determined with high accuracy, so the same person's face image can be placed at the same position of the monitoring image 22 with high accuracy. The result is a monitoring image 22 in which the faces 101 of the monitored persons 100 are even easier to monitor.

In this embodiment, a neural network is used to extract the feature information and to determine whether multiple face images are the same image. Training the neural network allows the feature-based determination of whether face images show the same person to be made with high accuracy, so the same person's face image can be placed at the same position of the monitoring image 22 with high accuracy. The result is a monitoring image 22 in which the faces 101 of the monitored persons 100 are even easier to monitor.

In this embodiment, the region images 23 containing the face images are extracted and used to form the monitoring image 22, so the face images used to monitor the persons 100 can be extracted by a simple procedure, which makes the processing more efficient.

In this embodiment, the image reduction unit 11 reduces the acquired image 21, or both the acquired image 21 and the monitoring image 22, such that the region containing a face image in the acquired image 21 is reduced more strongly (at a higher reduction ratio) than the region containing that face image in the monitoring image 22. This reduces the amount of data for the overall image captured by the surveillance camera 1 while generating the images needed to monitor the faces 101 of the persons 100 as high-resolution image data. The total amount of image data is thereby reduced, easing the network load, while images are generated at a resolution at which the faces 101 of the monitored persons 100 can be identified easily.

In this embodiment, the acquired image 21 is transmitted by the first stream transmission means and the monitoring image 22 is transmitted by the second stream transmission means. The acquired image 21 and the monitoring image 22 can thus be transmitted in real time, in the good communication state provided by stream-type communication, while the communication state of one image is kept from affecting that of the other.
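
The separation into two streams can be modeled, purely for illustration, with two independent bounded buffers. The sketch below is not a network implementation; the class `ImageStream`, its drop-oldest policy, and the capacity of 2 are assumptions made for the example.

```python
from collections import deque


class ImageStream:
    """Bounded send buffer for one stream; drops the oldest frame when full."""

    def __init__(self, name, capacity):
        self.name = name
        self.buffer = deque(maxlen=capacity)
        self.dropped = 0

    def send(self, frame):
        if len(self.buffer) == self.buffer.maxlen:
            self.dropped += 1  # deque(maxlen=...) discards the oldest frame
        self.buffer.append(frame)

    def receive(self):
        return self.buffer.popleft() if self.buffer else None


acquired_stream = ImageStream("acquired image stream 24", capacity=2)
monitoring_stream = ImageStream("monitoring image stream 25", capacity=2)

# The receiver of the acquired images stalls, so only that buffer overflows,
# while the monitoring images continue to be delivered frame by frame.
for n in range(5):
    acquired_stream.send(f"acquired-{n}")
    monitoring_stream.send(f"monitoring-{n}")
    monitoring_stream.receive()
```

In this model, congestion on the acquired image stream causes frame drops only on that stream, and the monitoring image stream is unaffected.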

In this embodiment, an arrangement image 26 is generated in which the acquired image 21 and the monitoring image 22 containing the face images are arranged side by side, and at least one of the acquired image 21, the monitoring image 22, and the arrangement image 26 is transmitted. The entire image captured by the monitoring camera 1 and the image needed to monitor the person 100 to be monitored can thus be transmitted together without being separated. This makes it possible to provide a monitoring image 22 with which the entire image captured by the monitoring camera 1 can easily be compared with the face 101 of the person 100 to be monitored that is contained in that entire image.
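
One possible way to form such a side-by-side arrangement is sketched below. It assumes images are lists of pixel rows and pads the shorter image with a fill value; the function name `arrange_side_by_side` and the padding behavior are hypothetical.

```python
def arrange_side_by_side(left, right, fill=0):
    """Place two images next to each other, padding the shorter one below."""
    height = max(len(left), len(right))

    def pad(image):
        width = len(image[0])
        padding = [[fill] * width for _ in range(height - len(image))]
        return [list(row) for row in image] + padding

    return [l + r for l, r in zip(pad(left), pad(right))]
```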

In this embodiment, a superimposed image 27 is generated in which the monitoring image 22 containing the face images is superimposed on the acquired image 21, and at least one of the acquired image 21, the monitoring image 22, and the superimposed image 27 is transmitted. The entire image captured by the monitoring camera 1 and the image needed to monitor the person 100 to be monitored can thus be transmitted in an arrangement that makes them easy to compare. This makes it possible to provide a monitoring image 22 with which the entire image captured by the imaging means can easily be compared with the face 101 of the person 100 to be monitored that is contained in that entire image.
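
Superimposition can be sketched in a similar way. The example below assumes images are lists of pixel rows and simply overwrites the pixels of the base image at a given offset; the function name `superimpose` and the clipping of pixels that fall outside the base image are assumptions.

```python
def superimpose(base, overlay, top, left):
    """Return a copy of `base` with `overlay` pasted at (top, left), clipped."""
    result = [list(row) for row in base]
    for dy, row in enumerate(overlay):
        for dx, value in enumerate(row):
            y, x = top + dy, left + dx
            if 0 <= y < len(result) and 0 <= x < len(result[0]):
                result[y][x] = value
    return result
```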

In this embodiment, the face image extraction unit 9 extracts, from the acquired image 21, the region image 23 detected by the face image detection unit 8 as a region containing a face image, and the monitoring image 22 is generated from it. Alternatively, the face image extraction unit 9 may extract from the acquired image 21 only the face image detected by the face image detection unit 8, and the monitoring image 22 may be generated from that face image alone.

In this embodiment, the object to be monitored is the face 101 of the person 100, but the object is not limited to this. The object may be something other than the face 101 of the person 100, such as the whole body, a hand, or a foot; it may be something other than the person 100, such as an animal, an insect, or a plant; or it may be an automobile, an airplane, a train, or the like.

The above embodiment is an example of the present invention, and it goes without saying that the present invention is not limited to that embodiment.

1 ... Monitoring camera (imaging means)
2 ... Image processing apparatus
7 ... Image acquisition unit (image acquisition means)
8 ... Face image detection unit (object image extraction means, object image detection means)
9 ... Face image extraction unit (object image extraction means, region image extraction means)
10 ... Monitoring image generation unit (monitoring image generation means)
11 ... Image reduction unit (image reduction means)
13 ... Transmission unit (transmission means)
15 ... Determination and rearrangement unit (same image determination and arrangement means)
21, 21_1, 21_2 ... Acquired image
22, 22_1, 22_2 ... Monitoring image
23, 23_1, 23_2, 23_3 ... Region image
24 ... Acquired image stream (first stream transmission means)
25 ... Monitoring image stream (second stream transmission means)
26 ... Arrangement image
27 ... Superimposed image
100, 100_1, 100_2, 100_3, 100_4 ... Person
101, 101_1, 101_2, 101_3, 101_4 ... Face

Claims

An image processing apparatus comprising:
image acquisition means for acquiring an image;
object image extraction means for extracting an object image of an object included in the acquired image acquired by the image acquisition means;
monitoring image generation means for generating a monitoring image in which one or more object images extracted by the object image extraction means are arranged; and
same image determination and arrangement means for arranging, when object images of the same object are included in a plurality of acquired images, the object images of the same object at the same position in each monitoring image generated using each acquired image.

The image processing apparatus according to claim 1, wherein the same image determination and arrangement means extracts, from the acquired image, feature amount information used to identify the object, and determines whether a plurality of object images are the same image using the extracted feature amount information.

The same image determination and arrangement means uses a neural network to perform at least one of the extraction of the feature amount information and the determination of whether the plurality of object images are the same image. The image processing apparatus according to claim 1.

The image processing apparatus, wherein
the object image extraction means includes region image extraction means for extracting a region image of a region including the object image, and
the monitoring image generation means generates a monitoring image in which one or more region images extracted by the region image extraction means are arranged.

The image processing apparatus according to claim 1, further comprising image reduction means for reducing at least one of the acquired image and the monitoring image by reducing the number of pixels of the image,
wherein the image reduction means reduces the acquired image, or the acquired image and the monitoring image, such that the region including the object image in the acquired image has a higher reduction ratio than the region including the object image in the monitoring image.

The image processing apparatus according to claim 1, further comprising transmission means for transmitting at least one of the acquired image and the monitoring image, wherein the transmission means includes first stream transmission means for transmitting the acquired image and second stream transmission means for transmitting the monitoring image.

The image processing apparatus according to claim 1, wherein
the monitoring image generation means generates an arrangement image in which the acquired image and the monitoring image including the object image are arranged side by side,
the image processing apparatus further comprising transmission means for transmitting at least one of the acquired image, the monitoring image, and the arrangement image.

The image processing apparatus according to claim 1, wherein
the monitoring image generation means generates a superimposed image in which the monitoring image including the object image is superimposed on the acquired image,
the image processing apparatus further comprising transmission means for transmitting at least one of the acquired image, the monitoring image, and the superimposed image.

An image processing method executed by a computer, the method comprising:
a process of acquiring an image;
a process of extracting an object image of an object included in the acquired image acquired by the process of acquiring the image;
a process of generating a monitoring image in which one or more object images extracted by the process of extracting the object image are arranged; and
a process of arranging, when object images of the same object are included in a plurality of acquired images, the object images of the same object at the same position in each monitoring image generated using each acquired image.

An image processing program that causes a computer to execute:
a process of acquiring an image;
a process of extracting an object image of an object included in the acquired image acquired by the process of acquiring the image;
a process of generating a monitoring image in which one or more object images extracted by the process of extracting the object image are arranged; and
a process of arranging, when object images of the same object are included in a plurality of acquired images, the object images of the same object at the same position in each monitoring image generated using each acquired image.