JP7069667B2

JP7069667B2 - Estimating program, estimation system, and estimation method

Info

Publication number: JP7069667B2
Application number: JP2017230761A
Authority: JP
Inventors: 敏規半谷; 裕起蒲山; 智美鈴木
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2017-11-30
Filing date: 2017-11-30
Publication date: 2022-05-18
Anticipated expiration: 2037-11-30
Also published as: JP2019101664A

Description

The present invention relates to an estimation program, an estimation system, and an estimation method.

One known method of grasping how crowded a store is analyzes how customers use a ticket machine, for example in a store that sells meal tickets, to determine how many customers are waiting for service.

This method can capture the congestion status of the store as a whole, but it is difficult to capture the congestion status of the seats, for example which seats are occupied and which are vacant.

Known methods of grasping the congestion status of seats include installing a pressure sensor on each seat or installing an infrared sensor in the store. With pressure sensors, for example, attaching one sensor to each seat makes it possible to detect whether each seat is occupied and thus to grasp the congestion status of the store. With an infrared sensor, for example, a sensor installed in the store detects infrared radiation from human bodies, so that the positions and number of people can be determined.

However, these methods can incur introduction costs, such as attaching one pressure sensor to each seat or configuring the analysis software with the seat areas where infrared radiation from the human body is difficult to detect, as well as operating costs, such as operation and maintenance of the analysis software.

Therefore, for a store with 100 or more seats, such as a restaurant, these costs can be incurred for every store, which makes the methods difficult to introduce.

Separately, a method is also known that grasps the congestion status of a store by analyzing video captured by a surveillance camera installed in the store.

Japanese Unexamined Patent Publication No. 2010-86300
Japanese Unexamined Patent Publication No. 2017-156956

However, in methods that analyze surveillance camera video, the camera angle (for example, the shooting direction within the store and the angle of view), the video resolution, the shooting conditions, and the like strongly affect recognition accuracy. It is also possible, for example, to designate an area to be monitored for congestion, or to monitor by comparing the captured video with an image of the vacant seats. To perform such monitoring, however, various conditions such as the store's seat position information and the settings and installation conditions of the surveillance camera have to be configured in the software.

In the above example, the seat position information is, for instance, a seat layout diagram. A seat position (area) is an example of a "retention area", an area in a space where an object such as a person stays.

A company with many stores has to configure the software in this way for each store, which can incur a large cost.

In one aspect, an object of the present invention is to estimate the congestion status of retention areas of objects in a space at low cost or with high accuracy.

In one aspect, an estimation program may cause a computer to execute the following processing. For objects detected from each of a plurality of first images obtained by photographing a space, the processing may group objects that are related to each other across the plurality of first images, based on the position information of each detected object and the detection status of each object in each of the plurality of first images. The processing may also estimate, based on the result of the grouping, a retention area in which each object stays in the space. Further, the processing may manage, in association with one object, a predetermined number or more of objects whose image components are determined to match across a predetermined number or more of the first images. Further, the grouping may include processing that groups, as the mutually related objects, those one objects that satisfy a distance condition across the plurality of first images. Further, the grouping may include processing that generates information indicating the detection status of each object in each of the plurality of first images, namely information indicating, for each one object, whether an object associated with that one object exists in each of the plurality of first images, and that, based on the generated information, excludes from grouping with each other any one objects whose corresponding objects exist in the same first image.

In one aspect, the congestion status of retention areas of objects in a space can be estimated at low cost or with high accuracy.

A block diagram showing a configuration example of the congestion degree estimation system according to one embodiment.
A diagram showing an example of the operation phases according to one embodiment.
A diagram showing an example of object detection by the server according to one embodiment.
A diagram showing an example of detection information.
A diagram showing an example of static boxes and non-static boxes.
A diagram showing an example of the similarity determination processing for box coordinates.
A diagram showing an example of the similarity determination processing for the boxes themselves.
A diagram showing an example of the seat position estimation processing.
A diagram showing an example of the distance index used in hierarchical clustering.
A diagram showing an example of the seat position estimation procedure.
(a) to (c) are diagrams showing examples of the estimated degree of congestion.
A diagram showing an example of how the degree of congestion is presented.
A flowchart showing an operation example of the seat position estimation phase according to one embodiment.
A flowchart showing an operation example of the seat position estimation phase according to one embodiment.
A flowchart showing an operation example of the congestion status estimation phase according to one embodiment.
A diagram showing a hardware configuration example of the computer according to one embodiment.

Hereinafter, embodiments of the present invention will be described with reference to the drawings. The embodiments described below are merely examples, and there is no intention to exclude modifications or applications of techniques that are not explicitly described. For example, the present embodiment can be modified in various ways without departing from its spirit. In the drawings used in the following embodiments, parts with the same reference numerals represent the same or similar parts unless otherwise specified.

[1] One Embodiment
As described above, grasping the congestion status of seats by analyzing video captured by a surveillance camera requires configuring the software with seat layout information and the settings and installation conditions of the surveillance camera, so the cost can grow as the number of stores increases.

One embodiment therefore describes an estimation system that can estimate seat positions based on video captured by a surveillance camera.

For example, the estimation system according to one embodiment may perform the following processing.

- For objects detected from each of a plurality of first images obtained by photographing a space, group objects that are related to each other across the plurality of first images, based on the position information of each detected object and the detection status of each object in each of the plurality of first images.

- Based on the result of the grouping, estimate the retention area in which each object stays in the space.

Through the above processing, the estimation system according to one embodiment can estimate, from the plurality of first images, the retention area where each object stays, for example the seat positions where people are present. A retention area means a specific area where a person remains; it is not limited to a seating area such as a chair or a floor cushion and may include an area (space) where a person stands. One example of a standing area is the standing place that a store assigns to a customer, as in a stand-up (also called "tachigui") restaurant or in a facility where customers receive a service while standing.
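The two steps listed above can be sketched as follows. This is a minimal illustration, not the method of the embodiment: it uses a simple greedy distance rule, whereas the drawings mention hierarchical clustering, and the names `Detection`, `group_detections`, `estimate_regions`, `max_dist`, and `min_frames` are hypothetical. The rule that two detections from the same image are never merged follows the exclusion described in the summary.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Detection:
    frame: int   # index of the first image in which the object was detected
    cx: float    # centre x of the detected object, in pixels
    cy: float    # centre y of the detected object, in pixels


def _centroid(group):
    """Mean centre position of all detections in a group."""
    n = len(group)
    return (sum(d.cx for d in group) / n, sum(d.cy for d in group) / n)


def group_detections(detections, max_dist):
    """Greedily group detections across frames (assumed rule, for illustration).

    A detection joins the first existing group whose centroid lies within
    max_dist and which has no member from the same frame; otherwise it
    starts a new group.
    """
    groups = []
    for det in sorted(detections, key=lambda d: d.frame):
        for group in groups:
            gx, gy = _centroid(group)
            same_frame = any(d.frame == det.frame for d in group)
            if not same_frame and hypot(det.cx - gx, det.cy - gy) <= max_dist:
                group.append(det)
                break
        else:
            groups.append([det])
    return groups


def estimate_regions(groups, min_frames):
    """Treat groups observed in at least min_frames frames as retention areas.

    Returns the centroid of each qualifying group.
    """
    return [_centroid(g) for g in groups if len(g) >= min_frames]


detections = [
    Detection(0, 10, 10), Detection(1, 12, 11), Detection(2, 11, 9),
    Detection(0, 100, 100), Detection(1, 300, 50),
]
groups = group_detections(detections, max_dist=20)
regions = estimate_regions(groups, min_frames=2)
```

Here the three detections near (11, 10) form one group that persists over three frames, while the two isolated detections are discarded as transient.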

By estimating the retention areas in a space from the first images of that space, the congestion status of the retention areas can then be estimated easily, for example from an image of the space and the estimated retention areas.

For example, when estimating the congestion status of seats from surveillance camera video, processing such as setting (or updating) conditions such as seat position information in the software becomes unnecessary, which reduces cost.

In addition, when the congestion status of seats is determined using seat layout information prepared in advance, the captured image is first corrected based on the settings and installation conditions of the surveillance camera and then compared with the seat layout diagram, so the congestion status may not be estimated accurately. In contrast, the estimation system according to one embodiment uses captured images both for estimating the retention areas and for estimating the congestion status. The influence of the camera settings and installation conditions is therefore reduced, and the congestion status can be estimated with high accuracy.

Accordingly, the estimation system according to one embodiment can estimate, for example, how the retention areas in a space are being used, at low cost or with high accuracy, based on the information on the estimated retention areas.

Surveillance cameras that photograph a space are likely to be already installed and in use in stores and similar places for purposes such as crime prevention. Using a surveillance camera to estimate the congestion status of seats therefore makes use of existing equipment, which keeps costs lower than newly introducing pressure sensors, infrared sensors, ticket machines, or the like for the same purpose.

[1-1] Configuration Example of One Embodiment
A configuration example of one embodiment is described below. FIG. 1 is a block diagram showing a configuration example of a congestion degree estimation system 1 according to one embodiment. The congestion degree estimation system 1 estimates the degree of congestion of the retention areas in a target facility and is an example of the estimation system described above. As shown in FIG. 1, the congestion degree estimation system 1 may illustratively include one or more surveillance cameras 2 (a plurality in the example of FIG. 1) and a server 4.

The surveillance cameras 2 and the server 4 may be connected so that they can communicate with each other, for example via a network 5.

The network 5 may be at least one of the Internet and an intranet, including, for example, a LAN (Local Area Network), a WAN (Wide Area Network), or a combination of these. The network 5 may also include a virtual network such as a VPN (Virtual Private Network). The network 5 may be formed by a wired network, a wireless network, or both.

The surveillance camera 2 may photograph the space in its shooting direction and acquire an image sequence, for example a plurality of images arranged in time series such as a moving image (the images may be referred to as "frames").

The surveillance camera 2 may be installed, for example, in a store 3. As one example, it may be arranged so that its shooting range includes the customer seats (preferably all customer seats) inside and outside the store 3.

To include all the seats inside and outside the store 3 in the shooting range, for example, a plurality of surveillance cameras 2 may be installed at positions where they cover each other's blind spots (or where parts of their shooting ranges overlap), or one or more movable surveillance cameras 2 may be installed. A combination of these may also be adopted.

The surveillance camera 2 may transmit the acquired video to the server 4 via the network 5. For example, the surveillance camera 2 may store the acquired video in a recorder or the like (not shown) and transmit the data in the recorder to the server 4 at a predetermined timing. Various conditions may be used as the predetermined timing, such as the arrival of a predetermined time of day, the elapse of a predetermined period, the amount of data stored in the recorder, or the number of stored frames. Alternatively, the surveillance camera 2 may transmit the captured video to the server 4 without going through a recorder. In that case, for example, the surveillance camera 2 may transmit the video every one to several frames (in substantially real time).
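The recorder-side upload conditions described above (elapsed time, stored data volume, stored frame count) could be combined as in the following sketch. The type `RecorderState`, the function `should_upload`, and all threshold values are hypothetical; the source names the kinds of conditions but gives no concrete values.

```python
from dataclasses import dataclass


@dataclass
class RecorderState:
    seconds_since_upload: float  # time elapsed since the last transmission
    stored_bytes: int            # volume of video currently held in the recorder
    stored_frames: int           # number of frames currently held in the recorder


def should_upload(state, max_seconds=300.0, max_bytes=50_000_000, max_frames=100):
    """Return True when any one of the assumed upload conditions is met."""
    return (
        state.seconds_since_upload >= max_seconds
        or state.stored_bytes >= max_bytes
        or state.stored_frames >= max_frames
    )


idle = RecorderState(seconds_since_upload=10.0, stored_bytes=1_000, stored_frames=3)
full = RecorderState(seconds_since_upload=10.0, stored_bytes=1_000, stored_frames=100)
```

Treating the conditions as alternatives (logical OR) is one possible design; a deployment could equally use a single condition.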

Examples of the surveillance camera 2 include a box camera, a dome camera, and a network camera. Examples of a network camera include an IP (Internet Protocol) camera. In a store 3 where the lighting inside or outside the store is dim, a night-vision camera such as an infrared camera may be used as the surveillance camera 2. A camera intended for any of various uses, such as a security camera, a monitoring camera, a street camera, or a fixed-point camera, may be used as the surveillance camera 2.

The server 4 may estimate the seat positions in the store 3 based on the video captured by the surveillance camera 2, and may estimate the degree of congestion of the seats in the store 3 based on the estimated seat positions and the video captured by the surveillance camera 2. Details of the server 4 are described later.

The information on the degree of seat congestion estimated by the server 4 may be provided to a terminal device 6, for example as shown in FIG. 1. As one example, the server 4 may have a Web server function (for example, an information presentation unit 14 described later) and may use that function to display, on the terminal device 6 via the network 5, a Web page showing the degree of congestion of the store 3.

The terminal device 6 is a computer that receives the information on the degree of congestion of the store 3 estimated by the server 4. For example, the terminal device 6 may be a computer owned by a prospective customer of the store 3 who is interested in how crowded its seats are. Examples of the terminal device 6 include various information processing devices such as a desktop, laptop, or mobile PC (Personal Computer), a tablet, a smartphone, and a mobile phone.

The terminal device 6 may perform various kinds of communication with the server 4, for example via the network 5, such as transmitting a request to obtain the congestion status of the seats of the store 3 and receiving a response containing the estimation result.

As illustrated in FIG. 1, when the terminal device 6 is a PC, tablet, smartphone, mobile phone, or the like that communicates wirelessly, it may connect to the network 5 through a mobile network via a base station 7.

The terminal device 6 may illustratively include means for inputting information (operation requests) from the user, means for outputting information to the user, means for communicating with the server 4, and the like.

[1-2] Operation Phases of the Server 4
Next, the operation phases of the server 4 are described.

As described above, the server 4 may estimate the seat positions based on the video captured by the surveillance camera 2. The server 4 may then estimate the (for example, latest) congestion status of the seats based on the estimated seat positions and the (for example, latest) video captured by the surveillance camera 2.

Various image recognition techniques may be used to estimate seat positions from video. In one embodiment, a detection model based on deep learning with a neural network (NN) is used as an example.

Although it depends on the services the store 3 provides, the time of day, and so on, a customer is expected to use a seat in the store 3 for somewhere between ten-odd minutes and several tens of minutes (or an hour or more). The surveillance camera 2 can capture at least one frame per second (1 FPS; frames per second), but analyzing every frame in order to detect seat changes such as a customer sitting down, temporarily leaving the seat, or leaving the store is inefficient.

For this reason, the machine learning only needs to use the image data of some of the captured frames. For example, in one embodiment, frames taken from the video at a predetermined sampling interval, such as every 5 minutes, may be input to the neural network.

In one embodiment, an estimation period of, for example, several hours to several days may be provided for estimating the seat positions. The following description assumes, as an example, an estimation period (predetermined period) of 3 days.
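The sampling and the estimation period described above can be illustrated with a short calculation. The function names and the assumption that the camera records around the clock are hypothetical; the source states only the 5-minute interval and the 3-day period as examples.

```python
def sampled_frame_indices(total_frames, fps, interval_seconds=300.0):
    """Indices of the frames kept when sampling once per interval_seconds."""
    step = max(1, round(fps * interval_seconds))
    return list(range(0, total_frames, step))


def frames_in_period(days, interval_seconds=300.0):
    """Number of sampled frames in the period, assuming 24-hour recording."""
    return int(days * 24 * 60 * 60 // interval_seconds)


indices = sampled_frame_indices(total_frames=1000, fps=1.0)
frames_3_days = frames_in_period(3)
```

At 1 FPS, a 5-minute interval keeps every 300th frame, and a 3-day period yields at most 3 × 24 × 12 = 864 sampled frames under the round-the-clock assumption; shorter opening hours would give fewer.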

In view of the above, the method according to one embodiment may be carried out in the phases illustrated in FIG. 2. For example, as shown in FIG. 2, a store 3 in which surveillance cameras 2 are already in operation may be configured to transmit the video of the surveillance cameras 2 to the server 4, and a seat position estimation phase of 3 days may be provided as the predetermined period.

Once the seat positions have been estimated in the seat position estimation phase, the server 4 may start the congestion status estimation phase. In the congestion status estimation phase, the server 4 may estimate the congestion status at predetermined timings based on the estimated seat positions and the image data sent from the surveillance camera 2.

Thus, the seat position estimation phase may be regarded as the initial setup phase of the method according to one embodiment, and the congestion status estimation phase as its normal operation phase.

In the store 3, the layout may be changed, or seats may be moved by customers or employees. Therefore, after the seat position estimation phase ends, a seat position estimation (update) phase for updating the estimated seat positions may also be carried out during the congestion status estimation phase. Because the seat position estimation (update) phase can be performed with the same processing as the seat position estimation phase, the following description does not distinguish between them and refers to both simply as the seat position estimation phase.

[1-3] Configuration Example of the Server 4
Next, a configuration example of the server 4 is described. The server 4 is an example of one or more computers installed in the store 3, at a company or the like that operates the store 3, in a data center, or elsewhere. Examples of the server 4 include various physical server devices and/or virtual server devices. At least some of the functions of the server 4 may be realized using, for example, resources, frameworks, or applications provided by a cloud service. At least some of the functions of the server 4 may also be distributed across, or made redundant on, a plurality of computers.

As shown in FIG. 1, the server 4 may illustratively include a memory unit 11, a control unit 12, an NN 13, and an information presentation unit 14.

The memory unit 11 stores various information used in the processing of the server 4. The information stored in the memory unit 11 is described later together with the functions of the server 4. The memory unit 11 may be a memory, for example a volatile memory such as a RAM (Random Access Memory), a storage unit, for example a storage device such as an HDD (Hard Disk Drive) or an SSD (Solid State Drive), or both.

The control unit 12 performs control related to estimating the seat positions and estimating the congestion status. As shown in FIG. 1, the control unit 12 may illustratively include an information acquisition unit 121, a static box determination unit 122, a seat estimation unit 123, and a congestion degree calculation unit 124.

The information acquisition unit 121 may receive the video data transmitted from the surveillance camera 2 (store 3) and store it in the memory unit 11 as video data 111. In one embodiment, the control unit 12 and the NN 13 may use image data (frames) captured at 5-minute intervals and arranged in time series. The image data arranged in time series is an example of the plurality of first images.

For example, the information acquisition unit 121 may extract frames at 5-minute intervals from the received video data and store them in the memory unit 11 as the video data 111. Alternatively, the store 3 or the surveillance camera 2 may extract frames at 5-minute intervals from the captured video data and transmit that group of frames to the server 4 as the video data.

The video data 111 may include image data of at least N frames (N is an integer, for example 5). To save storage capacity in the memory unit 11, the information acquisition unit 121 may set an upper limit on the data size, the number of frames, or the like of the video data 111, and when the limit is exceeded, delete frames from the video data 111 starting with the oldest capture date and time.
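The frame-count cap with oldest-first deletion described above maps naturally onto a bounded queue. The class `FrameStore` below is a hypothetical sketch that assumes frames are added in chronological order; it caps only the number of frames, not the data size.

```python
from collections import deque


class FrameStore:
    """Keeps at most max_frames frames, discarding the oldest when full."""

    def __init__(self, max_frames):
        # A deque with maxlen drops items from the opposite end on append.
        self._frames = deque(maxlen=max_frames)

    def add(self, timestamp, image):
        """Append a frame; assumes timestamps arrive in increasing order."""
        self._frames.append((timestamp, image))

    def timestamps(self):
        """Capture times of the frames currently retained, oldest first."""
        return [t for t, _ in self._frames]


store = FrameStore(max_frames=3)
for minute in (0, 5, 10, 15, 20):
    store.add(minute, image=b"")  # empty bytes stand in for image data
```

After the five additions, only the frames captured at minutes 10, 15, and 20 remain.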

In the seat position estimation phase, the static box determination unit 122 and the seat estimation unit 123 cooperate with the NN 13 to estimate the seat positions of the store 3 from the video data 111. In other words, the static box determination unit 122 and the seat estimation unit 123 are examples of the grouping unit and the estimation unit described next.

For objects detected from each of a plurality of first images obtained by photographing a space, the grouping unit may group objects that are related to each other across the plurality of first images, based on the position information of each detected object and the detection status of each object in each of the plurality of first images. The estimation unit may estimate, based on the result of the grouping, the retention area in which each object stays in the space.

The congestion degree calculation unit 124 calculates the degree of congestion of the store 3 based on the estimated seat positions. In other words, the congestion degree calculation unit 124 is an example of a congestion degree estimation unit that estimates the degree of congestion of the retention areas based on the position information of objects detected from a second image obtained by photographing the space and on the information on the estimated retention areas in the space.
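One plausible way to combine detected object positions with the estimated retention areas is to count the areas that contain at least one detection. This section does not give the formula used by the congestion degree calculation unit 124, so the ratio below, the rectangular representation of a retention area, and the name `congestion_degree` are all assumptions for illustration.

```python
def congestion_degree(regions, heads):
    """Fraction of retention areas that contain at least one detected head.

    regions: list of (x_min, y_min, x_max, y_max) rectangles, in pixels
    heads:   list of (cx, cy) head centre positions from the second image
    """
    if not regions:
        return 0.0
    occupied = sum(
        1
        for (x0, y0, x1, y1) in regions
        if any(x0 <= cx <= x1 and y0 <= cy <= y1 for cx, cy in heads)
    )
    return occupied / len(regions)


seats = [(0, 0, 10, 10), (20, 0, 30, 10), (40, 0, 50, 10), (60, 0, 70, 10)]
heads = [(5, 5), (25, 5), (100, 100)]  # the third head is outside every seat
degree = congestion_degree(seats, heads)
```

With two of the four assumed seat rectangles occupied, the sketch yields a degree of 0.5; a detection outside every rectangle, such as a person walking past, does not affect the result.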

情報提示部１４は、例えば、Ｗｅｂサーバ或いはＤＢ（Database）サーバの機能を有してよく、端末装置６からの要求に応じて、端末装置６に対して、混雑度算出部１２４が算出した混雑度の情報を提示してよい。 The information presentation unit 14 may have, for example, a function of a Web server or a DB (Database) server, and the congestion degree calculation unit 124 calculates the congestion degree for the terminal device 6 in response to a request from the terminal device 6. Information on the degree may be presented.

以下、サーバ４の機能及び動作の一例について説明する。 Hereinafter, an example of the function and operation of the server 4 will be described.

[1-3-1] Description of the NN
First, the NN 13 will be described. The NN 13 detects objects from each of the plurality of first images in which the surveillance camera 2 has captured the space. For example, the NN 13 may detect objects based on the video data 111 stored in the memory unit 11, which includes a plurality of pieces of image data taken at 5-minute intervals.

The NN 13 may be a system on which machine learning has been performed in advance so that it detects the target object from image data. For example, a detection model that detects a person's head from image data may be applied to the NN 13. A person's head is an example of the object to be detected.

FIG. 3 shows an example of object detection by the NN 13. As illustrated in FIG. 3, the NN 13 may calculate the position and score of each person's head from the input image data. FIG. 3 shows the head positions and scores that the NN 13 detected from an image of the shooting space 30 captured by the surveillance camera 2, overlaid on that image. The image of the shooting space 30 may have a coordinate system in which the upper left of the image is (0, 0) and the lower right is (x, y), where x is the width of the image and y is the height of the image, each expressed as a number of pixels.

The position of a head may be specified, for example, as a rectangular region enclosing the head. Hereinafter, the rectangle enclosing a head may be referred to as a "box". The shape of the box is not limited to a rectangle and may be a circle, an ellipse, a polygon, or the like.

The score is an example of information indicating the certainty (likelihood) that the detected region is a human head, with a maximum value of "1.000". The example of FIG. 3 is a frame captured at 12:00.

Using the head of a person (human body) as the object to be detected makes it possible to capture the part of the human body that is most readily captured by the surveillance camera 2, which can improve detection accuracy. In addition, detecting objects by deep learning makes it possible to detect a person's head in various postures or states, not only a frontal face.

For example, the NN 13 can accurately detect a person's head in various postures, such as from the back (back of the head; see reference sign A in FIG. 3), from above (top of the head in a downward-looking posture; see reference sign B), and from the side (profile; see reference sign C).

The NN 13 can also accurately detect a head that is partly hidden by another object, for example a head blocked by a chair (see reference sign D), a head wearing a cap (see reference sign E), or a head blocked by other people or obstacles in a crowded place.

The NN 13 may store the detected head positions and scores in the memory unit 11 as the detection information 112. The detection information 112 may be generated for each frame, or frame identification information may be associated with each object in the detection information 112. The detection information 112 may be a file in any of various formats, for example a CSV (Comma-Separated Values) file.

FIG. 4 shows an example of the detection information 112. As shown in FIG. 4, the detection information 112 may illustratively include the items "No.", "x1", "y1", "x2", "y2", and "score". The example of FIG. 4 shows information on the objects detected in one frame (image).

"No." is information that identifies a detected object. "No." may additionally include information that identifies the frame. "x1" and "y1" may indicate the coordinates (x1, y1) of the upper-left vertex of the detected head box, and "x2" and "y2" may indicate the coordinates (x2, y2) of the lower-right vertex of the detected head box. The score may indicate the likelihood of the box represented by (x1, y1) and (x2, y2). The boxes that the NN 13 records in the detection information 112 may be limited to boxes whose score is "0.500" or higher.
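As an illustration of the detection information 112 in CSV form, the following sketch parses one frame of detections and keeps only boxes at or above the example score of 0.500. The header names (`no`, `x1`, ...), the function name `load_boxes`, and the sample rows are hypothetical; the text does not specify the file layout beyond the listed items.

```python
import csv
import io

SCORE_MIN = 0.5  # example cut-off from the text


def load_boxes(csv_text, score_min=SCORE_MIN):
    """Parse rows of no, x1, y1, x2, y2, score and keep confident boxes."""
    boxes = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        score = float(row["score"])
        if score >= score_min:
            boxes.append({
                "no": row["no"],
                "x1": float(row["x1"]), "y1": float(row["y1"]),
                "x2": float(row["x2"]), "y2": float(row["y2"]),
                "score": score,
            })
    return boxes


# Hypothetical sample: the second row falls below the cut-off.
sample = "no,x1,y1,x2,y2,score\n1,10,20,50,60,0.912\n2,200,40,230,75,0.431\n"
boxes = load_boxes(sample)
```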

[1-3-2] Description of the static box determination unit
As preprocessing for estimating the seat positions, the static box determination unit 122 may determine the head positions of people sitting in seats, based on the detection information 112, which is the result of detecting people's heads.

At least part of the following processing by the static box determination unit 122 may be executed by deep learning using an NN. For example, the static box determination unit 122 may include an NN, or the NN 13 may be configured to additionally execute the following processing.

For example, the static box determination unit 122 compares the head detection result for a given frame with the head detection results for the past several frames and extracts the boxes that remain at the same position. Boxes that remain at the same position may include, for example, (the heads of) people seated in the store 3. In other words, the static box determination unit 122 detects, for each image captured by the surveillance camera 2, objects that are staying in place, including objects staying in the retention region.

To determine whether boxes in different frames are identical, a similarity determination on the box coordinates (for example, size and position) and a similarity determination on the boxes themselves may be used.

An example of how the static box determination unit 122 acquires data on the head positions of "sitting people" is described below. In the following description, the box at the head position of a sitting person is referred to as a "static box".

The static box determination unit 122 may manage the head detection results of past frames by dividing them into the following two types of data.

(a) Data of static boxes.

(b) Data of non-static boxes, that is, boxes other than static boxes, in the past N frames. Here, N = 5.

The static box determination unit 122 may store, for example, the data of the static boxes and the non-static boxes in the memory unit 11 as the box information 113. As the data of static boxes, the box information 113 may include, for each static box, the identification information of the boxes in a plurality of frames that were determined to correspond to that static box. As the data of non-static boxes, the box information 113 may include the identification information of the boxes in a plurality of frames that are not static boxes.

In this way, the static box determination unit 122 is an example of a management unit that manages a predetermined number or more of objects, determined to be at the same position across a predetermined number or more of first images, in association with one static object.

For example, the static box determination unit 122 may execute the following processes (i) to (iii) on the image data of each time, based on the positions of the detected objects and the detection status of each object in each frame.

In the following description, the "frame at the current time" may be read as the frame under determination. That is, the static box determination unit 122 may compare each object detected in the frame under determination with the N frames preceding that frame. After making the determination, the static box determination unit 122 may take the next frame (5 minutes later) as the frame under determination and compare it with the N frames preceding it.

(i) The static box determination unit 122 compares the boxes detected in the frame at the current time with the static boxes of frames at past times, and adds each box determined to be identical to the data of the corresponding static box.

The static box determination unit 122 may acquire the information on the boxes detected in the frame at each time from the detection information 112. The method for determining whether static boxes are identical between frames is described later.

(ii) Of the boxes detected in the frame at the current time, the static box determination unit 122 compares those not determined to be static boxes with the data of the non-static boxes in the past N frames. When it detects a combination of boxes that satisfies the static box condition, it adds those boxes to the data as a new static box.

For example, if M of the N (N = 5) frames contain boxes that can be determined to be identical to one another, the static box determination unit 122 may determine that group of boxes to be a new static box. M is an integer less than N and may be, for example, 3.

(iii) For a static box that has not been observed (detected) for M consecutive frames, the static box determination unit 122 may write the data of that box to, for example, a file in the storage unit and delete it from the static box data in memory. Such a situation arises, for example, when a seated person leaves the seat.

As described above, the memory unit 11 may be one or both of a memory and a storage unit. The process of (iii) above therefore means keeping in memory the data of active static boxes, which were detected in M of the past N frames, and moving to the storage unit the data of inactive static boxes, which have not been detected in the past M frames.

FIG. 5 shows an example of static boxes and non-static boxes in three frames at times 10:00, 10:05, and 10:10. As shown in FIG. 5, for example, the boxes hatched upward to the right and the boxes hatched downward to the right, each determined to be identical across the three frames, are determined to be static boxes ("static box 1" and "static box 2", respectively). By contrast, the white-background boxes, for which fewer than 3 of the past 5 frames contain a box determined to be identical, are determined to be non-static boxes ("non-static box").
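Processes (i) to (iii) can be sketched as the following bookkeeping loop. This is a simplified illustration under several assumptions: the class `StaticBoxTracker` and the `same_box` predicate are hypothetical, a current box is compared only with the most recent box of each static box rather than with several past frames, the current frame is counted toward M, and the `archived` list stands in for the file on the storage unit.

```python
N_FRAMES = 5  # past frames of non-static boxes to keep (example value)
M_FRAMES = 3  # frames needed to promote, and consecutive misses to retire


class StaticBoxTracker:
    def __init__(self, same_box):
        self.same_box = same_box  # hypothetical predicate(box_a, box_b) -> bool
        self.static = []    # active static boxes: {"boxes": [...], "missed": int}
        self.recent = []    # non-static boxes of the last N frames, oldest first
        self.archived = []  # stands in for the file on the storage unit

    def update(self, frame_boxes):
        matched, leftovers = set(), []
        # (i) attach each current box to an existing static box if it matches
        for box in frame_boxes:
            hit = next((i for i, s in enumerate(self.static)
                        if i not in matched
                        and self.same_box(s["boxes"][-1], box)), None)
            if hit is None:
                leftovers.append(box)
            else:
                self.static[hit]["boxes"].append(box)
                self.static[hit]["missed"] = 0
                matched.add(hit)
        # (iii) retire static boxes not observed for M consecutive frames
        active = []
        for i, s in enumerate(self.static):
            if i not in matched:
                s["missed"] += 1
            (self.archived if s["missed"] >= M_FRAMES else active).append(s)
        self.static = active
        # (ii) promote leftovers that recur in the recent non-static history
        non_static = []
        for box in leftovers:
            matches = []
            for past in self.recent:
                m = next((p for p in past if self.same_box(p, box)), None)
                if m is not None:
                    matches.append(m)
            if len(matches) + 1 >= M_FRAMES:  # current frame counted (assumption)
                for past in self.recent:
                    past[:] = [p for p in past if p not in matches]
                self.static.append({"boxes": matches + [box], "missed": 0})
            else:
                non_static.append(box)
        self.recent = (self.recent + [non_static])[-N_FRAMES:]
```

Calling `update()` once per 5-minute frame with that frame's boxes would mirror the example of FIG. 5: a box seen at roughly the same place in three frames becomes a static box, and it is archived after three frames without a match.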

Next, the method for determining whether boxes in different frames are identical is described.

(Similarity determination of box coordinates)
First, the method for the similarity determination of box coordinates is described. The static box determination unit 122 may compare the coordinates of boxes in different frames and, if the comparison result exceeds a threshold, determine the two to be "different boxes". The coordinates compared may include at least one of size and position.

Through the similarity determination of box coordinates, the static box determination unit 122 may exclude from static box determination any pair of boxes whose sizes differ greatly and/or whose positions are far apart (in other words, whose distance is large). This reduces the amount of computation in the subsequent similarity determination of the boxes themselves.

FIG. 6 shows an example of the similarity determination process for box coordinates. As illustrated in FIG. 6, the static box determination unit 122 may perform at least one of a comparison of box sizes between frames and a comparison of box positions between frames. In the example of FIG. 6, the two frames compared are the current frame (f) and the immediately preceding frame (f-1).

For example, as shown in the upper left of FIG. 6, the static box determination unit 122 determines whether the size ratio of the two boxes being compared is less than or equal to a threshold, for example 1.3. The box size ratio may be obtained as, for example, (larger box size) / (smaller box size). The box size may be, for example, the average of the width and height of the box, obtained as (width + height) / 2.

Also, for example, as shown in the upper right of FIG. 6, the static box determination unit 122 determines whether the distance between the two boxes being compared is less than or equal to a threshold, for example "average box size × 1.5". The average box size may be the average of the box sizes of the two boxes or static boxes being compared. Alternatively, the average box size may be the average size of the boxes detected in the current frame, in the past N frames, or in all frames analyzed so far.

If, as a result of these two determinations, at least one of the box size ratio and the distance between the boxes is greater than its corresponding threshold, the static box determination unit 122 may exclude the two boxes from static box determination.
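The coordinate pre-filter above can be sketched as follows, with boxes given as `(x1, y1, x2, y2)` tuples. Two points are assumptions because the text leaves them open: the distance is measured between box centers, and the average box size is taken over the two boxes being compared. The function names are hypothetical.

```python
import math

SIZE_RATIO_MAX = 1.3   # example threshold from the text
DIST_FACTOR_MAX = 1.5  # example factor applied to the average box size


def box_size(box):
    """Average of width and height, i.e. (width + height) / 2."""
    x1, y1, x2, y2 = box
    return ((x2 - x1) + (y2 - y1)) / 2


def center(box):
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)


def coords_similar(a, b):
    """True if the pair passes both the size-ratio and the distance check."""
    sa, sb = box_size(a), box_size(b)
    if min(sa, sb) <= 0:  # degenerate box: treat as not comparable
        return False
    ratio = max(sa, sb) / min(sa, sb)
    (ax, ay), (bx, by) = center(a), center(b)
    dist = math.hypot(ax - bx, ay - by)  # center-to-center (assumption)
    avg = (sa + sb) / 2                  # mean of the two boxes (assumption)
    return ratio <= SIZE_RATIO_MAX and dist <= avg * DIST_FACTOR_MAX
```

Only pairs for which `coords_similar` returns `True` would go on to the more expensive similarity determination of the boxes themselves.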

(Similarity determination of the boxes themselves)
Next, the method for the similarity determination of the boxes themselves is described. The static box determination unit 122 may, for example, score the similarity of the crop images of the two boxes being compared and the overlap between the boxes, and calculate a total score from these scores. If the total score is greater than or equal to a threshold (for example, 1), the static box determination unit 122 may determine that the two boxes are identical.

A crop image is an image of the object (the head, in one embodiment) cut out from the image data. Obtaining the similarity score of crop images may include, for example, obtaining a similarity score of color histograms and obtaining a similarity score of pixel differences.

FIG. 7 shows an example of the similarity determination process for the boxes themselves. As illustrated in FIG. 7, the static box determination unit 122 may calculate the following scores.

(I) Similarity score of color histograms: "sim_color"

An example of the color histogram is a histogram of RGB (Red Green Blue) values.

(II) Similarity score based on pixel differences: "sim_pixel"

The similarity based on pixel differences may be obtained by a method such as NRMSE (Normalized Root Mean-Squared Error). A method for calculating pixel differences by NRMSE is described in, for example, [http://scikit.image.org/docs/dev/api/skimage.measure.html#skimage.measure.compare_nrmse].

(III) Degree of overlap of the boxes: IoU (Intersection over Union)

IoU is information indicating the degree of overlap between two boxes (in other words, regions). For example, as shown in FIG. 7, when two regions are overlaid in the coordinate system, IoU is the ratio of the region where they overlap (AND region) to the region formed by combining the two (OR region).
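For axis-aligned rectangular boxes, the AND/OR ratio described above can be computed as in the following sketch. The function name `iou` and the `(x1, y1, x2, y2)` tuple layout are assumptions chosen to match the upper-left and lower-right coordinates of the detection information.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    # Width and height of the overlapping (AND) region, zero if disjoint
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    # Area of the combined (OR) region
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0
```

Identical boxes give 1.0 and disjoint boxes give 0.0; two equal-sized boxes shifted by half their width give 1/3.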

The scores (I) to (III) above may be calculated by a neural network.

Based on the calculated scores (I) to (III), the static box determination unit 122 calculates "sim_color * sim_pixel + IoU" as the total score. The total score is an example of an index for determining whether the image components of the objects match.

The static box determination unit 122 then determines whether the total score is greater than or equal to a threshold (for example, 1). If the total score is greater than or equal to the threshold, the static box determination unit 122 may determine that the two boxes being compared are the same box across frames.

In process (i) above, when comparing a box of the current frame with the static boxes of past frames, the static box determination unit 122 may determine similarity as follows. For example, the static box determination unit 122 may calculate a total similarity score between the box under comparison in the current frame and the static box under comparison in each of a plurality of past frames. The static box determination unit 122 may then calculate the average of the calculated total scores and determine whether the average is greater than a threshold (for example, 1). If the average is greater than the threshold, the box under comparison may be added to the static box.

In process (ii) above, when comparing a box of the current frame with the non-static boxes of past frames, the static box determination unit 122 may determine similarity as follows. For example, the static box determination unit 122 may calculate the similarity between the box under comparison in the current frame and the non-static boxes in each of a plurality of past frames. The static box determination unit 122 may then determine whether the calculated total similarity score is greater than a threshold (for example, 1) in M (for example, 3) of the past N (for example, 5) frames. If the total score is greater than the threshold in M or more frames, the box under comparison and the non-static boxes it was compared with may be newly determined to be a static box.
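The total score and the two decision rules above can be sketched as follows. The function names are hypothetical, the per-frame scores are assumed to be precomputed, and `None` marks a past frame with no candidate box. Whether the current frame itself counts toward M is not stated explicitly, so it is exposed as the parameter `count_current`, defaulting to the reading suggested by the three-frame example of FIG. 5.

```python
THRESHOLD = 1.0  # example threshold from the text
M_FRAMES = 3     # example value of M


def total_score(sim_color, sim_pixel, iou):
    """Total score: sim_color * sim_pixel + IoU."""
    return sim_color * sim_pixel + iou


def matches_static_box(scores):
    """Rule for (i): mean total score against a static box's past boxes."""
    return bool(scores) and sum(scores) / len(scores) > THRESHOLD


def forms_new_static_box(scores_by_frame, m=M_FRAMES, count_current=True):
    """Rule for (ii): enough past frames with a total score above threshold."""
    hits = sum(1 for s in scores_by_frame if s is not None and s > THRESHOLD)
    if count_current:  # assumption: the current frame is one of the M frames
        hits += 1
    return hits >= m
```

Because sim_color, sim_pixel, and IoU are each at most 1 in typical formulations, a total score of 1 or more requires both a good appearance match and substantial overlap.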

As described above, by determining the similarity of the boxes themselves, the static box determination unit 122 can reduce the probability of erroneously detecting a seat position.

For example, positions such as in front of the cash register or the restroom are places other than seats where customers may stay. At such places, boxes may be detected in consecutive frames taken at 5-minute intervals.

In contrast, because the static box determination unit 122 determines the similarity of the boxes themselves, when the boxes being compared are the heads of different people, for example, the total score is at or below the threshold and the boxes are not determined to be a static box. As a result, even if the heads of different people are detected over multiple frames at a place other than a seat, such as in front of the cash register or the restroom, the possibility of that place being detected as a seat position can be reduced or eliminated, and the seat positions can be estimated accurately.

[1-3-3] Description of the seat estimation unit
The seat estimation unit 123 estimates the seat positions based on the information on "sitting people" accumulated as the box information 113, and stores the information on the estimated seat positions in the memory unit 11 as the seat information 114.

At least part of the following processing by the seat estimation unit 123 may be executed by deep learning using an NN. For example, the seat estimation unit 123 may include an NN, or the NN 13 may be configured to additionally execute the following processing.

For example, the seat estimation unit 123 may estimate the seat positions and the number of seats using the box information 113 in which the position data (static boxes) of "sitting people" determined by the static box determination unit 122 has been accumulated for a certain period (about 3 days in one embodiment).

As an example, the seat estimation unit 123 may perform clustering based on the static box information, treat the resulting number of clusters as the number of seats, and treat the position of each cluster as the position information of each seat. The static box information includes the position, the size, and the time (frame) of observation. The position of each cluster may be calculated based on the average of the positions of the static boxes included in that cluster.

In one embodiment, the number of seats in the store 3 is assumed not to have been acquired or set, in order to simplify the initial setup. In this case, the clustering algorithm may be based on hierarchical clustering.

For example, as shown in FIG. 8, the seat estimation unit 123 may start from an initial state in which every observed static box is a separate cluster (see "static boxes") and merge clusters that are close to each other into the same cluster (see "clustering result"). Because a threshold is set on the merge distance, clusters that are far apart remain unmerged, and the seat estimation unit 123 may take each remaining cluster as the final seat data (see "estimated seat positions").

Here, a cluster is a group of observed static boxes and is an example of objects that are related to each other across a plurality of first images. Static boxes within the same cluster can be regarded as observations of different people who sat in the same seat. That is, in one embodiment, the seat positions are estimated by regarding the positions of the clusters that finally remain as the positions where objects stay, and regarding those positions as the seat positions.
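The merge procedure of FIG. 8 can be sketched as a plain agglomerative loop with a distance threshold. This is an illustration under assumptions: the cluster dictionary layout, the `merge` rule (count-weighted mean of centers, element-wise OR of observation arrays), and the `stop_at` threshold are hypothetical, and the distance function is passed in as a parameter rather than fixed.

```python
def merge(a, b):
    """Combine two clusters (assumed representation, not from the text)."""
    n = a["n"] + b["n"]
    return {
        "c_x": (a["c_x"] * a["n"] + b["c_x"] * b["n"]) / n,
        "c_y": (a["c_y"] * a["n"] + b["c_y"] * b["n"]) / n,
        "n": n,
        # a frame counts as observed if either cluster was observed in it
        "obs": [x | y for x, y in zip(a["obs"], b["obs"])],
    }


def cluster_seats(clusters, distance, stop_at):
    """Merge the closest pair repeatedly until it is farther than stop_at."""
    clusters = list(clusters)
    while len(clusters) > 1:
        d, i, j = min(
            (distance(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if d > stop_at:  # assumed form of the merge-distance threshold
            break
        merged = merge(clusters[i], clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters
```

The number of clusters returned would then be read as the estimated number of seats, and each cluster's `(c_x, c_y)` as an estimated seat position.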

以下、図９を参照して、階層化クラスタリングを行なうための距離指標について説明する。図９は、階層化クラスタリングの距離指標の一例を示す図である。 Hereinafter, a distance index for performing hierarchical clustering will be described with reference to FIG. 9. FIG. 9 is a diagram showing an example of a distance index of hierarchical clustering.

座席推定部１２３は、距離計算のためのクラスタの代表値を設定してよい。図９の例では、座席推定部１２３は、各静的ボックスについて、クラスタ内の静的ボックスの中心座標の平均値を（ｃ_ｘ，ｃ_ｙ）、ボックスサイズの平均値をｓｉｚｅ、クラスタ内に含まれる静的ボックスの数をｎ、としてよい。また、座席推定部１２３は、クラスタ内に含まれる静的ボックスについて、観測のあった時刻（フレーム）を“１”、観測の無い時刻（フレーム）を“０”、とする２値の配列を生成し、例えばメモリ部１１に保持してよい。配列は、複数の第１画像の各々における各物体の検出状況を示す情報の一例である。 The seat estimation unit 123 may set a representative value of the cluster for distance calculation. In the example of FIG. 9, the seat estimation unit 123 sets the average value of the center coordinates of the static boxes in the cluster to (c _x , _cy ), the average value of the box sizes to size, and the average value of the box sizes in the cluster for each static box. The number of static boxes included may be n. Further, the seat estimation unit 123 arranges a binary array in which the time (frame) with observation is “1” and the time (frame) without observation is “0” for the static box included in the cluster. It may be generated and held in, for example, the memory unit 11. The array is an example of information indicating the detection status of each object in each of the plurality of first images.

座席推定部１２３は、上記の代表値を用いて、図９に例示するように、下記の（１）～（３）式を算出することで、クラスタ間の距離指標を求めてよい。 The seat estimation unit 123 may obtain a distance index between clusters by calculating the following equations (1) to (3) as illustrated in FIG. 9 using the above representative values.

ここで、（ｃ_ｘＡ，ｃ_ｙＡ），ｓｉｚｅ_Ａ，ｎ_Ａは、それぞれ、クラスタＡの座標、サイズ、静的ボックス数を示し、（ｃ_ｘＢ，ｃ_ｙＢ），ｓｉｚｅ_Ｂ，ｎ_Ｂは、それぞれ、クラスタＢの座標、サイズ、静的ボックス数を示す。 Here, (c _xA , _cyA ), size _A , and n _A indicate the coordinates, size, and number of static boxes of cluster A, respectively, and (c _xB , _cyB ), size _B , and n _B , respectively. , The coordinates of cluster B, the size, and the number of static boxes.

なお、上記（３）式のうち、“penalty”以外の部分は、ｗａｒｄ法を用いた階層化クラスタリングの距離指標と同じである。これに対して、一実施形態では、ｗａｒｄ法の距離指標に“penalty”を追加することにより、座席位置及び座席数の推定精度を向上させている。 In the above equation (3), the portion other than "penalty" is the same as the distance index of the hierarchical clustering using Ward's method. On the other hand, in one embodiment, the accuracy of estimating the seat position and the number of seats is improved by adding "penalty" to the distance index of the Ward method.

例えば、クラスタＡ及びクラスタＢ間で同時刻（同一フレーム）に静的ボックス、換言すれば「座っている人」、が観測された場合、これらのクラスタには、互いに別の座席が存在することが予想される。 For example, if a static box, in other words, a "sitting person", is observed between clusters A and B at the same time (same frame), these clusters have different seats from each other. Is expected.

このため、上記（３）式では、クラスタＡ及びクラスタＢ間で同時刻（同一フレーム）に静的ボックスが観測された場合に、観測された回数に応じてクラスタ間の距離を大きくするようなペナルティ項を追加している。例えば、座席推定部１２３は、観測時刻の配列同士のＡＮＤを取り、ＡＮＤ結果の配列のＳＵＭ（合計値）をペナルティとして、上記（３）式に適用してよい。 Therefore, in the above equation (3), when a static box is observed between clusters A and B at the same time (same frame), the distance between the clusters is increased according to the number of observations. A penalty term has been added. For example, the seat estimation unit 123 may take an AND between arrays of observation times and apply the SUM (total value) of the array of AND results to the above equation (3) as a penalty.
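The penalty described above (AND of the two observation arrays, then SUM) can be sketched as follows. Equations (1) to (3) are not reproduced in this text, so the distance below is only an assumed form: the standard Ward merge cost on the cluster centers plus a weighted penalty. The function names and the `penalty_weight` parameter are hypothetical, and any use of the box size in the original equations is omitted.

```python
from typing import Sequence, Tuple


def cooccurrence_penalty(obs_a: Sequence[int], obs_b: Sequence[int]) -> int:
    """Number of frames in which both clusters were observed (SUM of element-wise AND)."""
    return sum(a & b for a, b in zip(obs_a, obs_b))


def penalized_ward_distance(
    center_a: Tuple[float, float], n_a: int, obs_a: Sequence[int],
    center_b: Tuple[float, float], n_b: int, obs_b: Sequence[int],
    penalty_weight: float = 1.0,  # assumed weighting, not taken from the source
) -> float:
    """Ward merge cost between two clusters plus the co-occurrence penalty."""
    dx = center_a[0] - center_b[0]
    dy = center_a[1] - center_b[1]
    ward = (n_a * n_b) / (n_a + n_b) * (dx * dx + dy * dy)
    return ward + penalty_weight * cooccurrence_penalty(obs_a, obs_b)
```

Because the penalty grows with the number of frames in which both clusters are occupied, two nearby seats that are often used at the same time stay apart even when their centers are close.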

このように、座席推定部１２３は、静的物体ごとに、複数の第１画像の各々において当該静的物体に対応付けられた物体が存在するか否かを示す情報を生成してよい。そして、座席推定部１２３は、生成した情報に基づき、対応する物体が一の第１画像内に存在する静的物体同士をグループ化の対象から除外してよい。 In this way, the seat estimation unit 123 may generate, for each static object, information indicating whether or not an object associated with that static object exists in each of the plurality of first images. Based on the generated information, the seat estimation unit 123 may then exclude from the grouping targets any static objects whose corresponding objects exist in the same first image.

座席推定部１２３は、上記（３）式により得られるｄ_ｗａｒｄを距離指標として用い、全ての静的ボックスのデータが別々のクラスタである初期状態から（図１０の「初期状態」参照）、以下の手順により、階層化クラスタリングを実行してよい。 The seat estimation unit 123 may use d_ward obtained by the above equation (3) as the distance index and, starting from the initial state in which the data of every static box forms a separate cluster (see "initial state" in FIG. 10), may perform hierarchical clustering by the following procedure.

例えば、座席推定部１２３は、クラスタ間の距離指標（ｄ_ｗａｒｄ）が最も小さいクラスタペアを決定し、下記（４）式の停止条件（stop condition）を満たすか否かを判定する。 For example, the seat estimation unit 123 determines the cluster pair having the smallest inter-cluster distance index (d_ward), and determines whether or not the stop condition of the following equation (4) is satisfied.

上記（４）式の停止条件を満たす場合、座席推定部１２３は、階層化クラスタリングを終了し、残っているクラスタの情報を座席情報１１４としてメモリ部１１に格納する。 When the stop condition of the above equation (4) is satisfied, the seat estimation unit 123 ends the hierarchical clustering and stores the remaining cluster information as the seat information 114 in the memory unit 11.

一方、上記（４）式の停止条件を満たさない場合、座席推定部１２３は、当該クラスタペアをマージして新しいクラスタとする。 On the other hand, if the stop condition of the above equation (4) is not satisfied, the seat estimation unit 123 merges the cluster pairs into a new cluster.

座席推定部１２３は、上記の処理を、停止条件が満たされるか、或いは、クラスタ数が１になるまで繰り返し実行する。 The seat estimation unit 123 repeatedly executes the above process until the stop condition is satisfied or the number of clusters becomes 1.
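The loop described above (pick the closest pair, test the stop condition, merge, repeat) can be sketched as follows. Equation (4) is not reproduced in this text, so the stop condition here is an assumption: clustering stops when the smallest distance exceeds a hypothetical `stop_threshold`. The distance is the same assumed Ward-plus-penalty form, and taking the OR of the observation arrays on merge is also an assumption.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List, Sequence


@dataclass
class Cluster:
    cx: float
    cy: float
    n: int
    observed: List[int]  # per-frame 1/0 observation flags


def distance(a: Cluster, b: Cluster, penalty_weight: float) -> float:
    """Assumed distance index: Ward merge cost plus co-occurrence penalty."""
    ward = a.n * b.n / (a.n + b.n) * ((a.cx - b.cx) ** 2 + (a.cy - b.cy) ** 2)
    penalty = sum(x & y for x, y in zip(a.observed, b.observed))
    return ward + penalty_weight * penalty


def merge(a: Cluster, b: Cluster) -> Cluster:
    """Merge two clusters; the center is the mean over all member static boxes."""
    n = a.n + b.n
    return Cluster(
        cx=(a.cx * a.n + b.cx * b.n) / n,
        cy=(a.cy * a.n + b.cy * b.n) / n,
        n=n,
        observed=[x | y for x, y in zip(a.observed, b.observed)],
    )


def estimate_seats(initial: Sequence[Cluster], stop_threshold: float,
                   penalty_weight: float = 1.0) -> List[Cluster]:
    """Agglomerate until the stop condition holds or one cluster remains."""
    clusters = list(initial)
    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: distance(clusters[p[0]], clusters[p[1]], penalty_weight),
        )
        if distance(clusters[i], clusters[j], penalty_weight) > stop_threshold:
            break  # assumed stop condition: closest pair is already too far apart
        merged = merge(clusters[i], clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters
```

The clusters returned by `estimate_seats` correspond to the estimated seat positions. Recomputing all pairwise distances on every iteration is quadratic per step, which keeps the sketch short but would likely need optimizing for several days of data.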

このように、階層化クラスタリングによりマージされずに残ったクラスタがそれぞれ座席位置として推定される（図１０の「座席位置の推定結果」参照）。座席推定部１２３は、クラスタに含まれる静的ボックスの中心座標の平均値（ｃ_ｘ，ｃ_ｙ）、及び、ボックスサイズの平均値（ｓｉｚｅ）を、座席の位置及び参考サイズのデータとして用いてよい。 In this way, each cluster that remains unmerged after the hierarchical clustering is estimated to be a seat position (see "Seat position estimation result" in FIG. 10). The seat estimation unit 123 may use the average (c_x, c_y) of the center coordinates of the static boxes in the cluster and the average box size (size) as the position and reference size of the seat.

なお、座席推定部１２３は、マージされずに残ったクラスタのうち、クラスタ内に含まれるサンプル数（ｎの値）が極端に少ないクラスタを、ノイズと判断して座席情報から除外してもよい。サンプル数が極端に少ないクラスタとは、例えば、他のクラスタ内に含まれるサンプル数の平均値の数％（例示的に、２％）以下のクラスタであってよい。 Among the clusters that remain unmerged, the seat estimation unit 123 may treat a cluster containing an extremely small number of samples (value of n) as noise and exclude it from the seat information. A cluster with an extremely small number of samples may be, for example, a cluster whose sample count is no more than a few percent (for example, 2%) of the average sample count of the other clusters.
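The noise filter described above can be sketched as follows, assuming the comparison is made against the mean sample count of all other remaining clusters. The function name `drop_noise_clusters` and the `ratio` parameter are hypothetical; the 2% default follows the example in the text.

```python
from typing import List, Sequence


def drop_noise_clusters(sample_counts: Sequence[int], ratio: float = 0.02) -> List[int]:
    """Return the indices of clusters kept as seats.

    A cluster is dropped when its sample count n is at most `ratio` times
    the mean sample count of the other clusters.
    """
    total = sum(sample_counts)
    others = len(sample_counts) - 1
    kept = []
    for index, n in enumerate(sample_counts):
        if others == 0:
            kept.append(index)  # a single cluster has nothing to compare against
            continue
        mean_of_others = (total - n) / others
        if n > ratio * mean_of_others:
            kept.append(index)
    return kept
```

For example, with sample counts of 500, 480, 520, and 5, the last cluster falls below 2% of the mean of the others (500) and is removed.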

〔１－３－４〕混雑度算出部の説明 [1-3-4] Description of Congestion Degree Calculation Unit
混雑度算出部１２４は、混雑状況推定フェーズにおいて、頭部の検出結果と座席情報１１４とを用いて、混雑度を算出する。 The congestion degree calculation unit 124 calculates the congestion degree using the head detection result and the seat information 114 in the congestion status estimation phase.

例えば、混雑度算出部１２４は、人の頭部の検出結果と座席情報１１４の各座席位置との距離を算出し、距離の近い検出結果及び座席位置のペアから順に、座席位置に対して、座席位置から閾値以下の距離に存在する人の頭部の検出結果を割り当てる。 For example, the congestion degree calculation unit 124 calculates the distance between each human head detection result and each seat position in the seat information 114 and, starting from the detection/seat pair with the shortest distance, assigns to each seat position a head detection result that lies within a threshold distance of that seat position.

これにより、頭部の検出結果が割り当てられた座席（人の居る座席）の情報と、検出結果が割り当てられない座席（空席）の情報と、が得られる。なお、頭部の検出位置と座席位置とが所定の距離以上離れている場合は、座席位置に対して当該頭部を割り当てなくてよい。 As a result, information on the seat to which the detection result of the head is assigned (seat with a person) and information on the seat to which the detection result is not assigned (vacant seat) can be obtained. If the detection position of the head and the seat position are separated by a predetermined distance or more, the head may not be assigned to the seat position.
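The nearest-first assignment described above can be sketched as a greedy matching. This is a minimal illustration under stated assumptions: heads and seats are given as (x, y) points, each seat receives at most one head and each head at most one seat, and the names `assign_heads_to_seats` and `max_dist` are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]


def assign_heads_to_seats(heads: Sequence[Point], seats: Sequence[Point],
                          max_dist: float) -> Dict[int, int]:
    """Greedily assign heads to seats, closest pairs first.

    Returns a mapping {head index: seat index}. Heads farther than
    `max_dist` from every free seat stay unassigned.
    """
    pairs = sorted(
        (math.dist(head, seat), hi, si)
        for hi, head in enumerate(heads)
        for si, seat in enumerate(seats)
    )
    seat_of_head: Dict[int, int] = {}
    taken = set()
    for dist, hi, si in pairs:
        if dist > max_dist:
            break  # all remaining pairs are even farther apart
        if hi in seat_of_head or si in taken:
            continue
        seat_of_head[hi] = si
        taken.add(si)
    return seat_of_head
```

Seats whose index appears among the values of the result are occupied, and the remaining seats are vacant. A head that is left unassigned, such as a clerk walking through an aisle, does not count toward occupancy.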

人の頭部の検出結果としては、例えば、混雑状況推定フェーズにおいて取得された第２の画像における物体の検出結果が用いられてよい。一例として、人の頭部の検出結果は、メモリ部１１に格納された検出情報１１２に含まれる、最新の（或いは端末装置６から要求された時刻の）検出対象のフレームにおいて検出された物体の情報が用いられてよい。 As the detection result of the human head, for example, the detection result of the object in the second image acquired in the congestion situation estimation phase may be used. As an example, the detection result of the human head is the object detected in the latest detection target frame (or the time requested by the terminal device 6) included in the detection information 112 stored in the memory unit 11. Information may be used.

混雑度算出部１２４は、例えば、“人の居る座席数／全座席数”を算出することにより、検出対象のフレームに基づく混雑率を算出してよい。また、混雑度算出部１２４は、算出した混雑率に対して、“０．３”，“０．６”等の閾値を設けることにより、「空いている」／「通常」／「混んでいる」等の混雑状況の程度を表す混雑度（離散値）を推定してよい。なお、上記の例では、３段階の混雑度を推定するものとしたが、閾値の数を増減させることにより、混雑度の段階数を増減させてもよい。 The congestion degree calculation unit 124 may calculate the congestion rate for the frame to be detected, for example, by calculating "the number of occupied seats / the total number of seats". The congestion degree calculation unit 124 may also apply thresholds such as "0.3" and "0.6" to the calculated congestion rate to estimate a congestion degree (discrete value), such as "vacant" / "normal" / "crowded", that expresses the level of congestion. The above example estimates the congestion degree in three levels, but the number of levels may be increased or decreased by changing the number of thresholds.
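The rate and threshold logic above can be sketched as follows. The thresholds 0.3 and 0.6 and the three levels come from the text; the function names and the English labels are illustrative assumptions.

```python
from typing import Sequence


def congestion_rate(occupied_seats: int, total_seats: int) -> float:
    """Congestion rate = occupied seats / total seats."""
    if total_seats <= 0:
        raise ValueError("total_seats must be positive")
    return occupied_seats / total_seats


def congestion_level(rate: float,
                     thresholds: Sequence[float] = (0.3, 0.6),
                     labels: Sequence[str] = ("vacant", "normal", "crowded")) -> str:
    """Map a rate to a discrete level; needs len(labels) == len(thresholds) + 1."""
    for threshold, label in zip(thresholds, labels):
        if rate < threshold:
            return label
    return labels[-1]
```

Passing a longer `thresholds` tuple with one more label than thresholds gives a finer scale, which matches the remark that the number of levels can be changed.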

なお、上述のように、混雑状況推定フェーズ中に座席位置推定（更新）フェーズが実行される場合には、静的ボックス判定部１２２及び座席推定部１２３は、第２（第１）の画像における物体の検出結果を用いて、座席情報１１４を推定（更新）してよい。 As described above, when the seat position estimation (update) phase is executed during the congestion status estimation phase, the static box determination unit 122 and the seat estimation unit 123 may estimate (update) the seat information 114 using the object detection results in the second (first) image.

図１１に混雑度算出部１２４による混雑度の推定結果の一例を示す。なお、図１１は、監視カメラ２により撮影された撮影空間３０の画像に対して、混雑度算出部１２４が当該画像に基づいて検出した、人の居る座席と、空席とを当て嵌めた様子を示す。図１１においては、座席位置を丸の記号で示し、人が着席している座席位置を四角の記号で示している。 FIG. 11 shows an example of the congestion degree estimated by the congestion degree calculation unit 124. FIG. 11 shows the occupied seats and vacant seats, detected by the congestion degree calculation unit 124 from an image of the shooting space 30 taken by the surveillance camera 2, overlaid on that image. In FIG. 11, seat positions are indicated by circles, and seat positions in which a person is seated are indicated by squares.

図１１（ａ）は混雑度が「空いている」場合、例えば混雑率が“０．３”未満である場合を示す。図１１（ｂ）は混雑度が「通常」の場合、例えば混雑率が“０．３”以上且つ“０．６”未満である場合を示す。図１１（ｃ）は混雑度が「混んでいる」場合、例えば混雑率が“０．６”以上である場合を示す。 FIG. 11A shows a case where the degree of congestion is “vacant”, for example, a case where the congestion rate is less than “0.3”. FIG. 11B shows a case where the degree of congestion is “normal”, for example, a case where the congestion rate is “0.3” or more and less than “0.6”. FIG. 11C shows a case where the degree of congestion is “crowded”, for example, a case where the congestion rate is “0.6” or more.

混雑度算出部１２４は、推定した混雑度の情報を、混雑度情報１１５としてメモリ部１１に格納してよい。 The congestion degree calculation unit 124 may store the estimated congestion degree information in the memory unit 11 as the congestion degree information 115.

なお、図１１（ａ）において符号Ａで示すように、座席位置の情報が存在しないエリアで検出された物体には、座席が割り当てられなくてよく、この場合、当該物体は混雑率の算出に考慮しなくてよい。これにより、店員や移動中の顧客等が混雑率の算出に影響を与えないようにすることができる。なお、図１１（ａ）においては、座席が割り当てられない物体を二重線の四角の記号で示している。 As indicated by reference sign A in FIG. 11A, a seat need not be assigned to an object detected in an area for which no seat position information exists, and in this case the object need not be taken into account in calculating the congestion rate. This prevents clerks, customers who are moving around, and the like from affecting the calculation of the congestion rate. In FIG. 11A, an object to which no seat is assigned is indicated by a double-lined square.

〔１－３－５〕情報提示部の説明 [1-3-5] Description of Information Presentation Unit
情報提示部１４は、制御部１２により推定された混雑度の情報を、例えば端末装置６に提示してよい。 The information presentation unit 14 may present information on the degree of congestion estimated by the control unit 12 to, for example, the terminal device 6.

図１２は、情報提示部１４による端末装置６への情報の提示例を示す図である。情報提示部１４は、例えば、図１２に示すように、端末装置６からの要求に応じて、端末装置６の表示装置６０に対して、混雑度算出部１２４による推定結果である混雑度情報１１５に基づく店舗３の混雑度を提示してよい。 FIG. 12 is a diagram showing an example of information presented to the terminal device 6 by the information presentation unit 14. For example, as shown in FIG. 12, in response to a request from the terminal device 6, the information presentation unit 14 may present, on the display device 60 of the terminal device 6, the degree of congestion of the store 3 based on the congestion degree information 115, which is the estimation result of the congestion degree calculation unit 124.

提示される情報は、１つの店舗３に関する情報であってもよいし、図１２に示すように複数の店舗３の各々に関する情報であってもよい。また、情報提示部１４は、図１２に示すように、店舗３の空席状況を閲覧できるようにしてもよい。空席情報は、例えば、推定した座席位置に基づきマップを生成し、当該マップに対して、空席或いは人が着席している座席を特定（例えば印を付ける等）した情報であってよい。 The information presented may be information about one store 3 or information about each of the plurality of stores 3 as shown in FIG. Further, as shown in FIG. 12, the information presenting unit 14 may be able to browse the vacant seat status of the store 3. The vacant seat information may be, for example, information that generates a map based on the estimated seat position and identifies (for example, marks) a vacant seat or a seat in which a person is seated on the map.

なお、端末装置６及び情報提示部１４は、一例として、端末装置６がＷｅｂクライアント、情報提示部１４がＷｅｂサーバとして機能してよい。この場合、端末装置６からのリクエストに応じて、情報提示部１４がＷｅｂページを生成し、生成したＷｅｂページを端末装置６に応答してよい。なお、リクエストは、例えば、所定のＵＲＬ（Uniform Resource Locator）に対するｈｔｔｐ（Hypertext Transfer Protocol）リクエストであってよい。また、Ｗｅｂページは、例えば、ｈｔｍｌ（HyperText Markup Language）等のマークアップ言語により生成されてよい。 As an example, the terminal device 6 may function as a Web client and the information presentation unit 14 may function as a Web server. In this case, in response to a request from the terminal device 6, the information presentation unit 14 may generate a Web page and return the generated Web page to the terminal device 6. The request may be, for example, an HTTP (Hypertext Transfer Protocol) request for a predetermined URL (Uniform Resource Locator). The Web page may be generated in a markup language such as HTML (HyperText Markup Language).
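The Web client/server arrangement above can be sketched with Python's standard `http.server` module. This is only an illustration of the request/response pattern: the store names, the `CONGESTION` dictionary, the port, and the function names are hypothetical, and the real system would read the levels from the congestion degree information 115.

```python
from html import escape
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Dict

# Hypothetical stand-in for the stored congestion degree information.
CONGESTION: Dict[str, str] = {"Store A": "vacant", "Store B": "crowded"}


def render_page(levels: Dict[str, str]) -> str:
    """Generate a small HTML page listing each store and its congestion level."""
    rows = "".join(
        f"<tr><td>{escape(name)}</td><td>{escape(level)}</td></tr>"
        for name, level in levels.items()
    )
    return f"<!DOCTYPE html><html><body><table>{rows}</table></body></html>"


class CongestionHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        """Answer an HTTP GET request with the generated Web page."""
        body = render_page(CONGESTION).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Run the server until interrupted (not called automatically)."""
    HTTPServer(("", port), CongestionHandler).serve_forever()
```

Calling `serve()` and opening `http://localhost:8000/` in a browser would play the role of the terminal device 6 requesting the page.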

以上のように、一実施形態によれば、サーバ４は、ＮＮ１３により、監視カメラ２の映像における人物の頭部を検出し、検出した頭部の位置を、静的ボックス判定部１２２及び座席推定部１２３により、例えばディープラーニングにより学習してよい。 As described above, according to one embodiment, the server 4 may detect the heads of persons in the video of the surveillance camera 2 with the NN 13, and may learn the positions of the detected heads with the static box determination unit 122 and the seat estimation unit 123, for example by deep learning.

人物の頭部が頻繁に滞留する、例えば停止する場所は、座席の位置と推定することができる。従って、映像から検出した人物の頭部が、推定した座席位置に重なるか否かを判定することにより、座席に人が居るか否かを判断でき、混雑率を算出できるようになる。 The location where the person's head frequently stays, for example, stops, can be presumed to be the position of the seat. Therefore, by determining whether or not the head of the person detected from the video overlaps with the estimated seat position, it is possible to determine whether or not there is a person in the seat, and the congestion rate can be calculated.

また、静的ボックス判定部１２２及び座席推定部１２３による処理によって、監視カメラ２の映像から座席位置を推定することができるため、店舗３の座席位置のマップや座席数を事前に取得しなくても、一実施形態に係る手法を適用できる。 Further, since the seat position can be estimated from the image of the surveillance camera 2 by the processing by the static box determination unit 122 and the seat estimation unit 123, it is not necessary to acquire the map of the seat position of the store 3 and the number of seats in advance. Also, the method according to one embodiment can be applied.

例えば、店舗３において、監視カメラ２の映像をサーバ４に送信する等の簡素な初期設定を行なうことで、一実施形態に係る手法を容易に店舗３に適用でき、高精度に座席の混雑率を把握可能となる。特に、多くの店舗３の座席の混雑状況を把握する際には、煩雑な初期設定等が不要となるため、従来のように１店舗ごとに監視カメラ２の撮影領域等の設定を行なうといった、多くのコストを削減できる。 For example, by performing a simple initial setup in the store 3, such as transmitting the video of the surveillance camera 2 to the server 4, the method according to one embodiment can be applied to the store 3 easily, and the seat congestion rate can be grasped with high accuracy. In particular, when grasping the seat congestion status of many stores 3, no complicated initial setup is required, so much of the cost of conventional approaches, such as configuring the shooting area of the surveillance camera 2 for each store, can be eliminated.

一実施形態に係る手法は、例えば、以下のような場面に適用し、又は、応用することができる。 The method according to one embodiment can be applied or applied to, for example, the following situations.

・店舗３の混雑状況をリアルタイムに監視及び数値化する。
・施設や店舗３の経営者が混雑状況を把握し、店舗戦略に活用する。
・施設や店舗３の利用者が、混雑状況を端末装置６のモバイルアプリ等で確認し、施設や店舗３に行くか否かの意思決定を行なう。
・ Monitor and quantify the congestion status of the store 3 in real time.
・ The managers of facilities and stores 3 grasp the congestion status and use it for store strategy.
・ Users of a facility or store 3 check the congestion status with a mobile application or the like on the terminal device 6 and decide whether or not to go to the facility or store 3.

〔１－４〕動作例 [1-4] Operation Example
次に、図１３～図１５を参照して、上述の如く構成された混雑度推定システム１の動作例を説明する。 Next, an operation example of the congestion degree estimation system 1 configured as described above will be described with reference to FIGS. 13 to 15.

〔１－４－１〕座席位置推定フェーズの動作例 [1-4-1] Operation Example of the Seat Position Estimation Phase
はじめに、図１３及び図１４を参照して、座席位置推定フェーズの動作例を説明する。 First, an operation example of the seat position estimation phase will be described with reference to FIGS. 13 and 14.

サーバ４では、制御部１２の情報取得部１２１が、店舗３に設置された監視カメラ２の映像データを取得し、映像データ１１１としてメモリ部１１に格納する。 In the server 4, the information acquisition unit 121 of the control unit 12 acquires the video data of the surveillance camera 2 installed in the store 3 and stores it in the memory unit 11 as the video data 111.

図１３に例示するように、サーバ４のＮＮ１３は、映像データ１１１から一定間隔（例えば５分間隔）でフレームを取得し（ステップＳ１）、フレームから人物の頭部のボックスを検出する（ステップＳ２）。なお、ＮＮ１３は、検出したボックスを検出情報１１２としてメモリ部１１に格納してよい。 As illustrated in FIG. 13, the NN 13 of the server 4 acquires frames from the video data 111 at regular intervals (for example, every 5 minutes) (step S1), and detects the boxes of persons' heads in each frame (step S2). The NN 13 may store the detected boxes as the detection information 112 in the memory unit 11.

制御部１２の静的ボックス判定部１２２は、検出情報１１２に基づいて、現在（例えば最新）のフレーム内の各ボックスを、過去Ｎ（例えばＮ＝５）フレーム内のボックスと比較する（ステップＳ３）。 Based on the detection information 112, the static box determination unit 122 of the control unit 12 compares each box in the current (for example, the latest) frame with the boxes in the past N (for example, N = 5) frames (step S3).

なお、ボックス間の比較は、ボックスのサイズ及び／又はボックス間の距離、の比較によるスクリーニングと、色ヒストグラム、ピクセル差分、及びＩｏＵ等を用いたボックスの類似性の総合スコア同士の比較と、を含んでよい（図６及び図７参照）。 For comparison between boxes, screening by comparison of box size and / or distance between boxes and comparison of total scores of box similarity using color histograms, pixel differences, and IoU, etc. are performed. May include (see FIGS. 6 and 7).
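Of the similarity measures named above, IoU (intersection over union) is simple to illustrate. The sketch below assumes boxes given as (x1, y1, x2, y2) corner coordinates; how IoU is weighted against the color histogram and pixel difference in the total score is described with FIGS. 6 and 7 and is not reproduced here.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x1 <= x2, y1 <= y2


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes, in [0, 1]."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0
```

A head that stays in place across frames yields boxes with an IoU close to 1, while a person who has moved on yields an IoU near 0.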

静的ボックス判定部１２２は、比較した現在のフレーム内のボックスが過去のフレーム内のいずれかの静的ボックスと同一か否かを判定する（ステップＳ４）。 The static box determination unit 122 determines whether or not the box in the compared current frame is the same as any static box in the past frame (step S4).

ボックスが過去のフレーム内のいずれかの静的ボックスと同一の場合（ステップＳ４でＹｅｓ）、静的ボックス判定部１２２は、当該ボックスを、ボックス情報１１３における当該静的ボックスのデータに追加し（ステップＳ５）、処理がステップＳ８に移行する。 If the box is the same as any static box in the past frames (Yes in step S4), the static box determination unit 122 adds the box to the data of that static box in the box information 113 (step S5), and the process proceeds to step S8.

一方、ボックスが過去のフレーム内のいずれの静的ボックスとも同一ではない場合（ステップＳ４でＮｏ）、過去Ｎフレーム中のＭ（例えばＭ＝３）フレームに、ボックスと同一の非静的ボックスが存在するか否かを判定する（ステップＳ６）。ステップＳ６の条件を満たさない場合（ステップＳ６でＮｏ）、処理がステップＳ８に移行する。 On the other hand, when the box is not the same as any static box in the past frame (No in step S4), the same non-static box as the box is in the M (for example, M = 3) frame in the past N frames. It is determined whether or not it exists (step S6). If the condition of step S6 is not satisfied (No in step S6), the process proceeds to step S8.

一方、ステップＳ６の条件を満たす場合（ステップＳ６でＹｅｓ）、静的ボックス判定部１２２は、ボックス情報１１３において、これらのボックス及び非静的ボックスを新たな静的ボックスとして管理し（ステップＳ７）、処理がステップＳ８に移行する。 On the other hand, when the condition of step S6 is satisfied (Yes in step S6), the static box determination unit 122 manages these boxes and non-static boxes as new static boxes in the box information 113 (step S7). , The process proceeds to step S8.
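Steps S4 to S7 can be sketched as follows. This is a simplified illustration under stated assumptions: the box comparison is passed in as a hypothetical predicate `same`, a static box is represented as the list of boxes collected for it, `history` holds the non-static boxes of past frames (oldest first), and "M frames out of the past N" is read as "at least M".

```python
from typing import Callable, List, Tuple

Box = Tuple[float, float]  # simplified: a box is represented by its center


def update_static_boxes(
    box: Box,
    static_boxes: List[List[Box]],
    history: List[List[Box]],
    same: Callable[[Box, Box], bool],
    n_frames: int = 5,
    m_frames: int = 3,
) -> str:
    """Classify one box of the current frame and update `static_boxes` in place."""
    # Steps S4/S5: the box matches an existing static box, so append it.
    for static_box in static_boxes:
        if same(box, static_box[-1]):
            static_box.append(box)
            return "appended"
    # Step S6: look for a matching non-static box in each of the past N frames.
    matches = []
    for frame in history[-n_frames:]:
        match = next((past for past in frame if same(box, past)), None)
        if match is not None:
            matches.append(match)
    # Step S7: enough matches, so manage them together as a new static box.
    if len(matches) >= m_frames:
        static_boxes.append(matches + [box])
        return "promoted"
    return "non_static"
```

A caller would append boxes that come back as `"non_static"` to the current frame's entry in `history`, so that they can support a later promotion.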

ステップＳ８では、過去Ｍフレーム連続で観測のない（すなわち検出されていない）静的ボックスが存在するか否かを判定する。該当する静的ボックスが存在しない場合（ステップＳ８でＮｏ）、処理がステップＳ１０に移行する。 In step S8, it is determined whether or not there is a static box that has not been observed (that is, has not been detected) in the past M frames in a row. If the corresponding static box does not exist (No in step S8), the process proceeds to step S10.

一方、該当する静的ボックスが存在する場合（ステップＳ８でＹｅｓ）、静的ボックス判定部１２２は、当該静的ボックスをファイルに書き出し、メモリ上から削除する（ステップＳ９）。そして、静的ボックス判定部１２２は、現在のフレームにおいて全てのボックスを比較したか否かを判定する（ステップＳ１０）。 On the other hand, when the corresponding static box exists (Yes in step S8), the static box determination unit 122 writes the static box to a file and deletes it from the memory (step S9). Then, the static box determination unit 122 determines whether or not all the boxes have been compared in the current frame (step S10).

全てのボックスを比較していない場合（ステップＳ１０でＮｏ）、処理がステップＳ３に移行し、比較を未実施のボックスについて比較を行なう。一方、全てのボックスを比較した場合（ステップＳ１０でＹｅｓ）、静的ボックス判定部１２２は、ボックス情報１１３の蓄積を開始してから所定期間、例えば３日間が経過したか否かを判定する（ステップＳ１１）。 If not all boxes have been compared (No in step S10), the process returns to step S3, and the boxes that have not yet been compared are compared. If all boxes have been compared (Yes in step S10), the static box determination unit 122 determines whether or not a predetermined period, for example 3 days, has elapsed since the accumulation of the box information 113 started (step S11).

所定期間が経過していない場合（ステップＳ１１でＮｏ）、処理がステップＳ１に移行し、次に入力されるフレームについて、上記の処理を実行する。 If the predetermined period has not elapsed (No in step S11), the process proceeds to step S1, and the above process is executed for the frame to be input next.

一方、所定期間が経過した場合（ステップＳ１１でＹｅｓ）、処理が図１４のステップＳ１２に移行する。なお、この場合、ＮＮ１３及び静的ボックス判定部１２２は、座席位置推定（更新）フェーズとして、入力される映像データ１１１に基づき処理を継続してもよい。 On the other hand, when the predetermined period has elapsed (Yes in step S11), the process proceeds to step S12 in FIG. In this case, the NN 13 and the static box determination unit 122 may continue the processing based on the input video data 111 as the seat position estimation (update) phase.

図１４のステップＳ１２では、座席推定部１２３が、各静的ボックスを初期のクラスタとして、各クラスタの代表値を設定する。例えば、座席推定部１２３は、クラスタ内の静的ボックスの中心座標の平均値（ｃ_ｘ，ｃ_ｙ）、ボックスサイズの平均値“ｓｉｚｅ”、クラスタ内の静的ボックス数“ｎ”、クラスタ内の静的ボックスの観測有無を示す配列、等を設定してよい。なお、配列には、例えば、フレームごとに、静的ボックスが観測されていれば“１”、観測されていなければ“０”が設定されてよい。 In step S12 of FIG. 14, the seat estimation unit 123 treats each static box as an initial cluster and sets the representative values of each cluster. For example, the seat estimation unit 123 may set the average (c_x, c_y) of the center coordinates of the static boxes in the cluster, the average box size "size", the number of static boxes in the cluster "n", an array indicating whether the static boxes in the cluster were observed, and so on. In the array, for example, "1" may be set for each frame in which a static box was observed and "0" for each frame in which none was observed.

次いで、座席推定部１２３は、クラスタのペアごとに、クラスタ間の距離指標（ｄ_ｗａｒｄ）を算出する（ステップＳ１３）。なお、距離指標（ｄ_ｗａｒｄ）は、上記（３）式により算出されてよい。距離指標（ｄ_ｗａｒｄ）の算出には、クラスタペアの配列同士のＡＮＤ結果のＳＵＭ（合計値）により求められるペナルティが加味されてよい（図９参照）。 Next, the seat estimation unit 123 calculates the inter-cluster distance index (d_ward) for each pair of clusters (step S13). The distance index (d_ward) may be calculated by the above equation (3). The calculation of the distance index (d_ward) may take into account a penalty obtained as the SUM (total value) of the AND of the arrays of the cluster pair (see FIG. 9).

座席推定部１２３は、距離指標（ｄ_ｗａｒｄ）が最小となるクラスタペアを決定し（ステップＳ１４）、当該クラスタペアが停止条件を満たすか否かを判定する（ステップＳ１５）。停止条件は、上記（４）式により算出されてよい（図９参照）。 The seat estimation unit 123 determines the cluster pair having the smallest distance index (d_ward) (step S14), and determines whether or not the cluster pair satisfies the stop condition (step S15). The stop condition may be calculated by the above equation (4) (see FIG. 9).

クラスタペアが停止条件を満たさない場合（ステップＳ１５でＮｏ）、座席推定部１２３は、当該クラスタペアをマージして新しいクラスタとする（ステップＳ１６）。 When the cluster pair does not satisfy the stop condition (No in step S15), the seat estimation unit 123 merges the cluster pair into a new cluster (step S16).

座席推定部１２３は、残りクラスタ数が１であるか否かを判定する（ステップＳ１７）。残りクラスタ数が１ではない場合（ステップＳ１７でＮｏ）、処理がステップＳ１３に移行する。 The seat estimation unit 123 determines whether or not the number of remaining clusters is 1 (step S17). When the number of remaining clusters is not 1 (No in step S17), the process proceeds to step S13.

ステップＳ１５でクラスタペアが停止条件を満たす場合（ステップＳ１５でＹｅｓ）、又は、ステップＳ１７で残りクラスタ数が１の場合（ステップＳ１７でＹｅｓ）、処理がステップＳ１８に移行する。 If the cluster pair satisfies the stop condition in step S15 (Yes in step S15), or if the number of remaining clusters is 1 in step S17 (Yes in step S17), the process proceeds to step S18.

ステップＳ１８では、座席推定部１２３は、クラスタ内の静的ボックス数が他のクラスタの静的ボックス数の所定割合（例えば２％）以下であるクラスタを削除する。 In step S18, the seat estimation unit 123 deletes a cluster in which the number of static boxes in the cluster is equal to or less than a predetermined ratio (for example, 2%) of the number of static boxes in other clusters.

そして、座席推定部１２３は、残ったクラスタを座席情報１１４と出力、例えばメモリ部１１に格納し（ステップＳ１９）、座席位置推定フェーズの処理が終了する。 Then, the seat estimation unit 123 outputs the remaining clusters as the seat information 114, for example by storing them in the memory unit 11 (step S19), and the processing of the seat position estimation phase ends.

〔１－４－２〕混雑状況推定フェーズの動作例 [1-4-2] Operation Example of the Congestion Status Estimation Phase
次に、図１５を参照して、混雑状況推定フェーズの動作例を説明する。 Next, an operation example of the congestion status estimation phase will be described with reference to FIG. 15.

図１５に例示するように、ＮＮ１３は、映像データ１１１から一定間隔（例えば５分間隔）でフレームを取得し（ステップＳ２１）、フレームから人物の頭部のボックスを検出する（ステップＳ２２）。なお、ＮＮ１３は、検出したボックスを検出情報１１２としてメモリ部１１に格納してよい。また、座席位置推定（更新）フェーズが行なわれる場合、座席位置推定（更新）フェーズにおける図１３のステップＳ１及びＳ２の処理は、ステップＳ２１及びＳ２２の処理に置き換えられてよい（ステップＳ１及びＳ２の処理を省略してもよい）。 As illustrated in FIG. 15, the NN 13 acquires frames from the video data 111 at regular intervals (for example, every 5 minutes) (step S21), and detects the boxes of persons' heads in each frame (step S22). The NN 13 may store the detected boxes as the detection information 112 in the memory unit 11. When the seat position estimation (update) phase is performed, the processing of steps S1 and S2 of FIG. 13 in that phase may be replaced with the processing of steps S21 and S22 (that is, steps S1 and S2 may be omitted).

制御部１２の混雑度算出部１２４は、検出情報１１２における現在（例えば最新）のフレームのボックスと、座席情報１１４における座席位置との間の距離を算出する（ステップＳ２３）。なお、距離は、ボックス及び座席位置のそれぞれの中心座標或いは平均座標間の距離であってよい。 The congestion degree calculation unit 124 of the control unit 12 calculates the distance between the box of the current (for example, the latest) frame in the detection information 112 and the seat position in the seat information 114 (step S23). The distance may be the distance between the center coordinates or the average coordinates of the box and the seat position.

次いで、混雑度算出部１２４は、距離の近い順に、座席位置に対して、距離が閾値以下のボックスを割り当てる（ステップＳ２４）。 Next, the congestion degree calculation unit 124 assigns a box having a distance equal to or less than a threshold value to the seat position in the order of the closest distance (step S24).

そして、混雑度算出部１２４は、ボックスを割り当てられた、換言すれば人の居る、座席数をカウントし（ステップＳ２５）、混雑率として、“人の居る座席数／全座席数”を算出する（ステップＳ２６）。 Then, the congestion degree calculation unit 124 counts the number of seats to which the box is assigned, in other words, the number of seats with people (step S25), and calculates "the number of seats with people / the total number of seats" as the congestion rate. (Step S26).

混雑度算出部１２４は、算出した混雑率が、第１の閾値の一例である“０．３”未満か否かを判定する（ステップＳ２７）。混雑率が“０．３”未満の場合（ステップＳ２７でＹｅｓ）、混雑度算出部１２４は、混雑度を「空いている」と推定し（ステップＳ２８）、混雑度情報１１５をメモリ部１１に格納して、処理がステップＳ３２に移行する。 The congestion degree calculation unit 124 determines whether or not the calculated congestion rate is less than “0.3”, which is an example of the first threshold value (step S27). When the congestion rate is less than "0.3" (Yes in step S27), the congestion degree calculation unit 124 estimates that the congestion degree is "vacant" (step S28), and transfers the congestion degree information 115 to the memory unit 11. After storing, the process proceeds to step S32.

混雑率が“０．３”以上の場合（ステップＳ２７でＮｏ）、混雑度算出部１２４は、混雑率が、第２の閾値の一例である“０．６”未満か否かを判定する（ステップＳ２９）。混雑率が“０．６”未満（且つ“０．３”以上）の場合（ステップＳ２９でＹｅｓ）、混雑度算出部１２４は、混雑度を「通常」と推定し（ステップＳ３０）、混雑度情報１１５をメモリ部１１に格納して、処理がステップＳ３２に移行する。 When the congestion rate is "0.3" or more (No in step S27), the congestion degree calculation unit 124 determines whether or not the congestion rate is less than "0.6", which is an example of the second threshold value (step S29). When the congestion rate is less than "0.6" (and "0.3" or more) (Yes in step S29), the congestion degree calculation unit 124 estimates the congestion degree to be "normal" (step S30), stores the congestion degree information 115 in the memory unit 11, and the process proceeds to step S32.

一方、混雑率が“０．６”以上の場合（ステップＳ２９でＮｏ）、混雑度算出部１２４は、混雑度を「混んでいる」と推定し（ステップＳ３１）、混雑度情報１１５をメモリ部１１に格納して、処理がステップＳ３２に移行する。 On the other hand, when the congestion rate is "0.6" or more (No in step S29), the congestion degree calculation unit 124 estimates the congestion degree to be "crowded" (step S31), stores the congestion degree information 115 in the memory unit 11, and the process proceeds to step S32.

ステップＳ３２では、サーバ４の情報提示部１４が、混雑度情報１１５に基づき、店舗３の混雑度の情報を提示する。例えば、情報提示部１４は、端末装置６からの要求に応じて、店舗３の混雑度の情報を含むＷｅｂページを端末装置６に表示させてよい。 In step S32, the information presentation unit 14 of the server 4 presents the congestion degree information of the store 3 based on the congestion degree information 115. For example, the information presenting unit 14 may display a Web page including information on the degree of congestion of the store 3 on the terminal device 6 in response to a request from the terminal device 6.

以上により、混雑状況推定フェーズの処理が終了する。 As a result, the processing of the congestion status estimation phase is completed.

〔１－５〕ハードウェア構成例 [1-5] Hardware Configuration Example
次に、図１６を参照して、一実施形態に係るサーバ４のハードウェア構成例について説明する。以下、サーバ４の一例としてコンピュータ１０を例に挙げて、コンピュータ１０のハードウェア構成例について説明する。なお、端末装置６についても、サーバ４と同様のハードウェア構成をそなえてよい。 Next, a hardware configuration example of the server 4 according to one embodiment will be described with reference to FIG. 16. In the following, a computer 10 is taken as an example of the server 4, and a hardware configuration example of the computer 10 is described. The terminal device 6 may have the same hardware configuration as the server 4.

図１６に示すように、コンピュータ１０は、例示的に、プロセッサ１０ａ、メモリ１０ｂ、記憶部１０ｃ、ＩＦ（Interface）部１０ｄ、Ｉ／Ｏ（Input / Output）部１０ｅ、及び読取部１０ｆをそなえてよい。 As shown in FIG. 16, the computer 10 may include, by way of example, a processor 10a, a memory 10b, a storage unit 10c, an IF (Interface) unit 10d, an I/O (Input / Output) unit 10e, and a reading unit 10f.

プロセッサ１０ａは、種々の制御や演算を行なう演算処理装置の一例である。プロセッサ１０ａは、コンピュータ１０内の各ブロックとバス１０ｉで相互に通信可能に接続されてよい。プロセッサ１０ａとしては、例えば、ＣＰＵ、ＭＰＵ、ＤＳＰ、ＡＳＩＣ、ＦＰＧＡ等の集積回路（ＩＣ）が用いられてもよい。なお、ＣＰＵはCentral Processing Unitの略称であり、ＭＰＵはMicro Processing Unitの略称である。ＤＳＰはDigital Signal Processorの略称であり、ＡＳＩＣはApplication Specific Integrated Circuitの略称であり、ＦＰＧＡはField-Programmable Gate Arrayの略称である。 The processor 10a is an example of an arithmetic processing unit that performs various controls and operations. The processor 10a may be connected to each block in the computer 10 so as to be communicable with each other by the bus 10i. As the processor 10a, for example, an integrated circuit (IC) such as a CPU, MPU, DSP, ASIC, or FPGA may be used. CPU is an abbreviation for Central Processing Unit, and MPU is an abbreviation for Micro Processing Unit. DSP is an abbreviation for Digital Signal Processor, ASIC is an abbreviation for Application Specific Integrated Circuit, and FPGA is an abbreviation for Field-Programmable Gate Array.

メモリ１０ｂは、種々のデータやプログラム等の情報を格納するハードウェアの一例である。メモリ１０ｂとしては、例えばＲＡＭ等の揮発性メモリが挙げられる。 The memory 10b is an example of hardware that stores information such as various data and programs. Examples of the memory 10b include volatile memories such as RAM.

記憶部１０ｃは、種々のデータやプログラム等の情報を格納するハードウェアの一例である。記憶部１０ｃとしては、例えばＨＤＤ等の磁気ディスク装置、ＳＳＤ等の半導体ドライブ装置、不揮発性メモリ等の各種記憶装置が挙げられる。不揮発性メモリとしては、例えば、フラッシュメモリ、ＳＣＭ（Storage Class Memory）、ＲＯＭ（Read Only Memory）等が挙げられる。 The storage unit 10c is an example of hardware that stores information such as various data and programs. Examples of the storage unit 10c include a magnetic disk device such as an HDD, a semiconductor drive device such as an SSD, and various storage devices such as a non-volatile memory. Examples of the non-volatile memory include flash memory, SCM (Storage Class Memory), ROM (Read Only Memory) and the like.

なお、図１に示すメモリ部１１は、例えば、サーバ４のメモリ１０ｂ及び記憶部１０ｃの少なくとも一方の記憶領域により実現されてもよい。 The memory unit 11 shown in FIG. 1 may be realized by, for example, at least one storage area of the memory 10b and the storage unit 10c of the server 4.

また、記憶部１０ｃは、コンピュータ１０の各種機能の全部若しくは一部を実現するプログラム１０ｇを格納してよい。プロセッサ１０ａは、記憶部１０ｃに格納されたプログラム１０ｇをメモリ１０ｂに展開して実行することにより、図１に示すサーバ４としての機能を実現できる。 Further, the storage unit 10c may store a program 10g that realizes all or a part of various functions of the computer 10. The processor 10a can realize the function as the server 4 shown in FIG. 1 by expanding and executing the program 10g stored in the storage unit 10c in the memory 10b.

ＩＦ部１０ｄは、ネットワーク５との間の接続及び通信の制御等を行なう通信インタフェースの一例である。例えば、ＩＦ部１０ｄは、ＬＡＮ、或いは、光通信（例えばＦＣ（Fibre Channel；ファイバチャネル））等に準拠したアダプタを含んでよい。例えば、プログラム１０ｇは、当該通信インタフェースを介してネットワーク５からコンピュータ１０にダウンロードされ、記憶部１０ｃに格納されてもよい。 The IF unit 10d is an example of a communication interface that controls connection and communication with the network 5. For example, the IF unit 10d may include a LAN or an adapter compliant with optical communication (for example, FC (Fibre Channel)). For example, the program 10g may be downloaded from the network 5 to the computer 10 via the communication interface and stored in the storage unit 10c.

Ｉ／Ｏ部１０ｅは、マウス、キーボード、又は操作ボタン等の入力部、並びに、タッチパネルディスプレイ、ＬＣＤ（Liquid Crystal Display）等のモニタ、プロジェクタ、又はプリンタ等の出力部、の一方又は双方を含んでよい。 The I/O unit 10e may include one or both of an input unit, such as a mouse, a keyboard, or operation buttons, and an output unit, such as a touch panel display, a monitor such as an LCD (Liquid Crystal Display), a projector, or a printer.

The reading unit 10f is an example of a reader that reads data and program information recorded on a recording medium 10h. The reading unit 10f may include a connection terminal or a device to which the recording medium 10h can be connected or into which it can be inserted. Examples of the reading unit 10f include an adapter compliant with USB (Universal Serial Bus) or the like, a drive device that accesses a recording disk, and a card reader that accesses a flash memory such as an SD card. The program 10g may be stored in the recording medium 10h, and the reading unit 10f may read the program 10g from the recording medium 10h and store it in the storage unit 10c.

Examples of the recording medium 10h include non-transitory recording media such as magnetic/optical disks and flash memories. Examples of the magnetic/optical disk include a flexible disk, a CD (Compact Disc), a DVD (Digital Versatile Disc), a Blu-ray disc, and an HVD (Holographic Versatile Disc). Examples of the flash memory include a USB memory and an SD card. Examples of the CD include a CD-ROM, a CD-R, and a CD-RW. Examples of the DVD include a DVD-ROM, a DVD-RAM, a DVD-R, a DVD-RW, a DVD+R, and a DVD+RW.

The hardware configuration of the computer 10 described above is an example. Accordingly, in the server 4, hardware within the computer 10 may be added or removed (for example, any block may be added or deleted), divided, or integrated in any combination, and buses may be added or deleted, as appropriate.

[2] Others
The technique according to the embodiment described above can be modified or altered as follows.

For example, the functional blocks of the server 4 shown in FIG. 1 may be merged in any combination or may be divided. Likewise, the functional blocks of the control unit 12 shown in FIG. 1 may be merged in any combination or may be divided.

Further, the processor 10a of the computer 10 shown in FIG. 16 is not limited to a single processor or a single-core processor, and may be a multiprocessor or a multi-core processor.

The embodiment described above explained a process that calculates the degree of congestion from input store video, using a technique that estimates retention regions based on the recognition results of a human-head detection model applied to the video. However, the information on the estimated retention regions is not limited to use in the manner described above and may be used for various kinds of analysis or estimation.

In the embodiment, the seat estimation unit 123 performs hierarchical clustering, but the clustering is not limited to this. For example, when the number of seats provided in the store 3 is known in advance, the seat estimation unit 123 may perform non-hierarchical clustering so as to create the same number of clusters as the number of seats.

The information on the number of seats may be acquired, for example, by the server 4 or by an administrator or the like from the Internet or the like via the network 5, and stored in a database, for example, the memory unit 11. In this case, the information acquisition unit 121 may function as an example of an acquisition unit that acquires the information on the number of seats in the shooting space 30 from the memory unit 11 when the seat estimation unit 123 performs clustering.

Information on the number of seats in the store 3 is often posted on the Internet, for example on the website of the facility or of the store 3 or on a site introducing the store, and is therefore easy to obtain. Accordingly, the seat positions can be estimated accurately based on an accurate seat count while suppressing an increase in initial setup cost.
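The non-hierarchical clustering variant can be illustrated with a minimal sketch. It assumes that the static head positions are 2-D image coordinates and uses plain k-means (Lloyd's algorithm) with the cluster count fixed to the known seat count. The source does not name a specific non-hierarchical algorithm, so k-means, the function name `kmeans`, and the farthest-point initialisation are all assumptions made for illustration.

```python
import math


def kmeans(points, k, iterations=20):
    """Cluster 2-D points into exactly k groups (k = known seat count).

    Hypothetical helper: plain k-means, chosen only as one example of
    non-hierarchical clustering.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")

    # Deterministic farthest-point initialisation: start from the first point,
    # then repeatedly add the point farthest from all chosen centres.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )

    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centre.
        labels = [
            min(range(k), key=lambda i: math.dist(p, centers[i])) for p in points
        ]
        # Update step: move each centre to the mean of its members.
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                centers[i] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return labels, centers


# Seven head positions, seat count known to be 3 (made-up example values).
heads = [(10, 10), (12, 11), (11, 9), (50, 50), (52, 49), (90, 10), (91, 12)]
labels, centers = kmeans(heads, k=3)
```

Each resulting cluster would then correspond to one estimated seat position; how a cluster is turned into a seat region is not covered by this sketch.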

Further, in the embodiment, the server 4 estimates the seat positions and the degree of congestion based on video data from a single surveillance camera 2, but the configuration is not limited to this.

For example, when the server 4 acquires video data from each of a plurality of surveillance cameras 2 provided in the store 3, the server 4 may estimate the seat positions and the degree of congestion for each surveillance camera 2. The server 4 may then estimate the degree of congestion of the store 3 by, for example, averaging the degrees of congestion (or congestion rates) estimated for the individual surveillance cameras 2.
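As a minimal sketch of the averaging step, assume that each camera reports a pair of occupied and total seat counts and that the per-camera congestion rate is their ratio. That definition, and the name `store_congestion`, are assumptions for illustration only.

```python
def store_congestion(per_camera):
    """Average the per-camera congestion rates.

    per_camera: list of (occupied_seats, total_seats) tuples, one per camera
    (hypothetical input format). Cameras with no estimated seats are skipped.
    """
    rates = [occupied / total for occupied, total in per_camera if total > 0]
    if not rates:
        raise ValueError("no camera reported any seats")
    return sum(rates) / len(rates)


# Two cameras: 2 of 4 seats occupied and 3 of 4 seats occupied.
rate = store_congestion([(2, 4), (3, 4)])
```

A simple mean treats every camera equally; if cameras cover very different numbers of seats, a seat-weighted mean may be a more suitable choice.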

[3] Appendices
The following appendices are further disclosed with respect to the above embodiment.

(Appendix 1)
An estimation program that causes a computer to execute a process comprising:
grouping, for objects detected from each of a plurality of first images capturing a space, mutually related objects across the plurality of first images, based on position information of each detected object and a detection status of each object in each of the plurality of first images; and
estimating, based on a result of the grouping, a retention region in which each object stays in the space.

(Appendix 2)
The estimation program according to Appendix 1, which further causes the computer to execute a process of estimating a degree of congestion of the retention region based on position information of an object detected from a second image capturing the space and on information on the estimated retention region in the space.
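The congestion estimate of Appendix 2 can be sketched as follows, assuming that each retention region is an axis-aligned rectangle `(x1, y1, x2, y2)` in image coordinates, that a region counts as occupied when at least one detected head centre falls inside it, and that the degree of congestion is the fraction of occupied regions. These representations and the name `congestion_degree` are illustrative assumptions, not details confirmed by the text.

```python
def congestion_degree(regions, heads):
    """Fraction of retention regions that contain at least one head centre.

    regions: list of (x1, y1, x2, y2) rectangles (assumed representation).
    heads:   list of (x, y) head centres detected in the second image.
    """
    if not regions:
        raise ValueError("at least one retention region is required")
    occupied = sum(
        1
        for (x1, y1, x2, y2) in regions
        if any(x1 <= hx <= x2 and y1 <= hy <= y2 for hx, hy in heads)
    )
    return occupied / len(regions)


# Four estimated seat regions; two of the three detected heads sit in a region.
regions = [(0, 0, 10, 10), (20, 0, 30, 10), (40, 0, 50, 10), (60, 0, 70, 10)]
heads = [(5, 5), (25, 3), (100, 100)]
degree = congestion_degree(regions, heads)
```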

(Appendix 3)
The estimation program according to Appendix 1 or Appendix 2, which further causes the computer to execute a process of managing, in association with one static object, a predetermined number or more of objects determined to be present at the same position across a predetermined number or more of the first images,
wherein the grouping groups, as the mutually related objects, static objects that satisfy a condition regarding distance across the plurality of first images.
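The static-object management of Appendix 3 can be sketched under simple assumptions: each detection is reduced to a head-centre point, two detections are treated as being at the same position when their centres lie within a tolerance `tol`, and a candidate becomes a static object once it has been seen in at least `min_frames` images. The function name, both parameters, and the point-based position test are hypothetical.

```python
import math


def find_static_objects(frames, min_frames=3, tol=5.0):
    """Associate detections at (almost) the same position with one static object.

    frames: list of lists of (x, y) head centres, one list per first image.
    Returns a list of dicts {"pos": (x, y), "frames": set of image indices}.
    The stored position is that of the first detection (a simplification).
    """
    candidates = []
    for idx, detections in enumerate(frames):
        for point in detections:
            for cand in candidates:
                if math.dist(cand["pos"], point) <= tol:
                    cand["frames"].add(idx)  # same position: same candidate
                    break
            else:
                candidates.append({"pos": point, "frames": {idx}})
    # Keep only candidates observed in the predetermined number of images.
    return [c for c in candidates if len(c["frames"]) >= min_frames]


frames = [
    [(10, 10), (50, 50)],
    [(11, 10)],
    [(10, 11), (80, 80)],
    [(50, 51)],
]
static_objects = find_static_objects(frames, min_frames=3, tol=5.0)
```

Recording the image indices alongside each static object keeps the information needed later to tell which static objects were visible at the same time.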

(Appendix 4)
The estimation program according to Appendix 3, wherein the managing manages, in association with the one static object, the predetermined number or more of objects whose image components are determined to match across the predetermined number or more of the first images.

(Appendix 5)
The estimation program according to Appendix 3 or Appendix 4, wherein the grouping:
generates information indicating the detection status of each object in each of the plurality of first images, the information indicating, for each static object, whether an object associated with that static object is present in each of the plurality of first images; and
excludes from the grouping, based on the generated information, static objects whose corresponding objects are present in the same first image.
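The detection-status information of Appendix 5 can be pictured as a presence table with one row per static object and one column per first image. In the sketch below, two static objects that are both present in some image are recorded as a pair that must not be grouped together, presumably because two heads visible at the same time belong to different people. The function names and the set-of-indices input format are assumptions.

```python
from itertools import combinations


def presence_table(static_frames, num_images):
    """Row per static object, column per image; True where it was detected.

    static_frames: list of sets of image indices (assumed input format).
    """
    return [[i in frames for i in range(num_images)] for frames in static_frames]


def excluded_pairs(table):
    """Pairs of static objects that appear together in at least one image."""
    return {
        (a, b)
        for a, b in combinations(range(len(table)), 2)
        if any(pa and pb for pa, pb in zip(table[a], table[b]))
    }


# Static object 0 seen in images 0-1, object 1 in images 1-2, object 2 in image 3.
table = presence_table([{0, 1}, {1, 2}, {3}], num_images=4)
pairs = excluded_pairs(table)
```

Here objects 0 and 1 share image 1 and are therefore excluded from being grouped with each other, while object 2 remains free to join either.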

(Appendix 6)
The estimation program according to any one of Appendices 1 to 5, wherein the grouping performs hierarchical clustering.
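A minimal sketch of hierarchical (agglomerative) clustering for this purpose follows. It assumes single linkage, a merge cut-off `max_dist` standing in for the distance condition, and an optional set of index pairs that must never share a cluster. The linkage criterion, the parameter names, and the function name `agglomerate` are assumptions; the text does not specify them.

```python
import math
from itertools import combinations


def agglomerate(points, max_dist, cannot_link=frozenset()):
    """Bottom-up clustering that stops when no pair of clusters may be merged.

    points:      list of (x, y) static-object positions.
    max_dist:    clusters farther apart than this are never merged.
    cannot_link: set of (i, j) index pairs with i < j that must stay apart.
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Single linkage: distance between the closest members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    def allowed(a, b):
        return not any(
            (min(i, j), max(i, j)) in cannot_link for i in a for j in b
        )

    while True:
        best = None
        for ia, ib in combinations(range(len(clusters)), 2):
            if not allowed(clusters[ia], clusters[ib]):
                continue
            d = linkage(clusters[ia], clusters[ib])
            if d <= max_dist and (best is None or d < best[0]):
                best = (d, ia, ib)
        if best is None:
            return clusters
        _, ia, ib = best  # ia < ib, so deleting ib keeps ia valid
        clusters[ia] = clusters[ia] + clusters[ib]
        del clusters[ib]


points = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0)]
free = agglomerate(points, max_dist=1.5)
constrained = agglomerate(points, max_dist=1.5, cannot_link={(1, 2)})
```

Unlike clustering with a fixed cluster count, this form needs no prior knowledge of the number of seats; the number of clusters follows from the cut-off and the constraints.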

(Appendix 7)
The estimation program according to any one of Appendices 1 to 5, which further causes the computer to execute a process of acquiring the number of retention regions in the space from a database,
wherein the grouping performs non-hierarchical clustering so as to create as many groups as the acquired number of retention regions.

(Appendix 8)
The estimation program according to any one of Appendices 1 to 7, wherein
the object is a human head,
the retention region is a seat region, and
the program further causes the computer to execute a process of detecting the object, which is a human head, from each of the plurality of first images using a neural network that detects image features.

(Appendix 9)
An estimation system comprising:
a grouping unit that groups, for objects detected from each of a plurality of first images capturing a space, mutually related objects across the plurality of first images, based on position information of each detected object and a detection status of each object in each of the plurality of first images; and
an estimation unit that estimates, based on a result of the grouping, a retention region in which each object stays in the space.

(Appendix 10)
The estimation system according to Appendix 9, further comprising a congestion degree estimation unit that estimates a degree of congestion of the retention region based on position information of an object detected from a second image capturing the space and on information on the estimated retention region in the space.

(Appendix 11)
The estimation system according to Appendix 9 or Appendix 10, further comprising a management unit that manages, in association with one static object, a predetermined number or more of objects determined to be present at the same position across a predetermined number or more of the first images,
wherein the grouping unit groups, as the mutually related objects, static objects that satisfy a condition regarding distance across the plurality of first images.

(Appendix 12)
The estimation system according to Appendix 11, wherein the management unit manages, in association with the one static object, the predetermined number or more of objects whose image components are determined to match across the predetermined number or more of the first images.

(Appendix 13)
The estimation system according to Appendix 11 or Appendix 12, wherein the grouping unit:
generates information indicating the detection status of each object in each of the plurality of first images, the information indicating, for each static object, whether an object associated with that static object is present in each of the plurality of first images; and
excludes from the grouping, based on the generated information, static objects whose corresponding objects are present in the same first image.

(Appendix 14)
The estimation system according to any one of Appendices 9 to 13, wherein the grouping unit performs the grouping by hierarchical clustering.

(Appendix 15)
The estimation system according to any one of Appendices 9 to 13, further comprising an acquisition unit that acquires the number of retention regions in the space from a database,
wherein the grouping unit performs the grouping by non-hierarchical clustering so as to create as many groups as the acquired number of retention regions.

(Appendix 16)
The estimation system according to any one of Appendices 9 to 15, wherein
the object is a human head,
the retention region is a seat region, and
the estimation system further comprises a neural network that detects image features and that detects the object, which is a human head, from each of the plurality of first images.

(Appendix 17)
An estimation method comprising:
grouping, for objects detected from each of a plurality of first images capturing a space, mutually related objects across the plurality of first images, based on position information of each detected object and a detection status of each object in each of the plurality of first images; and
estimating, based on a result of the grouping, a retention region in which each object stays in the space.

(Appendix 18)
The estimation method according to Appendix 17, further comprising estimating a degree of congestion of the retention region based on position information of an object detected from a second image capturing the space and on information on the estimated retention region in the space.

(Appendix 19)
The estimation method according to Appendix 17 or Appendix 18, further comprising managing, in association with one static object, a predetermined number or more of objects determined to be present at the same position across a predetermined number or more of the first images,
wherein the grouping groups, as the mutually related objects, static objects that satisfy a condition regarding distance across the plurality of first images.

(Appendix 20)
The estimation method according to Appendix 19, wherein the managing manages, in association with the one static object, the predetermined number or more of objects whose image components are determined to match across the predetermined number or more of the first images.

1 Congestion degree estimation system
2 Surveillance camera
3 Store
4 Server
5 Network
6 Terminal device
10 Computer
11 Memory unit
111 Video data
112 Detection information
113 Box information
114 Seat information
115 Congestion degree information
12 Control unit
121 Information acquisition unit
122 Static box determination unit
123 Seat estimation unit
124 Congestion degree calculation unit
13 NN (neural network)
14 Information presentation unit

Claims

An estimation program that causes a computer to execute a process comprising:
grouping, for objects detected from each of a plurality of first images capturing a space, mutually related objects across the plurality of first images, based on position information of each detected object and a detection status of each object in each of the plurality of first images;
estimating, based on a result of the grouping, a retention region in which each object stays in the space; and
managing, in association with one object, a predetermined number or more of objects whose image components are determined to match across a predetermined number or more of the first images,
wherein the grouping includes grouping, as the mutually related objects, the one objects that satisfy a condition regarding distance across the plurality of first images, and
wherein the grouping further includes:
generating information indicating the detection status of each object in each of the plurality of first images, the information indicating, for each one object, whether an object associated with that one object is present in each of the plurality of first images; and
excluding from the grouping, based on the generated information, the one objects whose corresponding objects are present in the same first image.

The estimation program according to claim 1, which further causes the computer to execute a process of estimating a degree of congestion of the retention region based on position information of an object detected from a second image capturing the space and on information on the estimated retention region in the space.

The estimation program according to claim 1 or 2, wherein the managing manages, in association with the one object, objects that are determined to be present at the same position across the predetermined number or more of the first images and whose image components are determined to match.

The estimation program according to any one of claims 1 to 3, wherein the grouping performs hierarchical clustering.

The estimation program according to any one of claims 1 to 3, which further causes the computer to execute a process of acquiring the number of retention regions in the space from a database,
wherein the grouping performs non-hierarchical clustering so as to create as many groups as the acquired number of retention regions.

The estimation program according to any one of claims 1 to 5, wherein
the object is a human head,
the retention region is a seat region, and
the program further causes the computer to execute a process of detecting the object, which is a human head, from each of the plurality of first images using a neural network that detects image features.

An estimation system comprising:
a grouping unit that groups, for objects detected from each of a plurality of first images capturing a space, mutually related objects across the plurality of first images, based on position information of each detected object and a detection status of each object in each of the plurality of first images;
an estimation unit that estimates, based on a result of the grouping, a retention region in which each object stays in the space; and
a management unit that manages, in association with one object, a predetermined number or more of objects whose image components are determined to match across a predetermined number or more of the first images,
wherein the grouping unit groups, as the mutually related objects, the one objects that satisfy a condition regarding distance across the plurality of first images, and
wherein the grouping unit further:
generates information indicating the detection status of each object in each of the plurality of first images, the information indicating, for each one object, whether an object associated with that one object is present in each of the plurality of first images; and
excludes from the grouping, based on the generated information, the one objects whose corresponding objects are present in the same first image.

An estimation method in which a computer executes a process comprising:
grouping, for objects detected from each of a plurality of first images capturing a space, mutually related objects across the plurality of first images, based on position information of each detected object and a detection status of each object in each of the plurality of first images;
estimating, based on a result of the grouping, a retention region in which each object stays in the space; and
managing, in association with one object, a predetermined number or more of objects whose image components are determined to match across a predetermined number or more of the first images,
wherein the grouping includes grouping, as the mutually related objects, the one objects that satisfy a condition regarding distance across the plurality of first images, and
wherein the grouping further includes:
generating information indicating the detection status of each object in each of the plurality of first images, the information indicating, for each one object, whether an object associated with that one object is present in each of the plurality of first images; and
excluding from the grouping, based on the generated information, the one objects whose corresponding objects are present in the same first image.