JP6809114B2

JP6809114B2 - Information processing equipment, image processing system, program

Info

Publication number: JP6809114B2
Application number: JP2016201168A
Authority: JP
Inventors: 亮介笠原
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2016-10-12
Filing date: 2016-10-12
Publication date: 2021-01-06
Anticipated expiration: 2036-10-12
Also published as: JP2018063551A

Description

The present invention relates to an information processing device, an image processing system, and a program.

With improvements in the communication environment and reductions in communication costs, the volume of data and services on the Internet has increased sharply. As a result, a form of service called a cloud service has become widely known. A cloud service is a form of service in which a user receives a service via the Internet and pays a service fee as needed. Because Internet services such as cloud services can collect data from all over the world, it has become practical to use the large volumes of data known as big data, and data processing that exploits such data has become active.

In addition, the amount of hardware connected to the Internet, beginning with smartphones, has increased rapidly, giving rise to the concept of the Internet of Things (IoT). Once things are connected to the Internet, cloud services can provide services that require those things, and service providers can more easily offer an increasingly diverse range of cloud services.

One such cloud service is the analysis of images captured by an imaging device. Image recognition algorithms that identify the people shown in an image or determine the general names of the objects shown are under active development. Given an input image, these algorithms make it possible to obtain intelligent information. Such an algorithm can be executed by a processing chip built into the imaging device that captures the image, but it can also be executed on the cloud as a cloud service, as described above.

In cloud services, the Web API is known as a mechanism for connecting things to the Internet through communication. When an image captured by an imaging device is submitted for image recognition through a Web API, a great deal of information can be extracted from the image. For example, the cloud service can extract various kinds of information according to the purpose, such as the presence and number of people, the presence and number of vehicles, the presence of obstacles, and the state of traffic lights.

Vision is said to account for 70% of human sensory perception. If a cloud service offers a Web API that provides intelligent visual information, users can easily build advanced applications by combining it with other Web APIs as appropriate.

Such image recognition can also be performed on moving images (see, for example, Patent Document 1). Patent Document 1 discloses a method that performs recognition processing using, from among the frames before and after the point at which recognition is performed, the frame most suitable for character recognition.

However, the conventional technique has the problem that the recognition accuracy for moving images may decrease. A moving image must be transmitted from an imaging device or the like to the cloud, but moving image data is very large, so a wide bandwidth is required and the communication cost becomes high. The communication cost can be reduced by compressing the moving image at a high compression rate, but a high compression rate degrades the image and may lower the accuracy of image recognition. Even when the moving image is not transmitted and image recognition is performed only on a moving image stored in a storage device, a low compression rate consumes storage capacity, while a high compression rate may lower the accuracy of image recognition.

An object of the present invention is to provide an information processing device capable of suppressing a decrease in the recognition accuracy of moving images.

The present invention provides an information processing device that recognizes an object from an image, the device comprising: moving image expansion means for expanding moving image data into a plurality of still images; recognition processing parameter setting means for determining, according to the type of each still image, whether the still image is to be subjected to recognition processing, and for setting parameters of the recognition processing; and image recognition means for performing recognition processing of the object on still images of a type that the recognition processing parameter setting means has determined to be subject to recognition processing,
wherein the moving image data has an image quality index value for each still image, and
the recognition processing parameter setting means determines still images whose image quality index value is equal to or greater than a threshold value to be subject to recognition processing.
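The selection rule in the last clause can be illustrated with a minimal sketch. The `Frame` record, its field names, and the numeric quality values are assumptions made for illustration; the source does not specify how the image quality index value is represented.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    index: int
    frame_type: str   # "I", "P", or "B"
    quality: float    # hypothetical per-frame image quality index value


def select_recognition_targets(frames, threshold):
    """Keep only frames whose quality index is at or above the threshold."""
    return [f for f in frames if f.quality >= threshold]


frames = [
    Frame(0, "I", 0.9),
    Frame(1, "B", 0.4),
    Frame(2, "P", 0.6),
    Frame(3, "B", 0.6),
]
targets = select_recognition_targets(frames, threshold=0.6)
print([f.index for f in targets])  # [0, 2, 3]
```

Note that the comparison is inclusive ("equal to or greater than"), so a frame whose index equals the threshold is still recognized.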

An information processing device capable of suppressing a decrease in the recognition accuracy of moving images can be provided.

An example of a diagram explaining the schematic operation of the image processing system.
A diagram showing an example of the system configuration of the image processing system.
A diagram showing examples of applications that can be realized with the image processing system.
A diagram showing an example of cooperation among a plurality of Web services.
An example of a schematic hardware configuration diagram of the recognition server.
An example of a functional block diagram showing the functions of the recognition server as blocks.
An example of a flowchart showing a procedure in which the recognition processing parameter setting unit determines the frames to be subjected to recognition processing according to the magnitude of change.
An example of a flowchart showing a procedure in which the recognition processing parameter setting unit determines the frames to be subjected to recognition processing and the template according to the magnitude of change in the image.
An example of a flowchart showing a procedure in which the recognition processing parameter setting unit changes the threshold value for object detection according to the image quality.
An example of a flowchart showing a procedure in which the recognition processing parameter setting unit designates a specific frame as the recognition target.

Hereinafter, modes for carrying out the present invention will be described by way of examples with reference to the drawings.

FIG. 1 is an example of a diagram explaining the schematic operation of the image processing system 100 of the present embodiment. The imaging device 10 captures a moving image and transmits the image data of the moving image (moving image data) to the recognition server 30 as appropriate. The recognition server 30 performs image recognition and provides the recognition result to various cloud services via a Web API.
(1) The imaging device 10 compresses the moving image at a high compression rate and transmits it to the recognition server 30. Raising the compression rate reduces the communication cost.
(2) However, a high compression rate means that compression noise is more likely to be present, and the image recognition accuracy may decrease. A compressed moving image has three types of frames (I frames, B frames, P frames, and so on), and the magnitude of the compression noise (that is, the image quality) differs from frame to frame. The magnitude of the compression noise does not necessarily depend only on the frame type, but for convenience of explanation the description here focuses on the frame type.

The recognition server 30 therefore changes the parameters of the recognition processing according to the frame type. Details are described later, but, for example, the object detection template prepared for each frame type is switched according to the frame type, or the threshold value for object detection is changed. Recognition processing can thus use parameters matched to the image quality and other properties that differ by frame type, so highly accurate image recognition is possible even for moving images with a high compression rate.

Instead of only changing the parameters, the recognition server 30 may also select only frames of a predetermined type as recognition targets. For example, recognition processing is performed only on a specific type of frame with little compression noise (for example, I frames). In this case as well, highly accurate image recognition is possible for moving images with a high compression rate.
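The two strategies above (switching parameters by frame type, and restricting recognition to certain frame types) can be sketched as a simple lookup. The template names and threshold values below are hypothetical placeholders, not values taken from the source.

```python
# Hypothetical parameter table: template names and thresholds are
# illustrative assumptions only.
PARAMS_BY_FRAME_TYPE = {
    "I": {"template": "template_i", "detect_threshold": 0.50},
    "P": {"template": "template_p", "detect_threshold": 0.60},
    "B": {"template": "template_b", "detect_threshold": 0.70},
}


def recognition_params(frame_type, allowed_types=("I", "P", "B")):
    """Return the parameters for a frame type, or None to skip the frame."""
    if frame_type not in allowed_types:
        return None  # this frame type is not a recognition target
    return PARAMS_BY_FRAME_TYPE[frame_type]


# Per-type parameters for every frame:
print(recognition_params("P"))
# I-frame-only recognition: other frame types are skipped.
print(recognition_params("B", allowed_types=("I",)))
```

Passing `allowed_types=("I",)` corresponds to the variant in which only I frames are recognized.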

<Terminology>
The type of still image in the claims refers to the kind of still image that is included in moving image data so that the moving image data can be reproduced as a moving image. Examples of still image types are I frames, P frames, and B frames.

The image quality of a still image refers not only to how good the still image is but also to differences in the properties of still images. For example, differences in image quality can arise from differences in the information contained in I, P, and B frames.

<System configuration example>
FIG. 2 is a diagram showing an example of the system configuration of the image processing system 100. The imaging device 10 and the recognition server 30 are connected via a network. The imaging device 10 is sometimes called an IoT (Internet of Things) device in the sense that it is a thing connected to the Internet. The imaging device 10 includes a photoelectric conversion element such as a CCD or CMOS sensor and converts light incident through a lens into image data having luminance information. Specifically, it is a device called a digital still camera or a digital video camera, but a device need not be called a camera as long as it has an imaging function. For example, some smartphones, tablet terminals, PCs (personal computers), and game machines have cameras. The imaging device 10 may also be, for example, a multifunction peripheral, a projector, a video conference terminal, an electronic whiteboard, or a drone. The camera may also be externally attached, in which case the device to which the camera is attached operates as the camera.

The imaging device 10 may itself perform the image recognition processing of the present embodiment. In this case, the imaging device 10 transmits the recognition result to the recognition server 30 or the Web service 301.

The recognition server 30 is an information processing device implemented as a PC, a server device, or the like. The recognition server 30 is connected via Web APIs to various Web services 301, the Web DB service 302, or the user-created app 303. It transmits recognition results to these, thereby realizing advanced applications. A Web API is a set of conventions (such as how to request processing and how to specify data) used when devices perform processing with each other via a network. Because the Web service 301 and the Web DB service 302 publish their Web APIs, the recognition server 30 can cooperate with the Web service 301 or the Web DB service 302. The Web API of the user-created app 303 is known to the user, so the recognition server 30 can use it in the same way.

In the present embodiment, an application means a use case in which the recognition result of the recognition server 30 is used to obtain useful information or to control equipment, or something equivalent to this. It does not necessarily refer to a single piece of software.

With the configuration of FIG. 2, the recognition server 30 recognizes the image captured by the imaging device 10 and detects a predetermined object, and information is transmitted to the user or the control target 304 via at least one of the Web service 301, the Web DB service 302, and the user-created app 303. What information the user obtains, or which control target is controlled and how, constitutes the application.

There are Web services for linking the Web services 301 shown in FIG. 2 with one another, and by using such a linking Web service 301a the user can link the recognition server 30 with other Web services 301. Examples of the linking Web service 301a include IFTTT, zapier, and myThings. These linking Web services 301a serve as interfaces that link the recognition server 30 with other Web services 301.

A linking Web service 301a is a Web service that improves work efficiency by automating actions of other Web services, triggered by an operation in some Web service (or on a terminal device such as a smartphone). By using the linking Web service 301a, the user can easily build various applications that use the recognition results of the recognition server 30.

The Web DB service 302 is a Web service that mainly stores data. A linking Web service 302a that serves as an interface also exists for the Web DB service 302. For example, a linking Web service 302a called fluentd is known. In the case of the user-created app 303, the user also creates the Web API, so the linking Web services 301a and 302a are unnecessary, although they may still be used. The user-created app 303 may be of any kind, and such apps vary widely. The users include both companies and individuals, who can freely create an app 303 specific to a company or specific to an individual. For a company, conceivable examples include an app 303 that uses recognition results for security, equipment control, or FA (factory automation); for an individual, conceivable examples include an app 303 that organizes still image and moving image data based on recognition results.

The recognition server 30 sends the imaging device 10 a request to send an image. The request is triggered, for example, by an operation by a user of the image processing system 100, by a direct or indirect request for the image recognition Web service, or by the elapse of a fixed time set in the recognition server 30. A request triggered by a user, by the Web service 301, or by the elapse of a fixed time is made by designating an imaging device 10. The imaging device 10 is designated by an imaging device ID. Alternatively, when the user makes the request, a user ID associated with the imaging device ID is specified. ID is an abbreviation of identification and means an identifier or identification information. An ID is a name, code, character string, numerical value, or a combination of one or more of these, used to uniquely distinguish a specific object from among a plurality of objects.

In addition to the imaging device ID or the user ID, the request may include imaging conditions (conditions suited to the recognition processing, such as color or monochrome), the imaging period of the moving image data to be transmitted, and so on. When the imaging device 10 performs the image recognition processing, the request also includes information on the recognition processing parameters of the present embodiment.
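As a rough illustration of what such a request might carry, the sketch below builds a JSON payload. The source does not define a wire format, so every field name here (`device_id`, `imaging`, `capture_period_s`, `recognition_params`) is a hypothetical choice.

```python
import json


def build_capture_request(device_id, color_mode="color",
                          period_s=None, recognition_params=None):
    """Build a JSON request for an imaging device (field names are assumed)."""
    request = {
        "device_id": device_id,                 # imaging device ID
        "imaging": {"color_mode": color_mode},  # imaging conditions
    }
    if period_s is not None:
        request["capture_period_s"] = period_s  # imaging period of the video
    if recognition_params is not None:
        # Only needed when the imaging device itself runs the recognition.
        request["recognition_params"] = recognition_params
    return json.dumps(request, sort_keys=True)


print(build_capture_request("cam-001", color_mode="monochrome", period_s=10))
```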

On receiving the request, the imaging device 10 acquires a new image and transmits it to the recognition server 30. Alternatively, it returns to the recognition server 30 the latest of the images it is continuously acquiring (capturing). The imaging device 10 encrypts the image before transmission so that eavesdropping by a third party on the communication path from the imaging device 10 to the recognition server 30 does not succeed.

On receiving the image, the recognition server 30 decrypts it and applies a predetermined or designated recognition process to it. The recognition server 30 transmits the recognition result obtained by the recognition process to the Web service 301, the Web DB service 302, or the user-created app 303 using a Web API. The results of their processing are then transmitted to the user or the control target 304. As described later, the image itself is not transmitted by default, although the image may be transmitted together with the processing result.

The recognition server 30 charges each individual user a usage fee for the image processing system 100. Each time it responds to a request, it increments a count, and it determines each user's charge based on that count.
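The per-request counting described above can be sketched as follows. The flat unit price is an assumption; the source says only that the charge is determined from the count.

```python
from collections import Counter


class UsageMeter:
    """Count responded requests per user and derive a charge from the count."""

    def __init__(self, unit_price):
        self.unit_price = unit_price  # assumed flat price per response
        self.counts = Counter()

    def record_response(self, user_id):
        """Call once each time the server responds to a request."""
        self.counts[user_id] += 1

    def charge(self, user_id):
        return self.counts[user_id] * self.unit_price


meter = UsageMeter(unit_price=2)
for user in ["alice", "bob", "alice"]:
    meter.record_response(user)
print(meter.charge("alice"), meter.charge("bob"))  # 4 2
```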

The Web API format may be anything, such as XML, JSON, or CSV, as long as it can carry the recognition result. The Web service 301 or the user-created app 303 that has acquired the recognition result transmits it to the user or the control target 304. In response, the control target 304 may, for example, determine that an intruder on the premises has been detected and sound an alarm or send an e-mail to the user. It may determine that a user (for example, a member of the household) has approached the front door of the home and drive the home's locking mechanism to unlock the door. Or it may determine that a suspicious person has approached the front door and lock the door (including confirming that it is locked) or raise the security level.

The image transmitted by the imaging device 10 may be an unprocessed image (bitmap or RAW data), or, in consideration of communication cost, a compressed image such as a JPEG image. Alternatively, instead of the image itself, it may be image-based features such as SIFT, SURF, Haar-like features, or HOG features. However, the appropriate compression method or information reduction method depends on the type of recognition processing: for example, a monochrome image is sufficient for person detection, whereas general object recognition requires color information.

If the imaging device 10 transmits an uncompressed image (for example, in bitmap format) or a losslessly compressed image (for example, in TIFF format), the information reaches the recognition server 30 without loss, so any recognition process can be supported. However, the amount of image data transmitted from the imaging device 10 to the recognition server 30 increases. As a result, the latency (delay) from the request by the recognition server 30 until the user or the control target 304 obtains the recognition result grows, and the larger volume of data passing through the communication line raises the communication cost.

Therefore, depending on the type of recognition processing, the imaging device 10 may, for example, use lossy JPEG compression, and for recognition processing that does not need fine resolution information it may raise the compression rate or reduce the resolution of the transmitted image. For recognition processing for which a monochrome image is sufficient, the imaging device 10 transmits a monochrome image. In this way, it is desirable for the recognition server 30 to instruct the imaging device 10 what kind of moving image data to transmit, according to the type of recognition processing.

Furthermore, when the recognition server 30 uses HOG features as the features for person recognition, the imaging device 10 converts the image into HOG features and transmits the losslessly compressed HOG feature data to the recognition server 30. That is, extracting features suited to the type of recognition processing and transmitting only those features, losslessly compressed, to the recognition server 30 allows the recognition processing to perform well while keeping the communication volume low.
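The idea of extracting features on the device and sending them losslessly compressed can be sketched as below. This is not a full HOG implementation: it computes a single magnitude-weighted histogram of gradient orientations over the whole image as a simplified stand-in, and the float32 packing and use of `zlib` are assumptions chosen for illustration.

```python
import math
import struct
import zlib


def orientation_histogram(image, bins=9):
    """Single-cell histogram of unsigned gradient orientations (0-180 deg),
    weighted by gradient magnitude. A simplified stand-in for HOG."""
    hist = [0.0] * bins
    height, width = len(image), len(image[0])
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = image[y][x + 1] - image[y][x - 1]
            gy = image[y + 1][x] - image[y - 1][x]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[min(int(angle / (180.0 / bins)), bins - 1)] += magnitude
    return hist


def pack_features(hist):
    """Serialize as little-endian float32 and compress losslessly with zlib."""
    return zlib.compress(struct.pack(f"<{len(hist)}f", *hist))


def unpack_features(blob, bins=9):
    return list(struct.unpack(f"<{bins}f", zlib.decompress(blob)))


# A 4x4 image with a vertical edge: all gradient energy is horizontal.
image = [[0, 0, 10, 10] for _ in range(4)]
hist = orientation_histogram(image)
blob = pack_features(hist)
print(hist[0], len(blob), unpack_features(blob) == hist)
```

The compression step is lossless with respect to the packed float32 values; any precision lost in converting to float32 happens before compression.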

Furthermore, when the imaging device 10 is a fixed-point imaging device installed at a fixed location, the background of the images it captures hardly changes. The imaging device 10 can therefore separate the background from the rest of the image and transmit to the recognition server 30 only the non-background portion of the image or features, further reducing the amount of communication data.
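One simple way to separate the background, shown here only as an assumed approach, is to compare each frame with a stored background image and keep the pixels that differ by more than a tolerance. The tolerance value and the `(y, x, value)` output format are hypothetical.

```python
def foreground_pixels(frame, background, tolerance=10):
    """Return (y, x, value) for pixels that differ from the stored background
    by more than `tolerance`; only these would need to be transmitted."""
    return [
        (y, x, value)
        for y, row in enumerate(frame)
        for x, value in enumerate(row)
        if abs(value - background[y][x]) > tolerance
    ]


background = [[50, 50, 50], [50, 50, 50]]
frame = [[50, 52, 50], [50, 200, 50]]
print(foreground_pixels(frame, background))  # [(1, 1, 200)]
```

In this example only one of six pixels is sent; the small change at position (0, 1) falls within the tolerance and is treated as background.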

When the recognition process performed by the recognition server 30 requires consecutive frames (a plurality of still images or a short moving image), for example to detect a person waving a hand, the imaging device 10 captures a fixed number of frames and transmits the group of still images (or the moving image) to the recognition server 30. In that case as well, it is desirable to apply appropriate moving image compression before transmission.

When the recognition server 30 issues requests to the imaging device 10 at fixed intervals, the user can set the imaging conditions of the imaging device 10 with a simple Web API command. For example, the user can set which imaging device 10 captures images, at what interval, and how. When the imaging device 10 performs the recognition processing, the user can also set which recognition process is executed and which features are extracted. In this way, the image processing system 100 makes it easy to realize an application that, for example, checks every second whether a person is present and, if so, sends an alert to a registered smartphone or mobile phone.
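The once-per-second monitoring application mentioned above can be sketched as a polling loop. The `capture`, `detect_person`, and `send_alert` callables are hypothetical stand-ins for the imaging request, the recognition process, and the alert delivery; none of them is an API defined by the source.

```python
import time


def monitor(capture, detect_person, send_alert, cycles,
            interval_s=1.0, sleep=time.sleep):
    """Poll `capture` every `interval_s` seconds and alert when a person
    is detected. Returns the number of alerts sent."""
    alerts = 0
    for _ in range(cycles):
        image = capture()
        if detect_person(image):
            send_alert("person detected")
            alerts += 1
        sleep(interval_s)
    return alerts


# Simulated run: each "image" is just the number of people in view,
# and sleeping is disabled so the example finishes immediately.
people_in_view = iter([0, 1, 0, 2])
sent = []
count = monitor(lambda: next(people_in_view), lambda n: n > 0,
                sent.append, cycles=4, sleep=lambda s: None)
print(count, sent)
```

Injecting `sleep` keeps the loop testable without waiting in real time.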

The imaging device 10 may acquire color images or monochrome images. Depending on the application, it may be an imaging device 10 that can acquire spectral images or polarization images. A stereo imaging device that can acquire distance information, or a monocular imaging device that can acquire distance information by TOF (Time of Flight) or the like, may also be used, as may an imaging device that can acquire omnidirectional (spherical) images.

The type of each imaging device 10 is registered in the recognition server 30, which performs the recognition process best suited to that type. Lighting may be installed on or around the imaging device 10 so that dark places can be imaged. In this case, turning the lighting on only when the imaging device 10 receives an imaging request reduces power consumption.

For example, in an application that detects people, the recognition server 30 periodically performs person detection. When it detects a person, it sends a predetermined command to the corresponding imaging device 10, and the imaging device 10 records a moving image at a high frame rate for a fixed period thereafter. Because the images after a suspicious person appears are recorded in fine time resolution, the recognition server 30 or the people concerned can later take the time to analyze them.

The information that the recognition server 30 transmits to the user is limited to the recognition result, and any information from which the user could restore the original image before recognition processing is removed. That is, the image is not returned; only text information about the recognition result, such as how many people appear in the image, is sent. This avoids invasion of privacy.
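A minimal sketch of this filtering is an allowlist applied before serialization, so that image data can never reach the response by accident. The field names (`device_id`, `timestamp`, `person_count`, `image`) are assumptions for illustration.

```python
import json

# Hypothetical allowlist of text-only result fields.
ALLOWED_FIELDS = ("device_id", "timestamp", "person_count")


def to_user_response(result):
    """Serialize only allowlisted text fields; image data is never copied."""
    safe = {key: result[key] for key in ALLOWED_FIELDS if key in result}
    return json.dumps(safe, sort_keys=True)


internal_result = {
    "device_id": "cam-001",
    "timestamp": "2016-10-12T09:00:00",
    "person_count": 3,
    "image": b"\x89raw-bytes",  # must not leave the server
}
print(to_user_response(internal_result))
```

An allowlist is used rather than deleting the `image` key so that any new field added later is excluded unless it is explicitly approved.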

＜アプリケーション＞
図３を用いて、画像処理システム１００により利用可能なアプリケーションについて補足する。図３は、画像処理システム１００により利用可能なアプリケーションの一例を示す。図３では、カテゴリーごとにアプリケーションの一例が示されている。カテゴリーは、例えば、交通、セキュリティ、オフィス、介護、建設、店、保健、工場、農業、家、漁業、畜産、監視等、及び物流であるがこれらには限定されない。
<Application>
Applications available through the image processing system 100 are described further with reference to FIG. 3. FIG. 3 shows examples of such applications, grouped by category. The categories include, but are not limited to, transportation, security, offices, nursing care, construction, stores, health, factories, agriculture, homes, fisheries, livestock, monitoring, and logistics.

例えば、交通のカテゴリーに交通量調査というアプリケーションがある。交通量を調査するには通過する各車両を検出する必要がある。従来では、車両検出が可能な専用で高価な撮像装置１０を用いてシステムを構成する必要があり、交通量調査の場所が変わるとシステムを移動して設置する必要があった。また、移動可能であるため盗難のリスクもあり、一般には実用化されていない。 For example, there is an application called traffic survey in the traffic category. It is necessary to detect each passing vehicle in order to investigate the traffic volume. In the past, it was necessary to configure the system using a dedicated and expensive imaging device 10 capable of detecting a vehicle, and it was necessary to move and install the system when the location of the traffic volume survey changed. In addition, since it is movable, there is a risk of theft, and it has not been put into practical use in general.

しかし、本実施形態の画像処理システム１００によれば、安価で通信機能を有する撮像装置１０が測定地点に固定されていればよい。ユーザは認識サーバ３０にＰＣなどでアクセスし、交通量調査を行いたい場所にある撮像装置１０の撮像装置ＩＤ及び車両検出処理を指定する。これにより、現在、撮像装置１０に写っている車両の台数を遠隔地より測定可能となる。時間を指定して例えば５秒ごとに撮像するようにユーザが設定し、認識結果の出力先としてＷｅｂサービスのＤＢサービス３０２を指定しておけば、自動で車両数を検出するアプリケーションを容易に実現可能である。 However, with the image processing system 100 of the present embodiment, it is sufficient for an inexpensive image pickup device 10 with a communication function to be fixed at the measurement point. The user accesses the recognition server 30 from a PC or the like and specifies the image pickup device ID of the image pickup device 10 at the location to be surveyed, together with the vehicle detection process. This makes it possible to count, from a remote location, the vehicles currently captured by the image pickup device 10. If the user specifies a time interval so that an image is captured, for example, every 5 seconds, and specifies the DB service 302, a Web service, as the output destination of the recognition results, an application that automatically counts vehicles can be realized easily.

＜ＷｅｂＡＰＩについて＞
認識サーバ３０がＷｅｂサーバとして提供される場合のWebAPIの一例をいくつか説明する。
１．撮像装置１０に写っている顔の数を返す。
http://rirs.xxxxxxx.com/mysite/cgi-bin/face.cgi
２．撮像装置１０に写っている人の数を返す。
http://rirs.xxxxxxx.com/mysite/cgi-bin/human.cgi
３．撮像装置１０に写っているものの名称を返す。
http://rirs.xxxxxxx.com/mysite/cgi-bin/god.cgi
４．１secごとに撮像装置１０に写っている顔の数を返す。
http://rirs.xxxxxxx.com/mysite/cgi-bin/start_face.cgi
５．１secごとに撮像装置１０に写っている顔の数を返すのを止める。
http://rirs.xxxxxxx.com/mysite/cgi-bin/stop_face.cgi
ユーザはスマートフォンやＰＣで連携Ｗｅｂサービス３０１ａにアクセスし、１〜５のＵＲＬと任意のＷｅｂサービス３０１ｂを連携させる。これにより、以下のようなＷｅｂサービスの自動化（アプリケーションの構築）が可能になる。
<About WebAPI>
Some examples of the WebAPI when the recognition server 30 is provided as a Web server are described below.
1. Returns the number of faces captured by the image pickup device 10.
http://rirs.xxxxxxx.com/mysite/cgi-bin/face.cgi
2. Returns the number of people captured by the image pickup device 10.
http://rirs.xxxxxxx.com/mysite/cgi-bin/human.cgi
3. Returns the names of the objects captured by the image pickup device 10.
http://rirs.xxxxxxx.com/mysite/cgi-bin/god.cgi
4. Returns the number of faces captured by the image pickup device 10 every 1 sec.
http://rirs.xxxxxxx.com/mysite/cgi-bin/start_face.cgi
5. Stops returning the number of faces captured by the image pickup device 10 every 1 sec.
http://rirs.xxxxxxx.com/mysite/cgi-bin/stop_face.cgi
The user accesses the linking Web service 301a from a smartphone or a PC and links URLs 1 to 5 with an arbitrary Web service 301b. This makes it possible to automate Web services (build applications) as follows.

図４は、複数のＷｅｂサービスの連携の一例を示す図である。図４（ａ）の１番目のＷｅｂサービスは本実施形態の認識サーバ３０が提供するＷｅｂサービスであり、例えば「１．撮像装置１０に写っている顔の数を返す。」である。Ｗｅｂサービス３０１ｂは「席に人がいたら扇風機などを自動的にＯＮにする」である。ユーザ又は制御対象３０４は扇風機である。連携Ｗｅｂサービス３０１ａは１番目と２番目のＷｅｂサービスを連携させる。これにより、撮像装置１０に顔が写っている場合、自動的に扇風機をＯＮにするアプリケーションが実現される。 FIG. 4 is a diagram showing an example of cooperation between a plurality of Web services. The first Web service in FIG. 4A is a Web service provided by the recognition server 30 of the present embodiment, and is, for example, "1. Returns the number of faces captured in the image pickup device 10." The Web service 301b is "automatically turn on the electric fan or the like when there is a person in the seat". The user or the control target 304 is an electric fan. The linked Web service 301a links the first and second Web services. As a result, an application for automatically turning on the electric fan when a face is reflected in the image pickup device 10 is realized.

図４（ｂ）の１番目のＷｅｂサービスは本実施形態の認識サーバ３０が提供するＷｅｂサービスであり、例えば「２．撮像装置１０に写っている人の数を返す。」である。Ｗｅｂサービス３０１ｃは「指定された電話番号に自動で電話をかける」である。ユーザ又は制御対象３０４はユーザへの着信や電話をかける装置が相当する。連携Ｗｅｂサービス３０１ａは１番目と２番目のＷｅｂサービスを連携させる。これにより、夜間に人が侵入した場合、自動的に指定された電話番号に電話をかけるアプリケーションが実現される。 The first Web service in FIG. 4B is a Web service provided by the recognition server 30 of the present embodiment, and is, for example, "2. Returns the number of people in the image pickup device 10." The Web service 301c is "automatically make a call to a designated telephone number". The user or the control target 304 corresponds to a device that makes an incoming call or a telephone call to the user. The linked Web service 301a links the first and second Web services. As a result, when a person intrudes at night, an application that automatically calls a specified telephone number is realized.
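The two linkages in FIG. 4 follow the same pattern: query a recognition Web service for a count, then trigger a second Web service if the count is positive. The sketch below illustrates that pattern only. The function name `link_services` and both callables are hypothetical, and the response format of the recognition URLs is not given in the source, so the count is assumed to arrive as an integer.

```python
from typing import Callable


def link_services(fetch_count: Callable[[], int], action: Callable[[], None]) -> bool:
    """Run `action` once if the recognition service reports at least one detection.

    fetch_count: stands in for a call to a recognition URL (e.g. face or person count).
    action: stands in for the second Web service (e.g. fan ON, place a phone call).
    Returns True if the action was triggered.
    """
    count = fetch_count()
    if count > 0:
        action()
        return True
    return False
```

For FIG. 4(a), `fetch_count` would wrap the face-count service and `action` the fan control; for FIG. 4(b), the person-count service and the telephone call.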

＜認識サーバ３０のハードウェア構成例＞
図５は、認識サーバ３０の概略的なハードウェア構成図の一例である。認識サーバ３０は、ＣＰＵ２０１と、ＣＰＵ２０１が使用するデータの高速アクセスを可能とするメモリ２０２とを備える。ＣＰＵ２０１及びメモリ２０２は、システム・バス２０３を介して、認識サーバ３０の他のデバイス又はドライバ、例えば、グラフィックス・ドライバ２０４及びネットワーク・ドライバ（ＮＩＣ）２０５へと接続されている。
グラフィックス・ドライバ２０４は、バスを介してＬＣＤ（ディスプレイ装置）２０６に接続されて、ＣＰＵ２０１による処理結果をモニタする。また、ネットワーク・ドライバ２０５は、トランスポート層レベル及び物理層レベルで認識サーバ３０をネットワークＮへと接続して、撮像装置１０やＷｅｂサービス３０１とのセッションを確立させている。
システム・バス２０３には、さらにＩ／Ｏバス・ブリッジ２０７が接続されている。Ｉ／Ｏバス・ブリッジ２０７の下流側には、ＰＣＩなどのＩ／Ｏバス２０８を介して、ＩＤＥ、ＡＴＡ、ＡＴＡＰＩ、シリアルＡＴＡ、ＳＣＳＩ、ＵＳＢなどにより、ＨＤＤ（ハードディスクドライブ）２０９などの記憶装置が接続されている。ＨＤＤ２０９は認識サーバ３０の全体を制御するプログラム２０９ｐを記憶している。ＨＤＤ２０９はＳＳＤ（Solid State Drive）でもよい。
<Hardware configuration example of recognition server 30>
FIG. 5 is an example of a schematic hardware configuration diagram of the recognition server 30. The recognition server 30 includes a CPU 201 and a memory 202 that enables high-speed access to data used by the CPU 201. The CPU 201 and the memory 202 are connected via the system bus 203 to other devices or drivers of the recognition server 30, such as the graphics driver 204 and the network driver (NIC) 205.
The graphics driver 204 is connected to the LCD (display device) 206 via a bus and monitors the processing results of the CPU 201. The network driver 205 connects the recognition server 30 to the network N at the transport layer level and the physical layer level and establishes sessions with the image pickup device 10 and the Web service 301.
An I/O bus bridge 207 is further connected to the system bus 203. On the downstream side of the I/O bus bridge 207, a storage device such as an HDD (hard disk drive) 209 is connected via an I/O bus 208 such as PCI, using IDE, ATA, ATAPI, serial ATA, SCSI, USB, or the like. The HDD 209 stores a program 209p that controls the entire recognition server 30. The HDD 209 may instead be an SSD (Solid State Drive).

また、Ｉ／Ｏバス２０８には、ＵＳＢなどのバスを介して、キーボード及びマウス（ポインティング・デバイスと呼ばれる）などの入力装置２１０が接続され、システム管理者などのオペレータによる入力及び指令を受け付けている。 Further, an input device 210 such as a keyboard and a mouse (called a pointing device) is connected to the I/O bus 208 via a bus such as USB and accepts inputs and commands from an operator such as a system administrator.

なお、図示した認識サーバ３０のハードウェア構成は、１つの筐体に収納されていたりひとまとまりの装置として備えられていたりする必要はなく、認識サーバ３０が備えていることが好ましいハード的な要素を示す。また、クラウドコンピューティングに対応するため、本実施例の認識サーバ３０の物理的な構成は固定的でなくてもよく、負荷に応じてハード的なリソースが動的に接続・切断されることで構成されてよい。 The illustrated hardware configuration of the recognition server 30 need not be housed in a single housing or provided as a single unit; it merely shows the hardware elements that the recognition server 30 preferably includes. Further, to support cloud computing, the physical configuration of the recognition server 30 of this embodiment need not be fixed, and may be formed by dynamically connecting and disconnecting hardware resources according to the load.

＜機能について＞
図６は、認識サーバ３０の機能をブロック状に示す機能ブロック図の一例である。認識サーバ３０は、動画展開部３１、認識処理部３３、及び認識処理パラメータ設定部３２を有している。これらは、図５に示されているＣＰＵ２０１がプログラム２０９ｐに従って出力する命令等によって実現される機能又は手段である。
<About functions>
FIG. 6 is an example of a functional block diagram showing the functions of the recognition server 30 as blocks. The recognition server 30 has a moving image development unit 31, a recognition processing unit 33, and a recognition processing parameter setting unit 32. These are functions or means realized by instructions and the like that the CPU 201 shown in FIG. 5 outputs according to the program 209p.

動画展開部３１は、撮像装置１０などのＩＯＴから動画を取得したり、ＳＤカードなどの記憶媒体から動画を読み出したり、動画を展開したりする。例えばMpeg1, Mpeg2,Mpeg4、又はH.264のデコーダである。動画展開部３１は動画に含まれる静止画像（後述するＩ，Ｐ，Ｂフレーム）を抽出する。 The moving image development unit 31 acquires a moving image from an IoT device such as the image pickup device 10, reads a moving image from a storage medium such as an SD card, and develops (decodes) the moving image. It is, for example, an Mpeg1, Mpeg2, Mpeg4, or H.264 decoder. The moving image development unit 31 extracts the still images (the I, P, and B frames described later) contained in the moving image.

認識処理パラメータ設定部３２は、認識処理部３３が画像の認識処理を行う際のパラメータを認識処理部３３に設定する。詳細は後述されるが、Ｉ，Ｐ，Ｂフレームのいずれであるかに応じて認識処理部３３に認識処理のためのパラメータを設定する。また、Ｉ，Ｐ，Ｂフレームのいずれであるかに応じて認識処理を行うか否かを認識処理部３３に通知する。 The recognition processing parameter setting unit 32 sets, in the recognition processing unit 33, the parameters that the recognition processing unit 33 uses for image recognition processing. As described in detail later, it sets these parameters according to whether the frame is an I, P, or B frame. It also notifies the recognition processing unit 33 whether to perform recognition processing, again according to whether the frame is an I, P, or B frame.

認識処理部３３は、認識処理パラメータ設定部３２が設定したパラメータに従って画像の認識処理を行う。例えば、認識対象を画像から検出したり、対象の動作を検出したりする。 The recognition processing unit 33 performs image recognition processing according to the parameters set by the recognition processing parameter setting unit 32. For example, the recognition target is detected from the image, or the movement of the target is detected.

＜動画に対する認識方法＞
以下では、認識サーバ３０が動画をどのように認識するかについて説明する。一般に、撮像装置１０が撮像した動画をネットワーク経由で認識サーバ３０まで送信する場合、又は、ＳＤカードなどの記憶媒体に記憶する場合、動画が圧縮される。動画の圧縮方法としては、Mpeg1, Mpeg2,Mpeg4又は、H.264などの標準化された圧縮方法が知られているが、いずれも、圧縮された動画は以下のような画質が異なる３つのフレームで構成されている。なお、この画質とは画像の良さの程度を言うだけでなく、画像の性質が異なることも含んでいる。
Ｉフレーム：そのフレーム全ての情報を保持したフレーム（キーフレームという）
Ｐフレーム：時間的に前の「Ｉフレーム」との差異を保持したフレーム
Ｂフレーム：時間的に前後の「Iフレーム」「Ｐフレーム」の差異を保持したフレーム（Ｂフレームとの差異が保持される場合もある）
また、これらＩ、Ｐ、Ｂフレームに分けて非可逆圧縮を行う技術を、ＧＯＰ（Group of Picture）と呼ぶ。ここで、圧縮率を上げていった場合、圧縮率を上げるほど画質が低下していき、画像認識の精度も低下していく。ただし、圧縮された画像は上記のようなフレーム構成となっており、圧縮する際には、例えばIフレームはシーンが変わった場合のような前フレームからの変化が大きいフレームで入力されたり、一定間隔で入力されたりする。このような理由で、動きが激しいシーンにおいては、Ｉフレームの画質は差分データであるＰフレームやＢフレームと比較して劣化が少ない。また、逆に背景が固定されているようなシーンの場合には、Iフレームと比較して、差分の情報を蓄積できるＰフレームやＢフレームの方が画質の劣化が少ない。
<Recognition method for video>
The following describes how the recognition server 30 recognizes a moving image. In general, a moving image captured by the image pickup device 10 is compressed when it is transmitted to the recognition server 30 via the network or stored in a storage medium such as an SD card. Standardized compression methods such as Mpeg1, Mpeg2, Mpeg4, and H.264 are known; in all of them, the compressed moving image consists of the following three kinds of frames, which differ in image quality. Here, image quality refers not only to how good an image is but also to differences in the properties of the image.
I frame: A frame that holds all the information of that frame (called a key frame)
P frame: A frame that holds the difference from the temporally preceding I frame
B frame: A frame that holds the difference from the temporally preceding and following I frames and P frames (in some cases the difference from a B frame is held)
The technique of performing lossy compression by dividing a moving image into these I, P, and B frames is called GOP (Group of Pictures). As the compression rate is raised, the image quality deteriorates and the accuracy of image recognition falls with it. However, a compressed moving image has the frame structure described above, and during compression an I frame is inserted, for example, at a frame that changes greatly from the previous frame, such as at a scene change, or at regular intervals. For this reason, in a scene with rapid movement, the image quality of an I frame deteriorates less than that of the P and B frames, which are difference data. Conversely, in a scene with a fixed background, the P and B frames, which can accumulate difference information, deteriorate less than the I frame.

<<画像の変化に基づくフレームの選択>>
Iフレームの画質は、シーンが大きく変わるフレームで多いので、例えば、背景が常時変わっていく車載された撮像装置１０が撮像した動画を認識サーバ３０が認識する場合、画質の低いＰフレームや、Ｂフレームよりも、Iフレームのみを選んで使用し画像認識処理を行うことで、認識精度を上げることができる。すなわち、変化が大きいフレームの場合には、認識サーバ３０は画質が高いと考えられるIフレームを用いて画像認識処理を行う。
<< Selection of frame based on image change >>
I frames tend to occur at frames where the scene changes greatly. Therefore, when the recognition server 30 recognizes a moving image captured by, for example, an in-vehicle image pickup device 10 whose background is constantly changing, recognition accuracy can be improved by selecting only the I frames for image recognition processing rather than the lower-quality P and B frames. That is, for frames with large changes, the recognition server 30 performs image recognition processing using the I frames, which are considered to have high image quality.

逆に、動きが少ないフレームではＰフレームやＢフレームの方がＩフレームよりも画質が高いので、認識サーバ３０はＰ、Ｂフレームで画像認識処理を行う。 On the contrary, since the image quality of the P frame and the B frame is higher than that of the I frame in the frame with less movement, the recognition server 30 performs the image recognition process in the P and B frames.

従って、変化の程度を判断し、変化の大小に応じて認識処理を行うフレームを切り替えることが好適になる。変化の大小は、例えば着目しているフレームと時間的に一つ前のフレームとの画素値の差をとり、その絶対値の総和で判断することができる。 Therefore, it is preferable to determine the degree of change and switch the frames used for recognition processing according to its magnitude. The magnitude of the change can be determined, for example, by taking the pixel-value differences between the frame of interest and the temporally preceding frame and summing their absolute values.
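The change measure described here, the sum of absolute pixel differences between two frames, can be sketched as follows. Frames are assumed to be equally sized grayscale images stored as nested lists of integers; the name `sum_abs_diff` and that representation are illustrative choices, not something the source specifies.

```python
from typing import Sequence

Frame = Sequence[Sequence[int]]  # rows of pixel values (grayscale assumed)


def sum_abs_diff(current: Frame, previous: Frame) -> int:
    """Sum of absolute pixel differences between two equally sized frames."""
    if len(current) != len(previous):
        raise ValueError("frames must have the same height")
    total = 0
    for row_c, row_p in zip(current, previous):
        if len(row_c) != len(row_p):
            raise ValueError("frames must have the same width")
        total += sum(abs(c - p) for c, p in zip(row_c, row_p))
    return total
```

Because the raw sum grows with image size, a threshold applied to it would have to be chosen per resolution, or the sum divided by the pixel count first.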

図７は、認識処理パラメータ設定部３２が変化の大小に応じて認識処理を行うフレームを決定する手順を示すフローチャート図の一例である。 FIG. 7 is an example of a flowchart showing a procedure in which the recognition processing parameter setting unit 32 determines a frame for performing recognition processing according to the magnitude of the change.

まず、動画展開部３１は動画をＩ，Ｐ、Ｂフレームに展開する（Ｓ１０）。 First, the moving image development unit 31 develops the moving image into I, P, and B frames (S10).

認識処理パラメータ設定部３２は、時系列の２つのフレームの差分を算出する（Ｓ２０）。この場合の２つのフレームは、時間的に最も近いＩフレーム同士、時間的に最も近いＰフレーム同士、又は、時間的に最も近いＢフレーム同士の少なくとも１つ以上である。 The recognition processing parameter setting unit 32 calculates the difference between the two frames in the time series (S20). In this case, the two frames are at least one or more of the I frames closest in time, the P frames closest in time, or the B frames closest in time.

次に、認識処理パラメータ設定部３２は、差分がしきい値以上か否かを判断する（Ｓ３０）。ステップＳ３０の判断がＹｅｓの場合、変化（動き）が大きいので、認識処理パラメータ設定部３２は画質がよいと考えられるＩフレームに画像認識処理を行うよう認識処理部３３に通知する（Ｓ４０）。 Next, the recognition processing parameter setting unit 32 determines whether or not the difference is equal to or greater than the threshold value (S30). If the determination in step S30 is Yes, the change (movement) is large, so the recognition processing parameter setting unit 32 notifies the recognition processing unit 33 to perform the image recognition processing on the I frame considered to have good image quality (S40).

ステップＳ３０の判断がＮｏの場合、変化（動き）が少ないので、認識処理パラメータ設定部３２は画質がよいと考えられるＰ又はＢフレームに画像認識処理を行うよう認識処理部３３に通知する（Ｓ５０）。 If the determination in step S30 is No, the change (movement) is small, so the recognition processing parameter setting unit 32 notifies the recognition processing unit 33 to perform the image recognition processing on the P or B frames, which are considered to have good image quality (S50).

こうすることで画質が高いフレームを選択して画像認識処理を行うことができるので、トータルの認識精度を上げることができる。 By doing so, it is possible to select a frame with high image quality and perform image recognition processing, so that the total recognition accuracy can be improved.
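The branch in FIG. 7 reduces to one comparison. The sketch below assumes the inter-frame difference has already been computed as a single number; the function name and the string labels for frame types are hypothetical.

```python
from typing import Tuple


def select_frame_types(difference: float, threshold: float) -> Tuple[str, ...]:
    """Choose which frame types to recognize, following the S30 branch of FIG. 7.

    A difference at or above the threshold means large change, so I frames
    are chosen (S40); otherwise P and B frames are chosen (S50).
    """
    if difference >= threshold:
        return ("I",)
    return ("P", "B")
```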

<<テンプレートの変更>>
図７の処理のようにＩフレーム又はＰ、Ｂフレームを選択した場合、さらにテンプレートも変更することが有効である。テンプレートとは、画像から認識される対象の特徴が反映された比較の基準となるデータ（画像認識の場合は画像に関するデータ）である。例えば、画像内の看板等を検出する場合を考える。背景が変化する車載された撮像装置１０の動画の場合、Iフレームでは、比較的看板がはっきりと写っているのに対し、ＢやＰフレームではぼけて写っている場合が多い。従って、看板とマッチしやすいテンプレートもIフレームとＢ、Ｐフレームとで異なる。
<< Change template >>
When I frames or P and B frames are selected as in the process of FIG. 7, it is effective to change the template as well. A template is reference data for comparison that reflects the features of the object to be recognized in an image (in image recognition, data relating to an image). For example, consider detecting a signboard or the like in an image. In a moving image from an in-vehicle image pickup device 10 whose background changes, the signboard tends to appear relatively clearly in I frames but blurred in B and P frames. Accordingly, the template that best matches the signboard also differs between I frames and B and P frames.

そこで、Iフレームでのテンプレートと、Ｂ，Ｐフレームでのテンプレートを変えて、認識処理部３３がテンプレートマッチングを行うことで、認識精度を高めることができる。 Therefore, the recognition accuracy can be improved by changing the template in the I frame and the template in the B and P frames and performing template matching by the recognition processing unit 33.

図８は、認識処理パラメータ設定部３２が画像の変化の大小に応じて認識処理を行うフレームとテンプレートを決定する手順を示すフローチャート図の一例である。なお、図８では主に図７との相違を説明する。 FIG. 8 is an example of a flowchart showing a procedure in which the recognition processing parameter setting unit 32 determines a frame and a template for performing recognition processing according to the magnitude of change in the image. Note that FIG. 8 mainly describes the difference from FIG. 7.

ステップＳ１０〜Ｓ５０は図７と同様である。ステップＳ６０で、認識処理パラメータ設定部３２はＩフレーム用のテンプレートを認識処理部３３に指示し、ステップＳ７０で、認識処理パラメータ設定部３２はＰ，Ｂフレーム用のテンプレートを認識処理部３３に指示する。 Steps S10 to S50 are the same as in FIG. 7. In step S60, the recognition processing parameter setting unit 32 specifies the template for I frames to the recognition processing unit 33, and in step S70, it specifies the template for P and B frames to the recognition processing unit 33.

これにより、認識処理部３３はフレームの種類に適切なテンプレートで画像に認識処理を行うことができる。 As a result, the recognition processing unit 33 can perform recognition processing on the image with a template suitable for the type of frame.
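Selecting a template per frame type can be pictured as a simple lookup. The template identifiers below are placeholders standing in for real template data; the source does not say how templates are stored or named.

```python
# Hypothetical template identifiers; the source does not specify how templates are stored.
TEMPLATES = {
    "I": "signboard_sharp",    # I frames: signboard tends to appear clearly
    "P": "signboard_blurred",  # P and B frames: signboard tends to appear blurred
    "B": "signboard_blurred",
}


def template_for(frame_type: str) -> str:
    """Return the template assigned to a frame type (steps S60 and S70 of FIG. 8)."""
    try:
        return TEMPLATES[frame_type]
    except KeyError:
        raise ValueError(f"unknown frame type: {frame_type!r}") from None
```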

<<画質又はフレームの種類に応じた対象を検出するしきい値の変更>>
画質が劣化したフレームに対して画像認識処理が実行されると、対象物体の検出率が落ちる上に、誤検出が多く発生してしまうという不都合がある。検出率の低下と誤検出はどちらも少ない方がよい。例えば、検出率が低下した場合、動画であれば別のフレームで検出可能な場合があるが、所定の制御が遅れたり制御できなかったりする状況が生じうる。また、誤検出が増えた場合、不要警報が増大する。例えば車両の自動ブレーキのために周囲の障害物を認識サーバ３０（又は車載の撮像装置１０でもよい）が認識する場合、周囲に障害物がないのに、障害物があると誤検出して車両がブレーキをかけるおそれがある。動画であれば不要警報はさらに増大してしまう。
<< Change of the threshold for detecting a target according to image quality or frame type >>
When image recognition processing is executed on a frame whose image quality has deteriorated, the detection rate of the target object drops and, in addition, many false detections occur. Both missed detections and false detections should be kept to a minimum. For example, when the detection rate drops, a moving image may still allow detection in another frame, but a given control action may be delayed or may fail. When false detections increase, unnecessary alarms increase. For example, when the recognition server 30 (or the in-vehicle image pickup device 10) recognizes surrounding obstacles for automatic braking of a vehicle, it may falsely detect an obstacle where there is none, causing the vehicle to brake. With a moving image, unnecessary alarms increase further.

このような不都合を抑制するためには、画質に応じて、物体を検出したと判断するか否かのしきい値を変更することが有効である。すなわち、画質が低い場合にはしきい値を高く設定することで、認識サーバ３０は検出対象の物体らしさが高くならないと検出しないようになる。従って、誤検出を低減できる。逆に画質が高い場合、しきい値を低く設定することで、物体らしさが高くなくても対象を検出するので認識率が向上する。従って、検出率を向上させ、誤検出を抑制でき、トータルの認識精度を上げることができる。 To suppress these problems, it is effective to change, according to the image quality, the threshold used to decide whether an object has been detected. That is, when the image quality is low, setting the threshold high means that the recognition server 30 reports a detection only when the likelihood of the target object is high, which reduces false detections. Conversely, when the image quality is high, setting the threshold low means that the target is detected even when its likelihood is not high, which improves the recognition rate. As a result, the detection rate improves, false detections are suppressed, and overall recognition accuracy increases.

画質に関する情報は、PSNR（ピーク信号対雑音比：Peak Signal-to-Noise Ratio）やSSIM（Structural Similarity）と言った画質指標値に基づき判断できる。PSNRは、値が小さいほど劣化が大きく、値が高いほど劣化していないことを示す。SSIMは人間が感じる違いをより正確に指標化した画質指標値である。 Image quality can be judged from image quality index values such as PSNR (Peak Signal-to-Noise Ratio) and SSIM (Structural Similarity). For PSNR, a smaller value indicates greater deterioration and a larger value indicates less deterioration. SSIM is an image quality index value that more accurately reflects the differences that humans perceive.
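For reference, PSNR is conventionally defined as 10·log10(MAX² / MSE), where MSE is the mean squared error between the original and compressed pixels and MAX is the largest possible pixel value. The sketch below applies that standard definition to flat sequences of 8-bit pixel values; the function name and the flat-list representation are assumptions made for illustration.

```python
import math
from typing import Sequence


def psnr(original: Sequence[int], compressed: Sequence[int], max_value: int = 255) -> float:
    """PSNR in dB between two equally sized flat pixel sequences."""
    if len(original) != len(compressed) or len(original) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((o - c) ** 2 for o, c in zip(original, compressed)) / len(original)
    if mse == 0:
        return math.inf  # identical frames: no degradation
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Larger errors give a larger MSE and therefore a lower PSNR, which matches the statement that a smaller PSNR indicates greater deterioration.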

画質指標値は認識サーバ３０が算出できる。しかし、撮像装置１０が圧縮する際に、圧縮前後のフレームを比較し、PSNRやSSIMと言った画質指標値を各フレームごとに記憶しておくことが好ましい。認識処理パラメータ設定部３２はPSNRやSSIMに基づいて、物体を検出したと判断するか否かのしきい値を変更できる。 The image quality index value can be calculated by the recognition server 30. However, when the image pickup apparatus 10 compresses, it is preferable to compare the frames before and after the compression and store the image quality index values such as PSNR and SSIM for each frame. The recognition processing parameter setting unit 32 can change the threshold value for determining whether or not an object has been detected based on PSNR or SSIM.

図９（ａ）は、認識処理パラメータ設定部３２が画質に応じて対象検出のしきい値を変更する手順を示すフローチャート図の一例である。 FIG. 9A is an example of a flowchart showing a procedure in which the recognition processing parameter setting unit 32 changes the target detection threshold value according to the image quality.

認識処理パラメータ設定部３２は、各フレームの画質指標値を取得する（Ｓ２０）。画質は１つのフレームから判断可能なので、Ｉ、Ｐ、Ｂのそれぞれのフレームから画質指標値を取得する。あるいは、数フレーム置きに画質指標値を取得してもよい。 The recognition processing parameter setting unit 32 acquires the image quality index value of each frame (S20). Since the image quality can be determined from one frame, the image quality index value is acquired from each of the I, P, and B frames. Alternatively, the image quality index value may be acquired every few frames.

次に、認識処理パラメータ設定部３２は、画質指標値がしきい値以上か否かを判断する（Ｓ３０）。ステップＳ３０の判断がＹｅｓの場合、画質が高いので物体を検出したと判断するか否かのしきい値をしきい値Ａに設定する（Ｓ４０）。しきい値Ａは、認識の対象物をある程度の確率で検出できる比較的、低い値である。 Next, the recognition processing parameter setting unit 32 determines whether or not the image quality index value is equal to or greater than the threshold value (S30). If the determination in step S30 is Yes, the threshold value for determining whether or not an object has been detected is set to the threshold value A because the image quality is high (S40). The threshold value A is a relatively low value that can detect the object to be recognized with a certain probability.

ステップＳ３０の判断がＮｏの場合、画質が低いので物体を検出したと判断するか否かのしきい値をしきい値Ｂに設定する（Ｓ５０）。しきい値Ｂは、認識の対象物の確からしさが十分に高いと考えられる値である。なお、しきい値Ａ＜しきい値Ｂである。 If the determination in step S30 is No, the image quality is low, so the threshold for deciding whether an object has been detected is set to threshold B (S50). Threshold B is a value at which the likelihood of the object to be recognized is considered sufficiently high. Note that threshold A < threshold B.

こうすることで画質に応じて物体を検出したと判断するか否かのしきい値を変更できるので、検出率の低下を抑制し誤検出を低下させることができる。 By doing so, it is possible to change the threshold value for determining whether or not the object is detected according to the image quality, so that it is possible to suppress the decrease in the detection rate and reduce the false detection.
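The procedure of FIG. 9(a) can be sketched as a function that maps an image quality index value to a detection threshold. The parameter names are hypothetical, and the source gives no concrete values for the quality threshold or for thresholds A and B; the only constraint taken from the text is that A is lower than B.

```python
def detection_threshold(quality_index: float, quality_threshold: float,
                        threshold_a: float, threshold_b: float) -> float:
    """Pick the object-detection threshold from an image quality index (FIG. 9(a)).

    High quality (index at or above quality_threshold) selects the lower
    threshold A (S40); low quality selects the higher threshold B (S50).
    """
    if not threshold_a < threshold_b:
        raise ValueError("threshold A must be lower than threshold B")
    return threshold_a if quality_index >= quality_threshold else threshold_b
```

For example, with an illustrative quality threshold of 30 (a PSNR-like value) and thresholds A = 0.5 and B = 0.8, a frame with index 35 would be checked against 0.5 and a frame with index 25 against 0.8.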

また、画質はフレームの種類によってある程度、推定することができる。例えば、上記のように背景が常に変わっていく車載された撮像装置１０の動画のような場合、Ｉフレーム（キーフレーム）の画質が高いので、Ｉフレームか否かで物体を検出したと判断するか否かのしきい値を変更することができる。 The image quality can also be estimated to some extent from the frame type. For example, in a moving image from an in-vehicle image pickup device 10 whose background is constantly changing, as described above, the I frames (key frames) have high image quality, so the threshold for deciding whether an object has been detected can be changed depending on whether the frame is an I frame.

図９（ｂ）は、認識処理パラメータ設定部３２がフレームの種類に応じて対象検出のしきい値を変更する手順を示すフローチャート図の一例である。 FIG. 9B is an example of a flowchart showing a procedure in which the recognition processing parameter setting unit 32 changes the target detection threshold value according to the type of frame.

次に、認識処理パラメータ設定部３２は、展開された動画のフレームがＩフレームか否かを判断する（Ｓ２０）。 Next, the recognition processing parameter setting unit 32 determines whether or not the frame of the developed moving image is an I frame (S20).

Ｉフレームの場合、画質が高いと推定されるので物体を検出したと判断するか否かのしきい値をしきい値Ａに設定する（Ｓ３０）。 In the case of the I frame, since it is estimated that the image quality is high, the threshold value for determining whether or not the object is detected is set to the threshold value A (S30).

Ｉフレームでない場合、画質が低いと推定されるので、物体を検出したと判断するか否かのしきい値をしきい値Ｂに設定する（Ｓ４０）。なお、しきい値Ａ＜しきい値Ｂである。 If it is not an I frame, the image quality is estimated to be low, so the threshold for deciding whether an object has been detected is set to threshold B (S40). Note that threshold A < threshold B.

なお、図９（ｂ）ではＩフレームに対し物体を検出したと判断するか否かのしきい値を小さくしたが、背景が固定されているようなシーンの場合には、画質が高いと推定されるＰフレーム又はＢフレームに対し物体を検出したと判断するか否かのしきい値を小さくしてもよい。 In FIG. 9B, the threshold value for determining whether or not an object was detected for the I frame was reduced, but it is estimated that the image quality is high in the case of a scene in which the background is fixed. The threshold value for determining whether or not an object has been detected may be reduced with respect to the P frame or the B frame.
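Both cases, a changing background where I frames get the lower threshold and a fixed background where P and B frames get it, can be covered by one small function. This is a sketch only: the `static_background` flag and the default threshold values are hypothetical, and the source does not say how the scene type would be determined.

```python
def threshold_for_frame(frame_type: str, static_background: bool = False,
                        threshold_a: float = 0.5, threshold_b: float = 0.8) -> float:
    """Pick the detection threshold from the frame type (FIG. 9(b) and its variant).

    Changing background: I frames are presumed high quality and get threshold A.
    Fixed background: P and B frames are presumed high quality and get threshold A.
    All other frames get the higher threshold B.
    """
    if static_background:
        high_quality = frame_type in ("P", "B")
    else:
        high_quality = frame_type == "I"
    return threshold_a if high_quality else threshold_b
```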

物体を検出したと判断するか否かのしきい値に限らず、以下のようなパラメータをフレームの種類に応じて設定してもよい。例えば信号機の色情報の検出において、信号機はカラーであるが、各フレーム種類により色情報の精度が異なる。このため、フレームの種類によって、例えば赤信号を検出するしきい値や、色範囲を変更することで認識精度を高めることができる。信号機の赤、青、黄の色情報をＹ，Ｕ，Ｖで表す場合、フレーム種類によって色情報が取り得る範囲をY_min, Y_max, U_min, U_max, V_min, V_maxで設定する。例えば、Ｉフレームで赤信号と判断するY_min, Y_max, U_min, U_max, V_min, V_maxと、Ｐ、Ｂフレームで赤信号と判断するY_min, Y_max, U_min, U_max, V_min, V_maxを変更する。青信号や黄信号についても同様である。 In addition to the threshold for deciding whether an object has been detected, parameters such as the following may be set according to the frame type. For example, in detecting the color of a traffic light, the light is in color, but the accuracy of the color information differs by frame type. Recognition accuracy can therefore be improved by changing, for example, the threshold for detecting a red light, or the color range, according to the frame type. When the red, green, and yellow color information of a traffic light is expressed in Y, U, and V, the range that the color information may take is set by Y_min, Y_max, U_min, U_max, V_min, and V_max for each frame type. For example, the Y_min, Y_max, U_min, U_max, V_min, and V_max used to judge a red light in I frames differ from those used in P and B frames. The same applies to green and yellow lights.

図９（ｃ）は、認識処理パラメータ設定部３２がフレームの種類に応じて色情報のしきい値を変更する手順を示すフローチャート図の一例である。ステップＳ１０、Ｓ２０は図９（ｂ）と同様である。 FIG. 9C is an example of a flowchart showing a procedure in which the recognition processing parameter setting unit 32 changes the threshold value of the color information according to the type of the frame. Steps S10 and S20 are the same as in FIG. 9B.

ステップＳ３０では、認識処理パラメータ設定部３２は、Ｉフレーム用の色情報の検出のためのしきい値又は色範囲を設定し、ステップＳ４０でＰ，Ｂフレーム用の色情報の検出のためのしきい値又は色範囲を設定する。 In step S30, the recognition processing parameter setting unit 32 sets the threshold or color range for detecting color information in I frames, and in step S40 it sets the threshold or color range for detecting color information in P and B frames.

このように、フレームの種類に応じて色情報の検出のためのしきい値又は色範囲を設定することで、色の検出精度が向上される。 In this way, by setting the threshold value or the color range for detecting the color information according to the type of the frame, the color detection accuracy is improved.
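A per-frame-type color range can be represented as a small record of the six bounds Y_min, Y_max, U_min, U_max, V_min, and V_max. In the sketch below, all numeric bounds are placeholders chosen only to show a tighter range for I frames and a wider one for P and B frames; the source gives no actual values, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class YuvRange:
    """Inclusive bounds on Y, U, and V for one color and one frame type."""
    y_min: int
    y_max: int
    u_min: int
    u_max: int
    v_min: int
    v_max: int

    def contains(self, y: int, u: int, v: int) -> bool:
        return (self.y_min <= y <= self.y_max
                and self.u_min <= u <= self.u_max
                and self.v_min <= v <= self.v_max)


# Placeholder bounds for a red light (8-bit YUV assumed); not taken from the source.
RED_RANGES = {
    "I": YuvRange(50, 120, 60, 110, 200, 255),
    "P": YuvRange(40, 140, 50, 120, 180, 255),
    "B": YuvRange(40, 140, 50, 120, 180, 255),
}


def is_red(frame_type: str, y: int, u: int, v: int) -> bool:
    """Judge a pixel as a red light using the range set for its frame type."""
    return RED_RANGES[frame_type].contains(y, u, v)
```

Green and yellow would get their own tables in the same way.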

<<Ｉフレーム又はＰ，Ｂフレームのみを用いた認識処理>>
Ｉフレーム又はＰ，Ｂフレームのみを用いて認識処理を行ってもよい。例えば、背景が次々に変化する場所に設置された監視用の撮像装置１０が撮像した画像データから人物を認識する画像処理において、画質が高いIフレームの画像のみを用いて認識することで、検出率を向上させ誤検出を低下させることができる。また、背景がほとんど変化しない場所に設置される監視用の撮像装置１０においては、動き情報がない分、積算の情報量が多いＢフレームやＰフレームの方が、画質が良くなる場合があるため、例えばＰフレームのみを用いて認識処理を行う。
<< Recognition processing using only I frames or only P and B frames >>
The recognition process may be performed using only I frames or only P and B frames. For example, in image processing that recognizes a person from image data captured by a monitoring image pickup device 10 installed where the background changes continually, recognizing with only the high-quality I frame images improves the detection rate and reduces false detections. Conversely, for a monitoring image pickup device 10 installed where the background hardly changes, the B and P frames, which accumulate more information because there is no motion information to encode, may have better image quality, so the recognition process is performed using, for example, only the P frames.

図１０は、認識処理パラメータ設定部３２が特定のフレームを認識対象に指示する手順を示すフローチャート図の一例である。図１０では、背景が次々に変化する場所に設置された監視用の撮像装置１０が撮像した画像データ内の人物を認識する画像処理を認識サーバ３０が行うものとする。 FIG. 10 is an example of a flowchart showing a procedure in which the recognition processing parameter setting unit 32 designates specific frames as recognition targets. In FIG. 10, it is assumed that the recognition server 30 performs image processing that recognizes a person in image data captured by a monitoring image pickup device 10 installed where the background changes continually.

まず、動画展開部３１は動画をＩ，Ｐ，Ｂフレームに展開する（Ｓ１０）。 First, the moving image development unit 31 develops the moving image into I, P, and B frames (S10).

次に、認識処理パラメータ設定部３２は、Ｉフレームか否かを判断する（Ｓ２０）。Ｉフレームの場合、認識処理パラメータ設定部３２は認識処理部３３に対しＩフレームに対する認識処理を許可する（Ｓ３０）。 Next, the recognition processing parameter setting unit 32 determines whether or not it is an I frame (S20). In the case of an I-frame, the recognition processing parameter setting unit 32 allows the recognition processing unit 33 to perform recognition processing for the I-frame (S30).

If it is not an I frame, the recognition processing parameter setting unit 32 does not permit the recognition processing unit 33 to perform recognition processing on that frame (S40).

In this way, the recognition process can be performed using only the high-quality I-frame images.
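The flow of steps S10 to S40 can be illustrated with a minimal Python sketch. It assumes the moving image has already been developed into frames tagged with their type (S10); the names `Frame`, `is_recognition_allowed`, and `recognize_i_frames`, and the `recognize` callback, are hypothetical and do not appear in the embodiment.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Frame:
    """One developed still image (hypothetical container, not from the source)."""
    index: int
    frame_type: str          # "I", "P", or "B"
    pixels: object = None    # decoded image data; left as a placeholder here


def is_recognition_allowed(frame: Frame) -> bool:
    """S20: judge whether the frame is an I frame.

    Returning True corresponds to S30 (recognition permitted),
    returning False corresponds to S40 (recognition not permitted).
    """
    return frame.frame_type == "I"


def recognize_i_frames(frames: Iterable[Frame],
                       recognize: Callable[[Frame], object]) -> List[object]:
    """Run the recognition callback only on frames permitted in S30."""
    results = []
    for frame in frames:
        if is_recognition_allowed(frame):
            results.append(recognize(frame))
    return results
```

Here `recognize` stands in for the recognition processing unit 33; any callable that accepts a frame can be passed, and P and B frames are simply skipped.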

FIG. 10 takes as an example image processing that recognizes a person in image data captured by a monitoring image pickup device 10 installed in a place where the background changes continually. In image processing of image data from an image pickup device 10 installed in a place where the background hardly changes, recognition processing is instead permitted for P or B frames and not permitted for I frames.
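The choice of permitted frame types by installation environment can be sketched as a small lookup. This is only an illustration under stated assumptions: the scene labels `"changing_background"` and `"static_background"` and the function names are hypothetical, and the embodiment does not say how the environment is represented.

```python
from typing import Dict, FrozenSet, List, Sequence

# Frame types permitted for recognition in each installation environment.
# The keys are hypothetical labels chosen for this sketch.
ALLOWED_TYPES_BY_SCENE: Dict[str, FrozenSet[str]] = {
    "changing_background": frozenset({"I"}),      # use only high-quality I frames
    "static_background": frozenset({"P", "B"}),   # use P or B frames, not I frames
}


def allowed_frame_types(scene: str) -> FrozenSet[str]:
    """Return the frame types on which recognition is permitted for a scene."""
    return ALLOWED_TYPES_BY_SCENE[scene]


def select_frames(frame_types: Sequence[str], scene: str) -> List[int]:
    """Return the indices of the frames on which recognition is permitted."""
    allowed = allowed_frame_types(scene)
    return [i for i, t in enumerate(frame_types) if t in allowed]
```

For a sequence of frame types `["I", "B", "B", "P"]`, the static-background setting selects the three P and B frames, while the changing-background setting selects only the leading I frame.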

In FIG. 10, whether recognition processing is performed is controlled by the frame type. Alternatively, as in FIGS. 7 and 8, the magnitude of change may be used to decide whether I frames or P and B frames are used, and recognition processing may be controlled accordingly. In this case, the parameters need only be set for the frames subject to recognition processing. Likewise, as in FIG. 9, the image quality index value may be used to judge whether the image quality is good, or to judge the frame type, and recognition processing may be controlled accordingly. Frames with good image quality are subject to recognition processing. In this case as well, the parameters need only be set for the frames subject to recognition processing.
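The two alternative controls can be sketched as follows. The sketch assumes that the magnitude of change is measured as the mean absolute pixel difference between consecutive frames, which is an assumption made here for illustration and not a measure stated in this section. It also follows the rule stated in the claims that I frames are targeted when the change is at or above a threshold and P or B frames otherwise. All function names and thresholds are hypothetical.

```python
from typing import Sequence, Set


def change_magnitude(prev: Sequence[float], curr: Sequence[float]) -> float:
    """Mean absolute pixel difference between two frames (assumed measure)."""
    if len(prev) != len(curr) or len(prev) == 0:
        raise ValueError("frames must be non-empty and of equal size")
    return sum(abs(a - b) for a, b in zip(prev, curr)) / len(prev)


def target_types_by_change(magnitude: float, change_threshold: float) -> Set[str]:
    """Choose target frame types from the magnitude of change.

    Large change: target I frames. Small change: target P and B frames.
    """
    if magnitude >= change_threshold:
        return {"I"}
    return {"P", "B"}


def passes_quality_gate(quality_index: float, quality_threshold: float) -> bool:
    """A frame is a recognition target when its image quality index value
    is equal to or higher than the threshold."""
    return quality_index >= quality_threshold
```

Because only the frames that pass these checks are handed to recognition, recognition parameters need to be prepared only for those frames.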

<Summary>
As described above, changing the recognition processing parameters according to the frame type enables highly accurate image recognition even for moving images with a high compression rate. Likewise, performing the recognition process only on frames of a predetermined type enables highly accurate image recognition for moving images with a high compression rate. Because the compression rate is high, communication costs and storage costs can be reduced, and a decrease in recognition accuracy can be suppressed even when the image data is highly compressed.

<Other application examples>
The best mode for carrying out the present invention has been described above with reference to examples, but the present invention is in no way limited to these examples, and various modifications and substitutions can be made without departing from the gist of the present invention.

For example, although the present embodiment has been described using moving images as an example, the invention can also be applied to image data that consists of consecutive still images and has similar frames.

In addition, although MPEG-1, MPEG-2, and H.264 were given as examples of the moving image type, the invention can also be applied to moving images such as AVI (Audio Video Interleave), MOV, ASF (Advanced Systems Format), Ogg, Matroska, and DivX. Furthermore, it can be applied to moving images of standards other than these, as long as the moving image has I, P, and B frames.

The moving image development unit 31 is an example of the moving image developing means, the recognition processing parameter setting unit 32 is an example of the recognition processing parameter setting means, and the recognition processing unit 33 is an example of the image recognition means. The I frame is an example of the first frame, the P frame is an example of the second frame, and the B frame is an example of the third frame.

10 Image pickup device
30 Recognition server
31 Moving image development unit
32 Recognition processing parameter setting unit
33 Recognition processing unit
100 Image processing system

Japanese Unexamined Patent Publication No. 2009-88944

Claims

An information processing device that recognizes an object from an image, comprising:
a moving image developing means that develops moving image data into a plurality of still images;
a recognition processing parameter setting means that determines, according to the type of the still image, whether or not the still image is to be a target of recognition processing, and sets parameters of the recognition processing; and
an image recognition means that performs recognition processing of the object on still images of the type that the recognition processing parameter setting means has determined to be a target of recognition processing,
wherein the moving image data has an image quality index value for each still image, and
the recognition processing parameter setting means determines a still image whose image quality index value is equal to or higher than a threshold value to be a target of recognition processing.

The information processing device according to claim 1, wherein the type of the still image is
a first frame that holds all the information of a frame,
a second frame that holds the difference from the temporally preceding first frame, or
a third frame that holds the differences from the first frame and the second frame that temporally precede and follow it.

The information processing device according to claim 2, wherein the recognition processing parameter setting means determines the first frame to be a target of recognition processing.

The information processing device according to claim 2, wherein the recognition processing parameter setting means calculates the magnitude of change of the still images in time series,
determines the first frame to be a target of recognition processing when the magnitude is equal to or larger than a threshold value, and
determines the second frame or the third frame to be a target of recognition processing when the magnitude is less than the threshold value.

An image processing system comprising an information processing device that detects an object from an image and a Web service, which communicate via a network, the image processing system having:
a moving image developing means that develops moving image data into a plurality of still images;
a recognition processing parameter setting means that, the moving image data having an image quality index value for each still image, determines a still image whose image quality index value is equal to or higher than a threshold value to be a target of recognition processing, and sets parameters of the recognition processing according to the type of the determined still image; and
an image recognition means that performs recognition processing of the object on the still image based on the parameters set by the recognition processing parameter setting means,
wherein the information processing device transmits the recognition result to the Web service via a Web API, and the Web service to which the recognition result has been transmitted provides information created by the Web service or controls a device.

A program that causes an information processing device that recognizes an object from an image to function as:
a moving image developing means that develops moving image data into a plurality of still images;
a recognition processing parameter setting means that, the moving image data having an image quality index value for each still image, determines a still image whose image quality index value is equal to or higher than a threshold value to be a target of recognition processing, and sets parameters of the recognition processing according to the type of the determined still image; and
an image recognition means that performs recognition processing of the object on the still image based on the parameters set by the recognition processing parameter setting means.