WO2009096208A1 - Object recognition system, object recognition method, and object recognition program - Google Patents

Object recognition system, object recognition method, and object recognition program

Info

Publication number
WO2009096208A1
Authority
WO
WIPO (PCT)
Prior art keywords
recognition
still image
probability
image
score
Prior art date
Application number
PCT/JP2009/050126
Other languages
French (fr)
Japanese (ja)
Inventor
Toshinori Hosoi
Original Assignee
Nec Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nec Corporation
Publication of WO2009096208A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects

Definitions

  • the present invention relates to an object recognition system, an object recognition method, and an object recognition program for recognizing an object from an image.
  • a technique for recognizing a category of an object is used in various fields.
  • a category is a term in the pattern recognition field; it refers to a classification of patterns and is sometimes called a class.
  • in general terms, "type" or "kind" applies.
  • when an image is identified as either "automobile" or "not an automobile", there are two categories, "automobile" and "not an automobile".
  • if a template (feature vector) corresponding to the category to be recognized is stored in advance, the category to be recognized can be identified from the image.
  • a pattern consists of all kinds of data including images, sounds and characters.
  • techniques for extracting a partial area that appears to be a predetermined object from a moving image and recognizing the category of the object from the image of this partial area can be broadly classified into methods based on time-series image variation information, methods that treat the moving image as a set of still images and recognize the category directly from the plural image information, and methods that recognize each still image individually and integrate the results.
  • an example of a method for recognizing a category of an object based on time-series image variation information is disclosed in Non-Patent Document 1.
  • the technique disclosed in Non-Patent Document 1 uses the distribution of optical flow directions to recognize an object category from a partial region extracted from a moving image. For example, if the recognition target is a rigid body such as an automobile, the optical flow directions are uniform overall, but if the recognition target is a non-rigid body such as a pedestrian, the optical flow is not uniform; this difference is used to distinguish the two.
  • Non-Patent Document 2 discloses an example of a method for directly recognizing a category of an object from a plurality of pieces of image information by regarding a moving image as a plurality of still images.
  • in Non-Patent Document 2, by using the "constrained mutual subspace method" described in Patent Document 1, face recognition on moving images is performed with a recognition algorithm that obtains a recognition result directly from a plurality of data.
  • this method can recognize accurately even when the frame rate is low, and since it can learn variations of the object, a high recognition rate can be expected.
  • examples of methods for recognizing an object category by recognizing each still image constituting a moving image are disclosed in Patent Document 2, Patent Document 3, Non-Patent Document 1, Non-Patent Document 2, and Non-Patent Document 3.
  • when recognizing the category of an object in a moving image using a recognition method based on still images, it is necessary to perform a comprehensive recognition process by integrating the individual recognition results for a plurality of still images along the time series.
  • the majority vote method has the problem that, when the moving image contains many still images in which the object is difficult to recognize, the number of still images for which the category is identified is small, so the object category ultimately cannot be recognized.
  • still images in which the object is difficult to recognize include, for example, partial region images that do not accurately capture the object to be recognized, images in which another object in front of the target partially hides it, images taken under unexpected illumination variation, images in which the posture of the target object changes significantly, and images affected by image quality variation or noise from the imaging device or video transmission system, such as lens distortion, halation, or blur.
  • a method of selecting the maximum value among the recognition scores of the individual still images and using it as the integrated score is also a simple approach.
  • this method can recognize the object category fairly accurately even when many of the partial region images input as recognition targets fail to accurately capture the object, but because only the recognition result of one particular still image is used and the information of the other still images is discarded, excellent performance cannot be obtained.
  • since the recognition rate for individual still images can never be 100%, an image that merely resembles the recognition target may by chance be input and erroneously judged to be the recognition target; such accidental misrecognition cannot be avoided.
  • each of the general-purpose techniques described above suffers from situations in which it is difficult to recognize the object.
  • a common difficult situation is the case where many of the partial region images input as recognition targets fail to accurately capture the object.
  • this problem can largely be addressed by using the maximum of the per-still-image recognition scores as the integrated score, but since that method discards the information of most still images, it cannot achieve a high recognition rate and cannot suppress the influence of accidental misrecognition.
  • FIG. 5 shows examples of partial regions of a recognition target person extracted from the frame images of a moving image of a person.
  • partial region images that are difficult to recognize because of positional shifts of the partial region, size errors of the partial region, changes in the posture of the person, and the like are indicated by arrows.
  • Kanade, "A Statistical Method for 3D Object Detection Applied to Faces and Cars", IEEE Conference on Computer Vision and Pattern Recognition, 2000; Japanese Unexamined Patent Publication No. 2000-30065; JP 2005-285011 A; Japanese Patent Laid-Open No. 2005-79999
  • an integrated score calculation means calculates, according to a predetermined arithmetic function, an integrated score corresponding to the probability that the objects in at least n images (n is a natural number equal to or less than the total number of still images) are actually in the recognition target category.
  • the object recognition system further comprises a determination means for determining, based on the integrated score, whether the object in the moving image is in the recognition target category.
  • FIG. 4 shows how the still image probability calculation means 130 calculates, from the still image recognition scores s_t1, s_t2, s_t3, ..., s_tM of the frame images, the probabilities P_t1(ω_c), P_t2(ω_c), ..., P_tM(ω_c) that each frame image shows the recognition target.
  • the integrated score calculation means 14 calculates “probability that at least one of the plurality of still images is the recognition target category” according to the formula [1] using the respective probabilities obtained for the plurality of still images. Then, an integrated score is calculated based on this probability value (step S150 in FIG. 2, integrated score calculating step).
  • this calculation may follow [Equation 1], [Equation 2], the logarithmic form [Equation 3], or [Equation 4].
  • Case 1 shown in FIG. 6 is a case where still images in which the object is easy to recognize are always input.
  • Case 2 is a case where one still image is extremely difficult to recognize, for example because extraction of the image region of the object has failed.
  • Case 3 is a case where extraction of the image region of the object has failed for two still images.
  • Case 4 is a case where, although no still image is easy to recognize, the object is on the whole recognized as the recognition target category.
  • Case 5 is a case where most of the still images are extremely difficult to recognize.
  • Case 6 is a case where no still image is easy to recognize and the object is on the whole not recognized as the recognition target category.
  • when the total product of the probabilities that the object in each still image belongs to the target category is used as the integrated score (third column in FIG. 6), the score of Case 2 drops sharply compared with Case 1 and becomes lower than that of Case 4, which cannot clearly be determined to be the target category. With the first embodiment (fifth column in FIG. 6), however, Case 2 obtains a higher integrated score than Case 4. This means that, according to the first embodiment, whether the subject belongs to the recognition target category can be correctly recognized for a moving image that contains only a few scenes in which the subject is difficult to recognize.
  • the start time t_1 may be the time at which acquisition of the time-series data of the object became possible, or may be a time a fixed interval before the latest time t_M. The number of still images (frames) from the start time t_1 to the latest time t_M may be any number. In general, the larger the number of still images, the higher the probability that a scene in which the object is easy to recognize is included, and the recognition rate tends to improve; on the other hand, the larger the number of still images, the higher the probability that a feature value resembling the recognition target category appears by chance.
  • the determination unit 15 determines whether or not the object shown in the moving image is “human” by performing threshold processing on the integrated score.
  • This threshold may be set based on the result of the experiment. For example, in a system operating environment where an integrated score as shown in FIG. 6 can be obtained, the threshold value may be set to 0.5 so that cases 5 and 6 in FIG. 6 can be correctly rejected.
  • the second embodiment can be applied when there are three or more recognition target categories, rather than just two such as "human" and "non-human".
  • the identification can be correctly performed even when the number of categories is three or more.
  • a personal computer is used as the data processing device 1 and a semiconductor memory is used as the storage device 2.
  • the identification parameter storage unit 21 and the probability calculation parameter storage unit 22 can be regarded as part of the semiconductor memory.
  • Still picture recognition means 12, still picture probability calculation means 13, integrated score calculation means 14, and result determination means 15 are realized as functions of a CPU of a personal computer.
  • in step S210 in FIG. 9, it is determined whether or not the object shown in the moving image is "human". If the object is determined to be "human", the process ends with the final recognition result "human"; otherwise, it is determined whether the object is "automobile". If it is determined to be "automobile", the process ends with the final recognition result "automobile"; otherwise, the process ends with the recognition result "something that is neither a human nor an automobile".
  • the functions of the still image recognition means 12, the still image probability calculation means 13, the integrated score calculation means 14, and the determination means 15 in the first and second embodiments may be implemented as a program executed by a computer.
  • the present invention can be applied to object monitoring applications such as accurately recognizing a category of an object such as a person or a car from a moving image taken by a camera.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

When receiving partial regions that appear to contain an object in a moving picture and recognizing the category of the object, the object recognition system can recognize the category of the object in the moving picture as long as the moving picture includes at least some frame images in which the object is not hard to recognize. A still image recognition means (12) performs image recognition separately on a plurality of still images constituting the moving picture and obtains still image recognition scores. A still image probability calculation means (13) calculates, from the still image recognition scores, the probabilities that the objects in the still images belong to a recognition target category. An integrated score calculation means (14) calculates, from the probabilities calculated by the still image probability calculation means (13), an integrated score corresponding to the probability that the objects in at least n (n is a natural number less than or equal to the total number of still images) of the still images belong to the recognition target category. A determination means (15) determines, based on the integrated score, whether or not the object in the moving picture belongs to the recognition target category.

Description

Object recognition system, object recognition method, and object recognition program
The present invention relates to an object recognition system, an object recognition method, and an object recognition program for recognizing an object from video.
Techniques for recognizing the category of an object in a moving image are used in various fields, for example in devices that recognize the presence of surrounding vehicles from in-vehicle camera video, and in devices that analyze video of a person captured by a surveillance camera to determine whether the person shown is a specific person.
Here, a category is a term in the pattern recognition field that refers to a classification of patterns and is sometimes called a class. In general terms, "type" or "kind" applies. For example, when an image is identified as either "automobile" or "not an automobile", there are two categories, "automobile" and "not an automobile". When identifying whether a subject is a "child", an "adult", an "elderly person", or "not human", there are four categories. If a template (feature vector) corresponding to each category to be recognized is stored in advance, the category of the recognition target can be identified from an image. A pattern can consist of any kind of data, including images, sounds, and characters.
Techniques for extracting a partial region that appears to be a predetermined object from a moving image and recognizing the category of the object from the image of this partial region can be broadly classified into three approaches: methods that recognize based on time-series image variation information, methods that regard the moving image as a set of images and recognize the object category directly from the plural image information, and methods that perform image recognition on each still image constituting the moving image and integrate the recognition results to make a final determination.
An example of a method that recognizes the category of an object based on time-series image variation information is disclosed in Non-Patent Document 1. The technique disclosed in Non-Patent Document 1 uses the distribution of optical flow directions to recognize the category of an object from a partial region extracted from a moving image. For example, when the recognition target is a rigid body such as an automobile, the optical flow directions are uniform overall, whereas when the recognition target is a non-rigid body such as a pedestrian, the optical flow is not uniform; this difference is used to distinguish the two.
However, with this method, when the number of still images per unit time in the input video is small (the frame rate is low), it becomes difficult to compute the optical flow, so correct recognition is not possible. Accurate recognition is also impossible when the partial region extracted as the recognition target does not accurately locate the object. For example, if a region larger than the object is extracted as the partial region to be recognized, an unexpected optical flow distribution is obtained, so correct recognition is not guaranteed.
On the other hand, an example of a method that regards a moving image as a plurality of still images and recognizes the object category directly from the plural image information is disclosed in Non-Patent Document 2. Non-Patent Document 2 performs face recognition on moving images using the "constrained mutual subspace method" described in Patent Document 1, a recognition algorithm that obtains a recognition result directly from a plurality of data. This method can recognize accurately even when the frame rate is low, and since it can learn variations of the object, a high recognition rate can be expected.
However, when many images that do not accurately capture the object are included among the plural partial region images given as recognition targets, this method cannot correctly detect the changes in the feature values and therefore cannot perform accurate recognition.
Examples of methods that perform image recognition on each still image constituting a moving image and thereby recognize the object category are disclosed in Patent Document 2, Patent Document 3, Non-Patent Document 1, Non-Patent Document 2, and Non-Patent Document 3. When the category of an object in a moving image is recognized using a recognition method based on still images, a process that integrates the individual recognition results for the plurality of still images along the time series into an overall recognition result is required.
A simple way to integrate a plurality of recognition results is to take a majority vote over them. However, when the moving image contains many still images in which the object is difficult to recognize, the majority vote method ultimately cannot recognize the object category, because the number of still images in which the category is identified is small. Here, still images in which the object is difficult to recognize include, for example, partial region images that do not accurately capture the object to be recognized, images in which another object in front of the target partially hides it, images taken under unexpected illumination variation, images in which the posture of the target object changes significantly, and images affected by image quality variation or noise caused by the imaging device or the video transmission system, such as lens distortion, halation, or blur.
Another simple way to integrate the recognition results is to compute the probability that all the still images belong to the recognition target category and to apply a threshold to this value as an integrated score. However, this method likewise fails when the moving image contains many still images in which the object is difficult to recognize, because the probability value becomes small and the object category cannot be recognized.
Yet another simple way to integrate the recognition results is to select the maximum of the recognition scores of the individual still images and use it as the integrated score. This method can recognize the object category fairly accurately even when many of the partial regions input as recognition targets fail to accurately capture the object, but because it uses only the recognition result of one particular still image and discards the information of the other still images, excellent performance cannot be obtained. Furthermore, since the recognition rate for individual still images can never be 100%, an image that merely resembles the recognition target may by chance be input and erroneously judged to be the recognition target; such accidental misrecognition cannot be avoided.
Thus, each of the general-purpose techniques described above suffers from situations in which it is difficult to recognize the object. A common difficult situation is the case where many of the partial region images input as recognition targets fail to accurately capture the object. This problem can largely be addressed by using the maximum of the per-still-image recognition scores as the integrated score, but that method discards the information of most of the still images, so it cannot achieve a high recognition rate and cannot suppress the influence of accidental misrecognition.
To handle the situation where the "partial region corresponding to the object" input as the recognition target does not accurately locate the object, one can easily conceive of exhaustively running the recognition process not only on the input partial region but also on partial regions of relatively similar position and size, so as to cover every possibility. However, this approach increases the amount of computation explosively and is therefore impractical except in a few applications where spending a long time analyzing the video is not a problem. FIG. 5 illustrates concrete examples of such object-like partial regions: it shows partial regions of a recognition target person extracted from the frame images of a moving image of a person, and the arrows indicate partial region images that are difficult to recognize because of positional shifts of the partial region, size errors of the partial region, changes in the posture of the person, and so on.
Japanese Unexamined Patent Publication No. 2000-30065; JP 2005-285011 A; Japanese Patent Laid-Open No. 2005-79999
The problem with the techniques described above is that the category of an object cannot be recognized from a moving image containing many scenes in which the object is difficult to recognize. The reason is that the category of the object in an image cannot be recognized unless all or most of the feature values are correctly obtained from the image. An example of a scene in which it is difficult to recognize the object is a scene in which the partial region of the image input as the recognition target does not correctly capture the position and size of the object.
A technique that largely addresses this problem is to recognize each still image constituting the moving image individually and adopt the highest score to recognize the category of the object in the moving image. However, since that technique discards all the information about still images other than the particular still image that obtained the highest score, the possibility of misrecognition is high. Moreover, the possibility is not zero that a single still image happening to contain something else shaped like an object of the recognition target category is input and the system erroneously outputs the determination that it is the recognition target category. The frequency of such accidental misrecognition differs depending on the actual recognition scene, so it is desirable that the final recognition result can be adjusted according to this frequency.
Therefore, an object of the present invention is to provide an object recognition system, an object recognition method, and an object recognition program that accurately recognize the category of an object even from a moving image containing many scenes in which the object is difficult to recognize.
To achieve the above object, the object recognition system of the present invention is an object recognition system that recognizes the category of an object appearing in a moving image, and comprises: still image recognition means that individually recognizes, for a plurality of still images constituting the moving image, whether the object in each image belongs to the recognition target category, and outputs the recognition score calculated for each still image as a still image recognition score; still image probability calculation means that calculates, from each still image recognition score, the probability that the object in the still image actually belongs to the recognition target category; integrated score calculation means that calculates, from the plurality of probability values calculated for the still images by the still image probability calculation means and according to a predetermined arithmetic function, an integrated score corresponding to the probability that the objects in at least n of the still images (n is a natural number equal to or less than the total number of still images) actually belong to the recognition target category; and determination means that determines, based on the integrated score, whether the object in the moving image belongs to the recognition target category.
The object recognition method of the present invention comprises: a still image recognition step of individually performing, for each still image constituting a moving image, image recognition processing to determine whether the object in the image belongs to the recognition target category, and outputting the recognition score calculated for each still image as a still image recognition score; a still image probability calculation step of calculating, based on the still image recognition score, the probability that the object in the corresponding still image belongs to the recognition target category; an integrated score calculation step of calculating, from the plurality of probability values calculated for the still images in the still image probability calculation step and according to a predetermined arithmetic function, an integrated score corresponding to the probability that the objects in at least n of the still images (n is a natural number equal to or less than the total number of still images) actually belong to the recognition target category; and a determination step of determining, based on the integrated score, whether the object in the moving image belongs to the recognition target category.
The object recognition program of the present invention causes a computer to execute: a still image recognition function that individually performs, for each still image constituting a moving image, image recognition processing to determine whether the object in the image belongs to the recognition target category, and outputs the recognition score calculated for each still image as a still image recognition score; a still image probability calculation function that calculates, based on the still image recognition score, the probability that the object in the corresponding still image belongs to the recognition target category; an integrated score calculation function that calculates, from the plurality of probability values calculated for the still images by the still image probability calculation function and according to a predetermined arithmetic function, an integrated score corresponding to the probability that the objects in at least n of the still images (n is a natural number equal to or less than the total number of still images) actually belong to the recognition target category; and a determination function that determines, based on the integrated score, whether the object in the moving image belongs to the recognition target category.
Because the present invention is configured as described above, it calculates the probability that the objects in at least a predetermined number of the plurality of still images belong to the recognition target category and calculates an integrated score based on this probability. Therefore, even if the moving image contains many still images in which the object is difficult to recognize, an effective integrated score can be obtained as long as the moving image contains the predetermined number of still images in which the object is not difficult to recognize, and the category of the object can be recognized accurately.
Hereinafter, an embodiment of the present invention will be described with reference to the drawings.
[First Embodiment]
FIG. 1 is a block diagram showing the configuration of the object recognition system according to the first embodiment of the present invention. As shown in FIG. 1, the object recognition system of the first embodiment comprises a data processing device 1 that analyzes input information and a storage device 2 that stores information.
The data processing device 1 comprises: still image recognition means 12 that individually recognizes, for a plurality of still images constituting a moving image, whether the object in each image belongs to the recognition target category, and outputs the recognition score calculated for each still image as a still image recognition score; still image probability calculation means 13 that calculates, as a still image recognition probability, the probability that the object in a still image belongs to the recognition target category given the calculated still image recognition score; integrated score calculation means 14 that calculates, according to an arithmetic function and based on the plurality of calculated still image recognition probabilities, an integrated score based on the probability that the objects in at least n of the still images (n is a natural number equal to or less than the total number of still images) belong to the recognition target category; and determination means 15 that determines, based on the integrated score, whether the object in the moving image belongs to the recognition target category.
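As an illustration of how these components could fit together, the following Python sketch mirrors the means 12 to 15 and the storage units 21 and 22 described above; the class and method names are hypothetical and the per-frame classifier is left abstract, so this is a structural sketch rather than the implementation defined by the patent.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Storage:
    """Loosely corresponds to storage device 2 (units 21 and 22)."""
    identification_params: object             # parameters for still image recognition (unit 21)
    score_to_prob: Callable[[float], float]   # score -> probability mapping (unit 22)


class ObjectRecognitionSystem:
    """Hypothetical skeleton mirroring means 12-15 of data processing device 1."""

    def __init__(self, storage: Storage, threshold: float):
        self.storage = storage
        self.threshold = threshold             # decision threshold for the integrated score

    def still_image_score(self, feature_vector) -> float:
        """Means 12: per-frame recognition score (classifier left abstract here)."""
        raise NotImplementedError

    def still_image_probability(self, score: float) -> float:
        """Means 13: convert a score into P(frame shows the target category)."""
        return self.storage.score_to_prob(score)

    def integrated_score(self, probabilities: List[float]) -> float:
        """Means 14: probability that at least one frame shows the target (n = 1 case)."""
        remaining = 1.0
        for p in probabilities:
            remaining *= (1.0 - p)
        return 1.0 - remaining

    def decide(self, frames) -> bool:
        """Means 15: threshold the integrated score."""
        probs = [self.still_image_probability(self.still_image_score(f)) for f in frames]
        return self.integrated_score(probs) >= self.threshold
```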
Here, a category, as mentioned above, is a term in the field of pattern recognition that refers to a classification of patterns and is sometimes called a class; in general terms, a "type" or "kind". For example, when an image is identified as either "automobile" or "not an automobile", there are two categories, "automobile" and "not an automobile"; when identifying whether a subject is a "child", an "adult", an "elderly person", or "not human", there are four categories. If a template (feature vector) corresponding to each category to be recognized is stored in advance, the category of the recognition target can be identified from an image. A pattern can be any kind of data, including images, sounds, and characters.
The storage device 2 comprises: an identification parameter storage unit 21 that holds the parameters used by the still image recognition means 12 to perform image recognition processing on a still image and obtain a still image recognition score representing how much the image resembles the recognition target category; and a probability calculation parameter storage unit 22 that holds the parameters used by the still image probability calculation means 13 to calculate, from a still image recognition score, the probability that the corresponding still image belongs to the recognition target category.
The still image recognition means 12 in the data processing device 1 receives the feature values extracted from each still image constituting the moving image and recognizes, according to the parameters held in the identification parameter storage unit 21, whether the input feature values correspond to the recognition target category. As the recognition result, it calculates for each still image a still image recognition score indicating the similarity to the recognition target category.
Here, a feature value is information representing a characteristic of the recognition target object in the image; it is either the input image data itself used for identifying the object or data obtained by processing the input image data. Typical examples of feature values include optical flow, the luminance pattern of the image, and the frequency components of the image. The feature values input to the still image recognition means 12 are data obtained from the information of each still image; concrete examples include image luminance data, image luminance histogram data, gradient information extracted from the image, frequency data extracted from the image, data obtained by extracting difference information between images, or a combination of these.
As the image recognition technique executed by the still image recognition means 12, a statistical pattern recognition method may be used, for example a perceptron (a kind of neural network), a support vector machine, maximum likelihood estimation, Bayesian estimation, learning vector quantization, or the subspace method. The parameters held in advance in the identification parameter storage unit 21 are the parameters necessary for image recognition of still images; for example, when the system is configured to recognize (identify) using learning vector quantization, they are the learned reference vectors.
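If learning vector quantization were the chosen classifier, for example, the still image recognition score could be derived from distances to the learned reference vectors held in the identification parameter storage unit 21. The sketch below is one plausible scoring rule under that assumption, not the specific classifier of the patent.

```python
import numpy as np

def lvq_still_image_score(feature: np.ndarray,
                          target_refs: np.ndarray,
                          other_refs: np.ndarray) -> float:
    """Score a single frame from learned reference vectors (one per row).
    Larger values mean the frame looks more like the recognition target category."""
    d_target = np.min(np.linalg.norm(target_refs - feature, axis=1))
    d_other = np.min(np.linalg.norm(other_refs - feature, axis=1))
    return float(d_other - d_target)
```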
The still image recognition means 12 may also be configured to perform the image recognition of whether the recognition target category is present only on a subset of the still images constituting the moving image. In this case, the determination means 15 applies threshold processing to the integrated score calculated from only that subset of still images, and if an integrated score higher than the threshold is obtained, the still image recognition means 12 omits the processing of the remaining still images. Omitting processing in this way enables high-speed image recognition.
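One way to realize this early termination, assuming the n = 1 integrated score of [Equation 1] described below, is to keep a running product of (1 - p) and stop as soon as the threshold is already exceeded; `score_frame` and `score_to_prob` are assumed helpers standing in for means 12 and 13.

```python
def recognize_with_early_stop(frames, score_frame, score_to_prob, threshold):
    """Process frames one by one and stop once the integrated score (Equation 1, n = 1)
    already exceeds the threshold, omitting the remaining frames."""
    remaining = 1.0                          # product of (1 - p_t) over processed frames
    for frame in frames:
        p = score_to_prob(score_frame(frame))
        remaining *= (1.0 - p)
        if 1.0 - remaining >= threshold:     # decision is already settled
            return True, 1.0 - remaining
    return (1.0 - remaining) >= threshold, 1.0 - remaining
```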
The still image probability calculation means 13 calculates, from the still image recognition score computed by the still image recognition means 12 and according to the parameters stored in the probability calculation parameter storage unit 22, "the probability that, given the calculated still image recognition score, the object in the still image actually belongs to the recognition target category" as the still image recognition probability. The parameters held in the probability calculation parameter storage unit 22 are, for example, data obtained by modeling the still image recognition probability in advance as a function of the still image recognition score, or a conversion table associating still image recognition scores with still image recognition probabilities; both are created in advance based on statistical results from experiments.
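A conversion table of the kind held in the probability calculation parameter storage unit 22 can be applied by simple interpolation. The table values below are invented for illustration; in practice they would come from the experimental statistics mentioned above.

```python
import numpy as np

# Hypothetical calibration table measured in advance:
# e.g. at score -2.0 roughly 5% of frames were really the target, at +2.0 roughly 98%.
SCORE_POINTS = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
PROB_POINTS  = np.array([0.05, 0.20, 0.50, 0.85, 0.98])

def score_to_prob(score: float) -> float:
    """Map a still image recognition score to the probability that the frame
    actually shows the recognition target category (linear interpolation, clamped)."""
    return float(np.interp(score, SCORE_POINTS, PROB_POINTS))
```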
The integrated score calculation means 14 calculates, based on the group of still image recognition probability values obtained for the plurality of still images, an integrated score corresponding to "the probability that at least n still images (n is a natural number equal to or less than the total number of still images) belong to the recognition target category".
For example, let M be the number of still images (frame images) recognized by the still image recognition means 12, and let P_t(ω_c) be the probability that the object in the still image at time t (t = t_1, t_2, t_3, ..., t_M) belongs to the recognition target category ω_c. The integrated score calculation means 14 may then be configured to calculate, according to [Equation 1], "the probability that at least one of the plurality of recognized still images belongs to the recognition target category" and output it as the integrated score S_M(ω_c).
[Equation 1]   S_M(\omega_c) = 1 - \prod_{t=t_1}^{t_M} \{ 1 - P_t(\omega_c) \}
[Equation 1] expresses the probability of the complementary event of "the probability that none of the recognized still images contains an object of the recognition target category", which is the product over all still images of the probability {1 - P_t(ω_c)} that the still image at time t does not belong to the recognition target category ω_c; that is, it expresses "the probability that the object in at least one of all the recognized still images actually belongs to the recognition target category".
Applying a threshold to the probability obtained by [Equation 1] is different from simply applying a threshold to the probability that each individual still image belongs to the recognition target category. [Equation 1] takes the probabilities of all the still images into account; that is, the probability obtained by [Equation 1] does not ignore the influence of images whose individual probability of belonging to the recognition target category is low, but reflects them in the integrated score.
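In code, [Equation 1] is simply the complement of a product, as in the following sketch (the per-frame probabilities are assumed to have been computed already):

```python
def integrated_score_at_least_one(probs):
    """Equation 1: probability that at least one frame shows the target category,
    assuming the per-frame probabilities are independent."""
    remaining = 1.0
    for p in probs:
        remaining *= (1.0 - p)
    return 1.0 - remaining

# Four hard frames and one easy frame still give a high integrated score.
print(integrated_score_at_least_one([0.1, 0.1, 0.1, 0.1, 0.9]))  # about 0.93
```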
Here, the integrated score calculation means 14 may output the value calculated according to [Equation 1] as the integrated score S_M(ω_c), but since the first term on the right-hand side of [Equation 1] is a constant, it may instead be configured to output as the integrated score S_M(ω_c) the value calculated according to [Equation 2], in which this term is removed.
[Equation 2]   S_M(\omega_c) = - \prod_{t=t_1}^{t_M} \{ 1 - P_t(\omega_c) \}
It may also be configured to output as the integrated score S_M(ω_c) a value calculated according to [Equation 3], which takes the logarithm. Since [Equation 3] is not a product over all images, it reduces the amount of computation and the load on the system compared with [Equation 1] and [Equation 2].
[Equation 3]   S_M(\omega_c) = - \sum_{t=t_1}^{t_M} \log \{ 1 - P_t(\omega_c) \}
Furthermore, if the logarithmic sum is divided by the number of still images M as in [Equation 4], the threshold does not need to be adjusted according to the value of M.
[Equation 4]   S_M(\omega_c) = - \frac{1}{M} \sum_{t=t_1}^{t_M} \log \{ 1 - P_t(\omega_c) \}
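Under the reconstruction of [Equation 3] and [Equation 4] given above, the logarithmic forms avoid multiplying many small factors and, once divided by M, give a score whose threshold does not depend on the number of frames. A small sketch, with a guard against log(0):

```python
import math

def integrated_score_log(probs):
    """Equation 3 (as reconstructed above): minus the sum of log(1 - p_t)."""
    eps = 1e-12
    return -sum(math.log(max(1.0 - p, eps)) for p in probs)

def integrated_score_log_normalized(probs):
    """Equation 4: the same logarithmic sum divided by the number of frames M."""
    return integrated_score_log(probs) / len(probs)
```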
Alternatively, according to [Equation 5], the value obtained by [Equation 2] may be raised to the power (1/M) for readability and subtracted from 1, and the result output as the integrated score S_M(ω_c).
[Equation 5]   S_M(\omega_c) = 1 - \left[ \prod_{t=t_1}^{t_M} \{ 1 - P_t(\omega_c) \} \right]^{1/M}
As another example, the integrated score calculation means 14 may be configured to output the value calculated according to [Equation 6] as the integrated score S_M(ω_c). [Equation 6] expresses "the probability that at least two of the plurality of recognized still images belong to the recognition target category".
[Equation 6]   S_M(\omega_c) = 1 - \prod_{t=t_1}^{t_M} \{ 1 - P_t(\omega_c) \} - \sum_{t=t_1}^{t_M} P_t(\omega_c) \prod_{u \ne t} \{ 1 - P_u(\omega_c) \}
The integrated score calculation means 14 sets in advance the value n in "the probability that at least n still images (n is a natural number equal to or less than the total number of still images) belong to the recognition target category" used for the integrated score, for example to "1" or "2". Concretely, the value of n may be determined from experimental results so that good recognition performance suited to the application of the system is obtained.
The object recognition system of the first embodiment recognizes as the target category a moving image that contains at least n still images that are easily recognized as the target category. Therefore, when the value of n is small, even a moving image with few still images that are easily recognized as the target category is recognized as the target category, whereas when the value of n is large, a moving image with few such still images is not recognized as the target category.
For example, when distinguishing the two categories human and non-human, if n is set to "1", a moving image containing even a single still image of a human is correctly recognized as "human", but if n is set to "2" and only one still image of a human is included, the moving image is likely to be recognized as "non-human" even if it actually shows a human. Conversely, when n is "1", a single accidentally mixed-in still image that closely resembles a human causes the moving image to be recognized as "human" even if it is actually "non-human". If n is "2", however, one accidentally mixed-in still image that closely resembles a human does not affect the integrated score, so the moving image is likely to be correctly recognized as "non-human".
In this way, in the object recognition system of this embodiment, changing the value of n changes the accuracy of the recognition result, so the system can be tuned to obtain high performance by changing the value of n according to the application.
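For a general n, the probability that at least n frames show the target category can be computed exactly with a small dynamic program over the number of positive frames, again under the per-frame independence assumption of [Equation 1]; the sketch below computes the quantity itself rather than reproducing the patent's closed-form expressions.

```python
def prob_at_least_n(probs, n):
    """Probability that at least n frames show the target category,
    given independent per-frame probabilities (Poisson binomial tail)."""
    dist = [1.0]                            # dist[k] = P(exactly k positive frames so far)
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for k, q in enumerate(dist):
            new[k]     += q * (1.0 - p)     # this frame is not the target
            new[k + 1] += q * p             # this frame is the target
        dist = new
    return sum(dist[n:])

probs = [0.1, 0.1, 0.1, 0.1, 0.9]
print(prob_at_least_n(probs, 1))   # about 0.93, same as Equation 1
print(prob_at_least_n(probs, 2))   # about 0.31: one convincing frame is no longer enough
```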
Also, since accidental misrecognition is usually unlikely to occur on still images (frames) that are consecutive in the time series, the integrated score calculation means 14 has a function of adding the condition "adjacent in the time series" to the above n, setting the n images to "m images adjacent in the time series (m is a natural number of 2 or more and equal to or less than the total number of still images)", and calculating an integrated score corresponding to "the probability that the objects in at least m time-series-adjacent images among the plurality of still images actually belong to the recognition target category".
For example, it may be configured to calculate, according to [Equation 7], "the probability that the objects in at least two time-series-adjacent images among the plurality of recognized still images belong to the recognition target category". This makes it possible to suppress the influence of accidental failures of still image recognition.
[Equation 7]   (the probability that the objects in at least two time-series-adjacent still images belong to the recognition target category; the equation image is not reproduced here)
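The quantity described for [Equation 7], the probability that at least two time-series-adjacent frames both show the target category, can be computed exactly with a dynamic program that tracks whether the previous frame was positive, again assuming per-frame independence. This is a sketch of that quantity, not necessarily the algebraic form used in the patent.

```python
def prob_adjacent_pair(probs):
    """Probability that at least one pair of consecutive frames are both the target,
    computed as 1 - P(no such pair) with a DP over (no pair yet, previous frame state)."""
    no_pair_prev_off = 1.0   # no adjacent pair so far, previous frame not the target
    no_pair_prev_on  = 0.0   # no adjacent pair so far, previous frame is the target
    for p in probs:
        new_off = (no_pair_prev_off + no_pair_prev_on) * (1.0 - p)
        new_on  = no_pair_prev_off * p   # previous frame must not be the target
        no_pair_prev_off, no_pair_prev_on = new_off, new_on
    return 1.0 - (no_pair_prev_off + no_pair_prev_on)

# A single isolated high-probability frame contributes little ...
print(prob_adjacent_pair([0.1, 0.1, 0.9, 0.1, 0.1]))
# ... but two consecutive high-probability frames raise the score sharply.
print(prob_adjacent_pair([0.1, 0.1, 0.9, 0.9, 0.1]))
```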
The integrated score calculation means 14 also has a function of switching the value of n according to an external input or the like, and calculates the integrated score by changing the arithmetic function used according to the switched value of n.
The value of n may be determined by the user with reference to information obtained from experimental results. Concretely, still images are recognized one by one for a large number of moving images, and the distribution of the proportion of moving images by the number of still images accepted as the recognition target category is measured both for moving images of the recognition target category and for moving images of other categories; the user then determines the value of n with reference to these two distributions. In this way, the user can decide the value of n by judging which is more important, the recognition accuracy for moving images of the recognition target category or that for moving images of other categories.
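The selection procedure described here can be mocked up as follows: given, for a set of validation videos, the number of frames accepted as the target category in each target video and in each non-target video, compute the fraction of videos each candidate n would accept and let the user weigh the two rates. The counts in the example are made up for illustration.

```python
def acceptance_rates(target_counts, nontarget_counts, n):
    """Fraction of target videos and of non-target videos that are accepted
    when a video is accepted if at least n of its frames are accepted as the target."""
    tp = sum(c >= n for c in target_counts) / len(target_counts)
    fp = sum(c >= n for c in nontarget_counts) / len(nontarget_counts)
    return tp, fp

# Hypothetical accepted-frame counts per video from an experiment.
target_counts    = [5, 3, 4, 1, 6, 2]    # videos that really show the target
nontarget_counts = [0, 1, 0, 0, 2, 0]    # videos of other categories
for n in (1, 2, 3):
    print(n, acceptance_rates(target_counts, nontarget_counts, n))
```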
In addition, the proportion distribution by the number of still images accepted as the recognition target category may be higher or lower in particular parts of the screen or may vary in particular time periods, so it is desirable to switch the value of n according to the position of the object in the image or the shooting time.
The determination means 15 in the first embodiment determines, based on the integrated score calculated by the integrated score calculation means 14, whether the object in the moving image belongs to the recognition target category. This determination may be made by threshold processing using a preset threshold. The threshold may be a value determined based on experimental results, or a value determined based on the number M of still images used to calculate the integrated score and the still image recognition rate, which is the performance of the image recognition engine used as the still image recognition means 12.
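One possible way to derive such a threshold from M and the per-frame engine performance, under the same independence assumption as [Equation 1], is to estimate the integrated score that a non-target video would reach by chance and place the threshold above it. The per-frame false-accept probability below is an assumed figure, not a value given in the patent.

```python
def threshold_from_frame_rate(M: int, frame_false_accept: float, margin: float = 0.05) -> float:
    """Expected Equation-1 score of a non-target video in which every frame has
    probability `frame_false_accept` of looking like the target, plus a safety margin."""
    chance_score = 1.0 - (1.0 - frame_false_accept) ** M
    return min(1.0, chance_score + margin)

print(threshold_from_frame_rate(M=5, frame_false_accept=0.05))   # about 0.28
```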
Next, the operation of the object recognition system of the first embodiment will be described. Since the following description of the operation also serves as an embodiment of the object recognition method of the present invention, each step of the object recognition method is noted alongside the description of the corresponding operation.
FIG. 2 is a flowchart showing the operation of the object recognition system of the first embodiment. First, the integrated score calculation means 14 sets the value of n to "1" according to an external input (calculation setting change step). Then, the still image recognition means 12 receives the feature values of each still image constituting the moving image of the object, performs image recognition of whether the object in the still image belongs to the recognition target category based on the parameters held in the identification parameter storage unit 21, and calculates a recognition score (still image recognition score) for each still image (step S120 in FIG. 2, still image recognition step).
Subsequently, the still image probability calculation means 13 receives the still image recognition score and calculates the probability that the corresponding still image belongs to the recognition target category based on the parameters stored in the probability calculation parameter storage unit 22 (step S130 in FIG. 2, still image probability calculation step). The operations from still image input through probability calculation are repeated for the plurality of still images (step S140 in FIG. 2).
FIG. 3 and FIG. 4 are conceptual diagrams showing how the still image recognition score of each still image and the corresponding probability value are obtained through this repeated processing.
FIG. 3 shows the still image recognition means 120 individually recognizing the still images X_t1, X_t2, X_t3, ..., X_tM at times t_1, t_2, t_3, ..., t_M and outputting the still image recognition scores s_t1, s_t2, s_t3, ..., s_tM.
FIG. 4 shows the still image probability calculation means 130 calculating, from the still image recognition scores s_t1, s_t2, s_t3, ..., s_tM of the frame images, the probabilities P_t1(ω_c), P_t2(ω_c), P_t3(ω_c), ..., P_tM(ω_c) that each frame image shows the recognition target.
 Next, using the probabilities obtained for the plurality of still images, the integrated score calculation means 14 calculates the "probability that at least one of the plurality of still images belongs to the recognition target category" according to [Equation 1], and computes the integrated score from this probability value (step S150 in FIG. 2, integrated score calculation step). This calculation may follow [Equation 1], or [Equation 2], which takes the logarithm of [Equation 1], or [Equation 3] or [Equation 4].
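 By way of illustration only (this sketch is not part of the patent disclosure, and the function and variable names are assumptions), the following Python fragment computes the n = 1 case, i.e. the probability that at least one still image actually shows the recognition target category, both directly and in a logarithmic form that avoids underflow when the number of frames M is large:

    import math

    def at_least_one_score(probs):
        """Integrated score for n = 1: probability that at least one still
        image actually shows the recognition target category, treating the
        per-frame probabilities as independent events (an assumption)."""
        direct = 1.0 - math.prod(1.0 - p for p in probs)
        # Logarithmic variant: log of the probability that no frame shows
        # the target, monotonically related to the direct score.
        log_none = sum(math.log(max(1.0 - p, 1e-12)) for p in probs)
        return direct, log_none

    # Example with five hypothetical per-frame probabilities P_ti(omega_c).
    score, log_none = at_least_one_score([0.9, 0.1, 0.8, 0.2, 0.7])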
 そして、判定手段15が統合スコアを閾値処理して、入力された動画像に示された物体が認識対象カテゴリであるか否かを判定する(図2のステップS160,判定ステップ)。 Then, the determination unit 15 performs threshold processing on the integrated score to determine whether or not the object indicated in the input moving image is a recognition target category (step S160 in FIG. 2, determination step).
 As described above, according to the first embodiment, the still image probability calculation means 13 calculates, for each still image, the probability that the object in the still image actually belongs to the recognition target category from the recognition score of that still image; these individual probability values are not thresholded directly. Instead, the integrated score calculation means 14 calculates from the plurality of probability values an integrated score corresponding to the "probability that the objects in at least n of the still images (n being a natural number not exceeding the total number of still images) belong to the recognition target category", and the determination means 15 compares the integrated score with a threshold to determine whether the object in the moving image belongs to the recognition target category. Therefore, even if image recognition fails for most of the still images, whether the object in the moving image belongs to the recognition target category can be recognized accurately. Furthermore, since the integrated score is calculated using the information (probability values) of all still images without discarding information on any particular still image, good recognition performance is obtained. In addition, by changing the set number n used for the integrated score, the system can be adjusted to its application.
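 If the per-frame probabilities are treated as independent, the "at least n of the still images" probability can be evaluated with a standard Poisson-binomial recursion; the sketch below is one possible illustration under that assumption and is not taken from the patent text:

    def prob_at_least_n(probs, n):
        """Probability that at least n of the still images actually show the
        recognition target category (frames treated as independent)."""
        # dp[k]: probability that exactly k of the frames processed so far
        # show the target category.
        dp = [1.0] + [0.0] * len(probs)
        for p in probs:
            for k in range(len(probs), 0, -1):
                dp[k] = dp[k] * (1.0 - p) + dp[k - 1] * p
            dp[0] *= 1.0 - p
        return sum(dp[n:])

    # n = 1 reduces to 1 - prod(1 - p); a larger n demands more consistent evidence.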
 次に、本第1実施形態の具体例について説明する。 Next, a specific example of the first embodiment will be described.
 FIG. 6 is a table explaining six specific examples (cases) in which the number of still images is five. The second column of the table in FIG. 6 shows examples of the probability values, calculated from the recognition result of each still image, that the object belongs to the recognition target category; a larger value means the still image is more similar to the recognition target category.
 The third column of the table in FIG. 6 is the integrated score based on the "probability that all still images belong to the recognition target category" used in the background art described above; as shown in [Equation 8], it is the total product of the probabilities raised to the power (1/M) so that it is easy to read.
[Equation 8]   \left( \prod_{i=1}^{M} P_{t_i}(\omega_c) \right)^{1/M}
 図6に示す表の第4列は、上述した背景技術で用いられる「静止画認識スコアの最大値」に対応する統合スコアである。ここでは、第2列に示す各確率値のうちの最大値を取っている。 The fourth column of the table shown in FIG. 6 is an integrated score corresponding to the “maximum value of still image recognition score” used in the background art described above. Here, the maximum value among the probability values shown in the second column is taken.
 The fifth column of the table in FIG. 6 is the integrated score used in the first embodiment, based on the "probability that at least one of the plurality of still images actually belongs to the recognition target category"; as shown in [Equation 5], it is the value obtained by raising the value obtained with [Equation 2] to the power (1/M) and subtracting it from 1, so that it is easy to read.
 Case 1 shown in FIG. 6 is a case in which still images in which the object is easy to recognize are always input. Case 2 is a case in which one still image is extremely hard to recognize, for example because extraction of the image region of the object has failed. Case 3 is a case in which extraction of the image region of the object has failed in two still images. Case 4 is a case in which, although no still image is extremely hard to recognize, the object is on the whole only roughly recognized as belonging to the recognition target category. Case 5 is a case in which most of the still images are extremely hard to recognize. Case 6 is a case in which, although no still image is extremely hard to recognize, the object on the whole does not belong to the recognition target category.
 Here, if the goal is to "correctly recognize the category of an object shown in a moving image that includes still images that are extremely hard to recognize", then Cases 1, 2, and 3 should clearly be recognized as belonging to the recognition target category, and Case 6 should be recognized as not belonging to it. For Cases 4 and 5, it is roughly correct to recognize them as belonging to the recognition target category.
 まず、映像中において物体を著しく認識しづらい静止画が比較的少ない場合である事例2について説明する。 First, Case 2 will be described, which is a case where there are relatively few still images in which it is difficult to recognize objects in the video.
 When the total product of the probabilities that the object in each still image belongs to the target category is used as the integrated score (third column of FIG. 6), the score of Case 2 drops sharply compared with Case 1 and falls below that of Case 4, which cannot clearly be judged to belong to the recognition target category. In the first embodiment (fifth column of FIG. 6), however, Case 2 obtains a higher integrated score than Case 4. This means that the first embodiment can correctly recognize whether the subject belongs to the recognition target category for a moving image with few scenes in which the subject is hard to recognize.
 Next, Case 5, in which many still images in the moving image show the object in a way that is extremely hard to recognize, will be described. When the total product of the probabilities that the object in each still image belongs to the target category is used as the integrated score (third column of FIG. 6), Case 5 scores lower than Case 6, so no threshold setting can both accept Case 5 as the recognition target category and correctly reject Case 6. In the first embodiment (fifth column of FIG. 6), on the other hand, a correct decision is possible by adjusting the threshold. This means that the first embodiment can correctly recognize whether the subject belongs to the recognition target category even for a moving image with many scenes in which the subject is extremely hard to recognize.
 Even when the maximum score is adopted as the integrated score (fourth column of FIG. 6), it is, as in the first embodiment, possible to correctly recognize whether the subject belongs to the recognition target category for a moving image with many scenes in which the subject is extremely hard to recognize. With a combination of probabilities like Case 5, however, one can also imagine an event in which the object does not really belong to the recognition target category but noise resembling the recognition target category happens to appear in one of the still images (see FIG. 7). In that case, using the maximum value as the integrated score leads to a false judgment that the object belongs to the recognition target category, whereas in the first embodiment the object can still be judged not to belong to the recognition target category by adjusting the threshold of the integrated score.
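 The behaviour of the three integration strategies compared in FIG. 6 can be reproduced from a vector of per-frame probabilities; in the sketch below the probability values are invented for illustration (they are not the values of FIG. 6), and reading the fifth column as 1 - (prod(1 - P))^(1/M) is an interpretation of the description above:

    import math

    def column3(probs):   # background art: (product of P)^(1/M)
        return math.prod(probs) ** (1.0 / len(probs))

    def column4(probs):   # background art: maximum single-frame probability
        return max(probs)

    def column5(probs):   # first embodiment (one reading): 1 - (prod(1 - P))^(1/M)
        return 1.0 - math.prod(1.0 - p for p in probs) ** (1.0 / len(probs))

    case2 = [0.90, 0.90, 0.90, 0.90, 0.05]   # one badly failed frame (made-up values)
    case4 = [0.60, 0.60, 0.60, 0.60, 0.60]   # uniformly mediocre frames (made-up values)
    for f in (column3, column4, column5):
        print(f.__name__, round(f(case2), 3), round(f(case4), 3))

 With these made-up values the product-based score of case2 falls below that of case4, while the at-least-one form keeps case2 above case4, mirroring the comparison in the text.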
 Regarding the choice of the times of the still images used for recognition, the start time t_1 may be the time at which acquisition of the time-series data of the object became possible, or a time a fixed interval before the latest time t_M. Any number of still images (frames) between the start time t_1 and the latest time t_M can be used for recognition. In general, the more still images there are, the higher the probability that a scene in which the object is easy to recognize is included, so the recognition rate tends to improve. However, the more still images there are, the higher the probability that feature values resembling the recognition target category appear by chance.
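 A minimal sketch of the frame-selection policy described here (keeping only a fixed window of frames before the latest time t_M); the buffer structure and the window length are assumptions for illustration:

    from collections import deque

    WINDOW = 30                        # assumed number of frames kept before the latest time t_M
    recent_probs = deque(maxlen=WINDOW)

    def on_new_frame(prob):
        """Store the per-frame probability; older frames fall out of the window,
        so integration always uses the frames from t_1 up to the latest t_M."""
        recent_probs.append(prob)
        return list(recent_probs)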
 The same effect is obtained even if, instead of the repetition in step S140 of FIG. 2, each of steps S110, S120, and S130 is repeated individually.
 When the prior probabilities in Bayes' formula in the statistical pattern recognition field are assumed to be equal for the recognition target category and the non-recognition target category, the still image probability calculation means 13 may be configured to obtain, instead of "the probability that the object in the still image belongs to the recognition target category", "the probability that the calculated still image recognition score occurs under the condition that the object in the still image belongs to the recognition target category". If the integrated score calculation means 14 then substitutes that probability into [Equation 1] through [Equation 7] and calculates an integrated score based on "the probability that the calculated still image recognition scores occur in at least n of the still images (n being a natural number not exceeding the total number of still images)", the same effect is obtained.
 The still image recognition means 12 may also be configured to perform image recognition of whether the object belongs to the recognition target category only for the still images constituting a part of the moving image. In that case, the determination means 15 applies threshold processing to the integrated score obtained from only those still images, and if an integrated score higher than the threshold has already been obtained, it makes the determination while omitting the processing of the remaining still images. Omitting processing in this way realizes high-speed image recognition.
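 A sketch of this early-termination idea for the n = 1 integrated score; frame_score and score_to_prob stand in for the still image recognition and probability calculation steps and are hypothetical names:

    def recognize_with_early_exit(frames, frame_score, score_to_prob, threshold):
        """Process still images one by one; because the n = 1 integrated score
        1 - prod(1 - p) can only grow as frames are added, the remaining frames
        can be skipped once the threshold is already exceeded."""
        none_prob = 1.0                              # probability that no frame so far shows the target
        for frame in frames:
            p = score_to_prob(frame_score(frame))    # steps S120 and S130 for one frame
            none_prob *= 1.0 - p
            if 1.0 - none_prob > threshold:
                return True                          # decided without the remaining frames
        return False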
 次に、本第1実施形態の物体認識システムの適用例として、認識対象カテゴリを「人間」とし動画像に映った物体が「人間」であるか否かを認識する場合について説明する。 Next, as an application example of the object recognition system of the first embodiment, a case will be described in which the recognition target category is “human” and it is recognized whether or not the object shown in the moving image is “human”.
 上述したデータ処理装置1としてパーソナルコンピュータ、記憶装置2として半導体メモリを用いる。この場合、識別パラメータ記憶部21と確率算出用パラメータ記憶部22は半導体メモリ上の一部とみなせる。また、静止画認識手段12,静止画確率算出手段13,統合スコア算出手段14,結果判定手段15は、パーソナルコンピュータのCPUの機能として実現される。 A personal computer is used as the data processing device 1 described above, and a semiconductor memory is used as the storage device 2. In this case, the identification parameter storage unit 21 and the probability calculation parameter storage unit 22 can be regarded as part of the semiconductor memory. Still picture recognition means 12, still picture probability calculation means 13, integrated score calculation means 14, and result determination means 15 are realized as functions of a CPU of a personal computer.
 The still image recognition means 12 recognizes each still image using "generalized learning vector quantization". The identification parameter storage unit 21 holds in advance, as parameters, the "reference vectors" needed to perform identification by "generalized learning vector quantization". The still image probability calculation means 13 calculates the probability that a still image belongs to the recognition target category by referring to a conversion table, stored in the probability calculation parameter storage unit 22, that associates still image recognition scores one-to-one with probability values of belonging to the recognition target category.
 First, as the operation corresponding to step S120 of FIG. 2, the still image recognition means 12 takes as input data obtained by extracting edges as feature values from a still image of the object, performs image recognition of whether the object in that still image is a "human" using "generalized learning vector quantization", and calculates a still image recognition score.
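 In generalized learning vector quantization, a per-sample score is commonly derived from the distances to the nearest reference vectors of the two classes; the sketch below illustrates that general idea with assumed inputs (NumPy arrays of edge features) and is not the implementation used in this example:

    import numpy as np

    def glvq_score(feature, human_refs, other_refs):
        """Relative-distance score: positive when the edge-feature vector is
        closer to the nearest "human" reference vector than to the nearest
        non-"human" reference vector."""
        d_h = min(np.linalg.norm(feature - r) for r in human_refs)
        d_o = min(np.linalg.norm(feature - r) for r in other_refs)
        return (d_o - d_h) / (d_o + d_h + 1e-12)   # roughly in (-1, 1)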
 Next, as the operation corresponding to step S130 of FIG. 2, the still image probability calculation means 13 calculates, from the still image recognition score and the conversion table stored in the probability calculation parameter storage unit 22, the probability that the object in the still image is a "human". The processing up to this point is executed for all the still images in the moving image, yielding as many probability values as there are still images. Here, the still images to be processed are those from a time a fixed interval before the current time up to the current time.
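 The conversion table can be pictured as a calibration lookup from recognition score to probability; the table values and the linear interpolation in the sketch below are assumptions, since the patent only states a one-to-one correspondence between scores and probability values:

    import bisect

    # Hypothetical calibration table: (still image recognition score, probability of "human").
    SCORE_TABLE = [(-2.0, 0.02), (-1.0, 0.10), (0.0, 0.50), (1.0, 0.90), (2.0, 0.98)]

    def score_to_probability(score):
        """Look up (and here, linearly interpolate) the probability for a score."""
        scores = [s for s, _ in SCORE_TABLE]
        probs = [p for _, p in SCORE_TABLE]
        if score <= scores[0]:
            return probs[0]
        if score >= scores[-1]:
            return probs[-1]
        i = bisect.bisect_right(scores, score)
        s0, s1, p0, p1 = scores[i - 1], scores[i], probs[i - 1], probs[i]
        return p0 + (p1 - p0) * (score - s0) / (s1 - s0)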
 Next, as the operation corresponding to step S150 of FIG. 2, the integrated score calculation means 14 calculates the integrated score. For example, when n is set to 1, "the probability that the object in at least one still image is actually a human" is calculated as the integrated score according to [Equation 4]. Furthermore, if two or more still images in which the object is not hard to recognize can be expected to be observed near the center of the image, n may be set to 2 only while the object is near the image center, and "the probability that at least two still images show a human" may be calculated as the integrated score according to [Equation 6].
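 Switching the value of n according to the object position, as described above, might look like the following sketch; the meaning of "near the center" (here, within 25% of the image size of the center) is an assumption for illustration:

    def choose_n(obj_x, obj_y, width, height, margin=0.25):
        """Use n = 2 only while the object lies near the image center;
        otherwise fall back to n = 1 (assumed policy)."""
        near_center = (abs(obj_x - width / 2.0) < margin * width and
                       abs(obj_y - height / 2.0) < margin * height)
        return 2 if near_center else 1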
 そして、図2のステップS160に相当する動作として、判定手段15が、統合スコアを閾値処理することで、動画像に映った物体が「人間」であるか否かを判定する。この閾値は、実験の結果を基に設定すればよい。例えば、図6に示すような統合スコアを得られるようなシステム稼働環境では、図6の事例5,6を正しく棄却できるように、閾値を0.5とすればよい。 Then, as an operation corresponding to step S160 in FIG. 2, the determination unit 15 determines whether or not the object shown in the moving image is “human” by performing threshold processing on the integrated score. This threshold may be set based on the result of the experiment. For example, in a system operating environment where an integrated score as shown in FIG. 6 can be obtained, the threshold value may be set to 0.5 so that cases 5 and 6 in FIG. 6 can be correctly rejected.
 In this example, the integrated score is calculated according to [Equation 4] or [Equation 6], that is, based on the probability that at least a predetermined number of the still images belong to the recognition target category. Therefore, even if object recognition fails for most of the still images, whether the object belongs to the recognition target category can be recognized correctly, and accidental misrecognition can be suppressed to some extent. Furthermore, since "probability" is used as the decision criterion, information on how human-like even a still image normally judged "not human" is can be exploited; the score is calculated from the information of all still images without discarding information on any particular still image, so good recognition performance is obtained. In addition, since [Equation 4] or [Equation 6] is selected according to an external input when calculating the integrated score, the system can be adjusted to its operating environment.
[Second Embodiment]
 Next, a second embodiment of the present invention will be described. The second embodiment is applicable when there are not just two recognition target categories, such as "human" and "not human", but three or more.
 FIG. 8 is a functional block diagram showing the configuration of the object recognition system of the second embodiment. The object recognition system of the second embodiment has the same configuration as the first embodiment described above, but differs in the flow of processing and in the contents of the information held in the identification parameter storage unit 21 and the probability calculation parameter storage unit 22.
 本第2実施形態における識別パラメータ記憶部21は、1つのカテゴリに関する情報だけでなく、複数のカテゴリに関するパラメータを保持している。確率算出用パラメータ記憶部22に保持されるパラメータも複数のカテゴリに関するパラメータである。 The identification parameter storage unit 21 in the second embodiment holds not only information related to one category but also parameters related to a plurality of categories. Parameters stored in the probability calculation parameter storage unit 22 are also parameters related to a plurality of categories.
 次に、本第2実施形態の物体認識システムの動作について説明する。 Next, the operation of the object recognition system of the second embodiment will be described.
 FIG. 9 is a flowchart showing the operation of the object recognition system of the second embodiment. In the following, the number of categories is N. First, whether the object belongs to the first category, that is, the first recognition target category, is identified by the operations from step S110 to step S150 of FIG. 2 (step S210 in FIG. 9). If the identification result is that the object belongs to the first category, processing ends with the first category as the final result; if it is judged not to belong to the first category (No in step S220 of FIG. 9), an identification step for whether the object belongs to the second category is performed. By repeating the identification step (step S210 in FIG. 9) in this way for up to the (N-1)-th category, the category of the object in the moving image is determined.
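 A sketch of this category-by-category loop of FIG. 9, where recognized_as stands for one pass of the first-embodiment pipeline for a single category (the function and variable names are placeholders, not part of the patent text):

    def classify(frames, categories, recognized_as):
        """Run the first-embodiment pipeline once per category, stopping at the
        first category that is accepted; the last category acts as the fallback."""
        for category in categories[:-1]:
            if recognized_as(frames, category):
                return category
        return categories[-1]

    # e.g. classify(frames, ["human", "automobile", "neither human nor automobile"], recognized_as)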
 このように本第2実施形態によれば、第1実施形態と同様の効果に加えて、カテゴリ数が3つ以上の場合にも識別を正しく行うことができる。 As described above, according to the second embodiment, in addition to the same effects as those of the first embodiment, the identification can be correctly performed even when the number of categories is three or more.
 Next, as an example of the second embodiment, a case will be described in which it is recognized whether the object shown in the moving image is a "human", an "automobile", or "something that is neither a human nor an automobile".
 データ処理装置1としてパーソナルコンピュータ、記憶装置2として半導体メモリを用いる。この場合、識別パラメータ記憶部21と確率算出用パラメータ記憶部22は半導体メモリ上の一部とみなせる。また、静止画認識手段12,静止画確率算出手段13,統合スコア算出手段14,結果判定手段15は、パーソナルコンピュータのCPUの機能として実現される。 A personal computer is used as the data processing device 1 and a semiconductor memory is used as the storage device 2. In this case, the identification parameter storage unit 21 and the probability calculation parameter storage unit 22 can be regarded as part of the semiconductor memory. Still picture recognition means 12, still picture probability calculation means 13, integrated score calculation means 14, and result determination means 15 are realized as functions of a CPU of a personal computer.
 The still image recognition means 12 recognizes each still image using "generalized learning vector quantization". The identification parameter storage unit 21 holds in advance, as parameters, the "reference vectors" needed to perform identification by "generalized learning vector quantization". The still image probability calculation means 13 calculates the probability that the object in a still image belongs to a recognition target category by referring to a conversion table, stored in the probability calculation parameter storage unit 22, that associates still image recognition scores one-to-one with probability values of belonging to the recognition target category.
 First, as the operation corresponding to step S210 of FIG. 9, it is determined whether the object shown in the moving image is a "human". If the object is judged to be a "human", processing ends with the final recognition result "human"; otherwise, it is determined whether the object shown in the moving image is an "automobile". If it is judged to be an "automobile", processing ends with the final recognition result "automobile"; otherwise, processing ends with the recognition result "something that is neither a human nor an automobile".
 このように、本第2実施形態によれば、動画内の物体が「人間」であるか否かだけでなく「自動車」であるか否かの識別も正しく行うことができる。 As described above, according to the second embodiment, it is possible to correctly identify not only whether the object in the moving image is “human” but also whether it is “automobile”.
 The functions of the still image recognition means 12, the still image probability calculation means 13, the integrated score calculation means 14, and the determination means 15 in the first and second embodiments may be implemented as a program and executed by a computer.
 Although the present invention has been described above with reference to the embodiments (and examples), the present invention is not limited to the above embodiments (and examples). Various changes that those skilled in the art can understand may be made to the configuration and details of the present invention within the scope of the present invention.
 この出願は2008年1月31日に出願された日本出願特願2008-021832を基礎とする優先権を主張し、その開示の全てをここに取り込む。 This application claims priority based on Japanese Patent Application No. 2008-021832, filed on January 31, 2008, the entire disclosure of which is incorporated herein.
 本発明によれば、カメラで撮影された動画像から人物や自動車といった物体のカテゴリを正確に認識するといった物体監視用途に適用できる。 The present invention can be applied to object monitoring applications such as accurately recognizing a category of an object such as a person or a car from a moving image taken by a camera.
FIG. 1 is a functional block diagram showing the configuration of the object recognition system of the first embodiment of the present invention.
FIG. 2 is a flowchart showing the operation of the object recognition system of the embodiment disclosed in FIG. 1.
FIG. 3 is an explanatory diagram showing the operation of the still image recognition means in the embodiment disclosed in FIG. 1.
FIG. 4 is an explanatory diagram showing the operation of the still image probability calculation means in the embodiment disclosed in FIG. 1.
FIG. 5 is an explanatory diagram showing an example of time-series changes in a moving image.
FIG. 6 is a diagram explaining a specific example of the embodiment disclosed in FIG. 1.
FIG. 7 is a schematic diagram of score fluctuation when an accidentally high still image recognition score is detected.
FIG. 8 is a functional block diagram showing the configuration of the object recognition system of the second embodiment of the present invention.
FIG. 9 is a flowchart showing the operation of the object recognition system of the embodiment disclosed in FIG. 8.
Explanation of symbols
1 Data processing device
2 Storage device
12 Still image recognition means
13 Still image probability calculation means
14 Integrated score calculation means
15 Determination means
21 Identification parameter storage unit
22 Probability calculation parameter storage unit

Claims (33)

  1.  動画像からその被写体である物体のカテゴリを認識する物体認識システムにおいて、
     前記動画像を構成する複数の静止画像に対し画像中の物体が認識対象カテゴリであるか否か個別に認識して静止画像毎に算出した認識スコアを静止画認識スコアとして出力する静止画認識手段と、
     この算出された静止画認識スコアに対応して、この静止画認識スコアで前記静止画像中の物体が実際に前記認識対象カテゴリである確率を算出する静止画確率算出手段と、
     この静止画確率算出手段により前記静止画像毎に算出された複数の確率値から前記複数の静止画像のうち少なくともn枚(nは全静止画像数以下の自然数)の画像中の物体が実際に前記認識対象カテゴリである確率に対応する総合スコアを予め定められた演算関数に従って算出する統合スコア算出手段と、
     この統合スコアに基づいて前記動画像中の物体が前記認識対象カテゴリであるか否かを判定する判定手段とを備えたことを特徴とする物体認識システム。
    In an object recognition system that recognizes the category of an object that is the subject of a moving image, the system comprising:
    still image recognition means for individually recognizing, for a plurality of still images constituting the moving image, whether the object in each image belongs to a recognition target category, and outputting the recognition score calculated for each still image as a still image recognition score;
    still image probability calculation means for calculating, from each calculated still image recognition score, the probability that the object in the corresponding still image actually belongs to the recognition target category;
    integrated score calculation means for calculating, according to a predetermined calculation function, from the plurality of probability values calculated for the respective still images by the still image probability calculation means, an overall score corresponding to the probability that the objects in at least n of the plurality of still images (n being a natural number not exceeding the total number of still images) actually belong to the recognition target category; and
    determination means for determining, based on this integrated score, whether the object in the moving image belongs to the recognition target category.
  2.  前記請求項1に記載の物体認識システムにおいて、
     前記統合スコア算出手段は、前記n枚を時系列上隣接するm枚(mは2以上で全静止画像数以下の自然数)に設定して、前記複数の静止画像のうち少なくとも時系列上隣接するm枚の画像中の物体が実際に前記認識対象カテゴリである確率に対応する総合スコアを算出する機能を備えたことを特徴とする物体認識システム。
    The object recognition system according to claim 1,
    The integrated score calculation means sets the n images to m adjacent in time series (m is a natural number equal to or greater than 2 and equal to or less than the number of all still images), and is adjacent in at least time series among the plurality of still images. An object recognition system comprising a function of calculating a total score corresponding to a probability that an object in m images is actually the recognition target category.
  3.  前記請求項1又は2に記載の物体認識システムにおいて、
     前記統合スコア算出手段は、外部入力に応じて前記nの値を切り替え設定する機能を備えると共に、この切り替えに応じて用いる演算関数を変更して前記統合スコアを算出する機能を備えたことを特徴とする物体認識システム。
    In the object recognition system according to claim 1 or 2,
    The integrated score calculation means has a function of switching and setting the value of n according to an external input, and also has a function of calculating the integrated score by changing a calculation function used according to the switching. An object recognition system.
  4.  前記請求項3に記載の物体認識システムにおいて、
     前記統合スコア算出手段は、前記画像中の物体の位置に基づいて前記nの値を設定する機能を備えたことを特徴とする物体認識システム。
    The object recognition system according to claim 3, wherein
    The integrated score calculation means has a function of setting the value of n based on the position of an object in the image.
  5.  前記請求項3に記載の物体認識システムにおいて、
     前記統合スコア算出手段は、前記動画像の撮像された時間帯に基づいて前記nの値を設定する機能を備えたことを特徴とする物体認識システム。
    The object recognition system according to claim 3, wherein
    The integrated score calculation means has a function of setting the value of n based on a time zone when the moving image is captured.
  6.  前記請求項1乃至5のいずれか一項に記載の物体認識システムにおいて、
     前記静止画認識手段は、前記動画像の一部を構成する各静止画像に対して画像中の物体が前記認識対象カテゴリであるか否か個別に認識することを特徴とする物体認識システム。
    The object recognition system according to any one of claims 1 to 5,
    The object recognition system, wherein the still image recognition means individually recognizes whether or not an object in the image is the recognition target category for each still image constituting a part of the moving image.
  7.  前記請求項1乃至6のいずれか一項に記載の物体認識システムにおいて、
     前記静止画認識手段は、前記判定手段によって前記動画像中の物体が前記認識対象カテゴリでないと判定された場合、前記動画像を構成する複数の静止画像に対し画像中の物体が他のカテゴリであるか否か個別に認識して複数の前記静止画認識スコアを算出することを特徴とする物体認識システム。
    In the object recognition system according to any one of claims 1 to 6,
    When the determination unit determines that the object in the moving image is not in the recognition target category, the still image recognition unit determines that the object in the image is in another category with respect to the plurality of still images constituting the moving image. An object recognition system, wherein the plurality of still image recognition scores are calculated by individually recognizing whether or not there is any.
  8.  前記請求項1乃至7のいずれか一項に記載の物体認識システムにおいて、
     前記統合スコア算出手段は、前記n枚を1枚に設定して、前記複数の静止画像のうちの少なくとも1枚の画像中の物体が実際に前記認識対象カテゴリである確率に対応する総合スコアを算出することを特徴とする物体認識システム。
    In the object recognition system according to any one of claims 1 to 7,
    The integrated score calculation means sets the n number as one, and calculates an overall score corresponding to the probability that an object in at least one of the plurality of still images is actually the recognition target category. An object recognition system characterized by calculating.
  9.  前記請求項1乃至7のいずれか一項に記載の物体認識システムにおいて、
     前記統合スコア算出手段は、前記n枚を2枚に設定して、前記複数の静止画像のうち少なくとも2枚の画像中の物体が実際に前記認識対象カテゴリである確率に対応する総合スコアを算出することを特徴とする物体認識システム。
    In the object recognition system according to any one of claims 1 to 7,
    The integrated score calculation means sets the n sheets to two, and calculates a total score corresponding to the probability that an object in at least two of the plurality of still images is actually the recognition target category. An object recognition system characterized by
  10.  前記請求項2乃至7のいずれか一項に記載の物体認識システムにおいて、
     前記統合スコア算出手段は、前記m枚を2枚に設定して、前記複数の静止画像のうち少なくとも時系列上隣接する2枚の画像中の物体が実際に前記認識対象カテゴリである確率に対応する総合スコアを算出することを特徴とする物体認識システム。
    In the object recognition system according to any one of claims 2 to 7,
    The integrated score calculation means sets the m sheets to two, and corresponds to the probability that an object in at least two images adjacent in time series among the plurality of still images is actually the recognition target category. An object recognition system characterized by calculating an overall score.
  11.  前記請求項1乃至7に記載の物体認識システムにおいて、
     前記静止画確率算出手段が、前記静止画像中の物体が認識対象カテゴリである確率に代えて、前記静止画像中の物体が認識対象カテゴリであるという条件下で前記静止画認識手段に算出された静止画認識スコアが生起する確率を計算すると共に、
     前記統合スコア算出手段が、前記静止画確率算出手段により前記静止画像毎に算出された複数の確率値に基づいて、前記複数の静止画像のうち少なくともn枚(nは全静止画像数以下の自然数)の画像で前記算出された静止画認識スコアが生起する確率に対応したスコアを統合スコアとして計算することを特徴とする物体認識システム。
    The object recognition system according to any one of claims 1 to 7,
    The still image probability calculation means is calculated by the still image recognition means under the condition that the object in the still image is in the recognition target category instead of the probability that the object in the still image is in the recognition target category. While calculating the probability that a still image recognition score will occur,
    Based on a plurality of probability values calculated for each of the still images by the still image probability calculation unit, the integrated score calculation unit is at least n of the plurality of still images (n is a natural number equal to or less than the total number of still images). The object recognition system is characterized in that a score corresponding to the probability that the calculated still image recognition score occurs in the image is calculated as an integrated score.
  12.  動画像からその被写体である物体のカテゴリを認識する物体認識方法において、
     前記動画像を構成する複数の静止画像に対し画像中の物体が認識対象カテゴリであるか否か個別に認識して静止画像毎に算出した認識スコアを静止画認識スコアとして出力する静止画認識ステップと、
     この算出された静止画認識スコアに対応して、この静止画認識スコアで前記静止画像中の物体が前記認識対象カテゴリである確率を算出する静止画確率算出ステップと、
     この静止画確率算出ステップで前記静止画像毎に算出された複数の確率値から前記複数の静止画像のうち少なくともn枚(nは全静止画像数以下の自然数)の画像中の物体が実際に前記認識対象カテゴリである確率に対応する統合スコアを予め定められた演算関数に従って算出する統合スコア算出ステップと、
     この統合スコアに基づいて前記動画像中の物体が前記認識対象カテゴリであるか否かを判定する判定ステップとを実行することを特徴とする物体認識方法。
    In an object recognition method for recognizing the category of an object that is the subject of a moving image, the method executing:
    a still image recognition step of individually recognizing, for a plurality of still images constituting the moving image, whether the object in each image belongs to a recognition target category, and outputting the recognition score calculated for each still image as a still image recognition score;
    a still image probability calculation step of calculating, from each calculated still image recognition score, the probability that the object in the corresponding still image belongs to the recognition target category;
    an integrated score calculation step of calculating, according to a predetermined calculation function, from the plurality of probability values calculated for the respective still images in the still image probability calculation step, an integrated score corresponding to the probability that the objects in at least n of the plurality of still images (n being a natural number not exceeding the total number of still images) actually belong to the recognition target category; and
    a determination step of determining, based on the integrated score, whether the object in the moving image belongs to the recognition target category.
  13.  前記請求項12に記載の物体認識方法において、
     前記統合スコア算出ステップでは、前記n枚を時系列上隣接するm枚(mは2以上で全静止画像数以下の自然数)に設定して、前記複数の静止画像のうち少なくとも時系列上隣接するm枚(mは2以上で全静止画像数以下の自然数)の画像中の物体が実際に前記認識対象カテゴリである確率に対応する総合スコアを算出することを特徴とする物体認識方法。
    The object recognition method according to claim 12, wherein:
    In the integrated score calculation step, the n images are set to m adjacent in time series (m is a natural number equal to or greater than 2 and equal to or less than the total number of still images), and are at least time-adjacent among the plurality of still images. An object recognition method, comprising: calculating a total score corresponding to a probability that an object in m images (m is a natural number equal to or greater than 2 and equal to or less than the number of all still images) is actually the recognition target category.
  14.  前記請求項12または13に記載の物体認識方法において、
     外部入力に応じて前記nの値を切り替えて設定する演算設定変更ステップを実行し、
     前記統合スコア算出ステップでは、前記演算設定変更ステップで設定されたnの値に応じて用いる演算関数を変更し前記統合スコアを算出することを特徴とする物体認識方法。
    The object recognition method according to claim 12 or 13, wherein:
    A calculation setting changing step of switching and setting the value of n according to an external input;
    In the integrated score calculating step, the integrated score is calculated by changing a calculation function used according to the value of n set in the calculation setting changing step.
  15.  前記請求項14に記載の物体認識方法において、
     前記演算設定変更ステップでは、前記画像中の物体の位置に基づいて前記nの値を設定することを特徴とする物体認識方法。
    The object recognition method according to claim 14, wherein:
    In the calculation setting changing step, the value of n is set based on the position of the object in the image.
  16.  前記請求項14に記載の物体認識方法において、
     前記演算設定変更ステップでは、前記動画像の撮像された時間帯に基づいて前記nの値を設定することを特徴とする物体認識方法。
    The object recognition method according to claim 14, wherein:
    In the calculation setting changing step, the value of n is set based on a time zone when the moving image is captured.
  17.  前記請求項12乃至16のいずれか一項に記載の物体認識方法において、
     前記静止画認識ステップでは、前記動画像の一部を構成する各静止画像に対して画像中の物体が前記認識対象カテゴリであるか否か個別に認識することを特徴とする物体認識方法。
    The object recognition method according to any one of claims 12 to 16,
    In the still image recognition step, an object recognition method characterized by individually recognizing whether or not an object in the image is the recognition target category for each still image constituting a part of the moving image.
  18.  前記請求項12乃至17のいずれか一項に記載の物体認識方法において、
     前記判定ステップで前記動画像中の物体が前記認識対象カテゴリでないと判定された場合、前記認識対象カテゴリを他のカテゴリに変更して前記静止画認識ステップと、前記静止画確率算出ステップと、前記統合スコア算出ステップと、前記判定ステップとを再度実行することを特徴とする物体認識方法。
    The object recognition method according to any one of claims 12 to 17,
    When it is determined that the object in the moving image is not the recognition target category in the determination step, the recognition target category is changed to another category, the still image recognition step, the still image probability calculation step, An object recognition method, wherein the integrated score calculation step and the determination step are executed again.
  19.  前記請求項12乃至18のいずれか一項に記載の物体認識方法において、
     前記統合スコア算出ステップでは、前記複数の静止画像のうち少なくとも1枚の画像中の物体が実際に前記認識対象カテゴリである確率に対応する統合スコアを算出することを特徴とする物体認識方法。
    The object recognition method according to any one of claims 12 to 18,
    In the integrated score calculating step, an integrated score corresponding to a probability that an object in at least one of the plurality of still images is actually in the recognition target category is calculated.
  20.  前記請求項12乃至18のいずれか一項に記載の物体認識方法において、
     前記統合スコア算出ステップでは、前記複数の静止画像のうち少なくとも2枚の画像中の物体が実際に前記認識対象カテゴリである確率に対応する統合スコアを算出することを特徴とする物体認識方法。
    The object recognition method according to any one of claims 12 to 18,
    In the integrated score calculating step, an integrated score corresponding to a probability that an object in at least two of the plurality of still images is actually in the recognition target category is calculated.
  21.  前記請求項13乃至18のいずれか一項に記載の物体認識方法において、
     前記統合スコア算出ステップでは、前記複数の静止画像のうち少なくとも時系列上隣接する2枚の画像中の物体が実際に前記認識対象カテゴリである確率に対応する統合スコアを算出することを特徴とする物体認識方法。
    The object recognition method according to any one of claims 13 to 18,
    In the integrated score calculating step, an integrated score corresponding to a probability that an object in at least two images adjacent in time series among the plurality of still images is actually the recognition target category is calculated. Object recognition method.
  22.  前記請求項12乃至18のいずれか一項に記載の物体認識方法において、
     前記静止画確率算出ステップでは、前記静止画像中の物体が認識対象カテゴリである確率に代えて、前記静止画像中の物体が前記認識対象カテゴリであるという条件下で前記静止画認識ステップにより算出された静止画認識スコアが生起する確率を計算し、
     前記統合スコア算出ステップでは、前記静止画確率算出ステップにより前記静止画像毎に算出された複数の確率値に基づいて、前記複数の静止画像のうち少なくともn枚(nは全静止画像数以下の自然数)の画像で前記算出された静止画認識スコアが生起する確率に対応したスコアを前記統合スコアとして計算することを特徴とする物体認識方法。
    The object recognition method according to any one of claims 12 to 18,
    In the still image probability calculation step, instead of the probability that the object in the still image is the recognition target category, the still image probability calculation step is calculated by the still image recognition step under the condition that the object in the still image is the recognition target category. Calculate the probability that a still image recognition score will occur,
    In the integrated score calculation step, at least n of the plurality of still images (n is a natural number equal to or less than the total number of still images) based on the plurality of probability values calculated for each of the still images in the still image probability calculation step. A score corresponding to the probability that the calculated still image recognition score occurs in the image of (2) is calculated as the integrated score.
  23.  前記動画像を構成する複数の静止画像に対し画像中の物体が認識対象カテゴリであるか否か個別に認識して静止画像毎に算出した認識スコアを静止画認識スコアとして出力する静止画認識機能と、
     この算出された静止画認識スコアに対応して、この静止画認識スコアで前記静止画像中の物体が実際に前記認識対象カテゴリである確率を算出する静止画確率算出機能と、
     この静止画確率算出機能で前記静止画像毎に算出された複数の確率値から前記複数の静止画像のうち少なくともn枚(nは全静止画像数以下の自然数)の画像中の物体が実際に前記認識対象カテゴリである確率に対応した統合スコアを予め定められた演算関数に従って算出する統合スコア算出機能と、
     この統合スコアに基づいて前記動画像中の物体が前記認識対象カテゴリであるか否かを判定する判定機能とをコンピュータに実行させることを特徴とする物体認識用プログラム。
    An object recognition program that causes a computer to execute:
    a still image recognition function of individually recognizing, for a plurality of still images constituting the moving image, whether the object in each image belongs to a recognition target category, and outputting the recognition score calculated for each still image as a still image recognition score;
    a still image probability calculation function of calculating, from each calculated still image recognition score, the probability that the object in the corresponding still image actually belongs to the recognition target category;
    an integrated score calculation function of calculating, according to a predetermined calculation function, from the plurality of probability values calculated for the respective still images by the still image probability calculation function, an integrated score corresponding to the probability that the objects in at least n of the plurality of still images (n being a natural number not exceeding the total number of still images) actually belong to the recognition target category; and
    a determination function of determining, based on the integrated score, whether the object in the moving image belongs to the recognition target category.
  24.  前記請求項23に記載の物体認識用プログラムにおいて、
     前記コンピュータに、前記統合スコア算出機能として、前記n枚を時系列上隣接するm枚(mは2以上で全静止画像数以下の自然数)に設定して、前記複数の静止画像のうち少なくとも時系列上隣接するm枚の画像中の物体が実際に前記認識対象カテゴリである確率に対応する統合スコアを算出する機能を実行させることを特徴とする物体認識用プログラム。
    In the object recognition program according to claim 23,
    In the computer, as the integrated score calculation function, the n images are set to m adjacent in time series (m is a natural number equal to or greater than 2 and equal to or less than the total number of still images), and at least the time among the plurality of still images An object recognition program that executes a function of calculating an integrated score corresponding to a probability that an object in m images adjacent in a sequence is actually the recognition target category.
  25.  前記請求項23または24に記載の物体認識用プログラムにおいて、
     外部入力に応じて前記nの値を切り替えて設定する演算設定変更機能と共に、
     前記統合スコア算出機能を、前記演算設定変更機能で切り替え設定された前記nの値に応じて用いる演算関数を変更して前記統合スコアを算出する機能として前記コンピュータに実行させることを特徴とする物体認識用プログラム。
    In the object recognition program according to claim 23 or 24,
    Along with a calculation setting change function for switching and setting the value of n according to an external input,
    An object that causes the computer to execute the integrated score calculation function as a function of calculating the integrated score by changing a calculation function used according to the value of n switched and set by the calculation setting change function Recognition program.
  26.  前記請求項25に記載の物体認識用プログラムにおいて、
     前記コンピュータに、前記演算設定変更機能として、前記画像中の物体の位置に基づいて前記枚数を設定する機能を実行させることを特徴とする物体認識用プログラム。
    In the object recognition program according to claim 25,
    An object recognition program that causes the computer to execute a function of setting the number of sheets based on a position of an object in the image as the calculation setting change function.
  27.  前記請求項25に記載の物体認識用プログラムにおいて、
     前記コンピュータに、前記演算設定変更機能として、前記動画像の撮像された時間帯に基づいて前記枚数を設定する機能を実行させることを特徴とする物体認識用プログラム。
    In the object recognition program according to claim 25,
    An object recognition program that causes the computer to execute, as the calculation setting change function, a function of setting the number of sheets based on a time zone when the moving image is captured.
  28.  前記請求項23乃至27のいずれか一項に記載の物体認識用プログラムにおいて、
     前記コンピュータに、前記静止画認識機能として、前記動画像の一部を構成する各静止画像に対して画像中の物体が前記認識対象カテゴリであるか否か個別に認識する機能を実行させることを特徴とする物体認識用プログラム。
    In the object recognition program according to any one of claims 23 to 27,
    Causing the computer to execute, as the still image recognition function, a function for individually recognizing whether or not an object in the image is the recognition target category for each still image constituting a part of the moving image. A feature recognition program.
  29.  前記請求項23乃至28のいずれか一項に記載の物体認識用プログラムにおいて、
     前記コンピュータに、前記静止画認識機能として、前記判定機能により前記動画像中の物体が前記認識対象カテゴリでないと判定された場合、前記動画像を構成する各静止画像について画像中の物体が他のカテゴリであるか否か個別に認識し前記静止画認識スコアを算出する機能を実行させることを特徴とする物体認識用プログラム。
    In the object recognition program according to any one of claims 23 to 28,
    When the determination function determines that the object in the moving image is not the recognition target category as the still image recognition function, the computer includes an object in the image for each still image constituting the moving image. An object recognition program characterized by executing a function of individually recognizing whether it is a category or not and calculating the still image recognition score.
  30.  前記請求項23乃至29のいずれか一項に記載の物体認識用プログラムにおいて、
     前記コンピュータに、前記統合スコア算出機能として、前記複数の静止画像のうち少なくとも1枚の画像中の物体が実際に前記認識対象カテゴリである確率に対応する統合スコアを算出する機能を実行させることを特徴とする物体認識用プログラム。
    In the object recognition program according to any one of claims 23 to 29,
    Causing the computer to execute, as the integrated score calculation function, a function of calculating an integrated score corresponding to a probability that an object in at least one of the plurality of still images is actually the recognition target category. A feature recognition program.
  31.  前記請求項23乃至29のいずれか一項に記載の物体認識用プログラムにおいて、
     前記コンピュータに、前記統合スコア算出機能として、前記複数の静止画像のうち少なくとも2枚の画像中の物体が実際に前記認識対象カテゴリである確率に対応する統合スコアを算出する機能を実行させることを特徴とする物体認識用プログラム。
    In the object recognition program according to any one of claims 23 to 29,
    Causing the computer to execute, as the integrated score calculation function, a function of calculating an integrated score corresponding to a probability that an object in at least two of the plurality of still images is actually the recognition target category. A feature recognition program.
  32.  前記請求項24乃至29のいずれか一項に記載の物体認識用プログラムにおいて、
     前記コンピュータに、前記統合スコア算出機能として、前記複数の静止画像のうち少なくとも時系列上隣接する2枚の画像中の物体が実際に前記認識対象カテゴリである確率に対応する統合スコアを算出する機能を実行させることを特徴とする物体認識用プログラム。
    In the object recognition program according to any one of claims 24 to 29,
    A function of calculating an integrated score corresponding to a probability that an object in at least two images adjacent in time series among the plurality of still images is actually the recognition target category as the integrated score calculating function in the computer. An object recognition program characterized by causing
  33.  前記請求項23乃至29のいずれか一項に記載の物体認識用プログラムにおいて、
     前記コンピュータに、
     前記静止画確率算出機能として、前記静止画像中の物体が認識対象カテゴリである確率に代えて、前記静止画像中の物体が認識対象カテゴリであるという条件下で前記静止画認識機能で算出された静止画認識スコアが生起する確率を計算する機能を実行させ、
     前記統合スコア算出機能として、前記静止画確率算出機能により前記静止画像毎に算出された複数の確率値に基づいて、前記複数の静止画像のうち少なくともn枚(nは全静止画像数以下の自然数)で前記算出された静止画認識スコアが生起する確率に対応したスコアを前記統合スコアとして計算する機能を実行させることを特徴とする物体認識用プログラム。
    In the object recognition program according to any one of claims 23 to 29,
    In the computer,
    As the still image probability calculation function, instead of the probability that the object in the still image is in the recognition target category, the still image probability calculation function is calculated by the still image recognition function under the condition that the object in the still image is in the recognition target category. Execute the function to calculate the probability that the still image recognition score will occur,
    As the integrated score calculation function, based on a plurality of probability values calculated for each still image by the still image probability calculation function, at least n of the plurality of still images (n is a natural number equal to or less than the total number of still images) ) To execute a function of calculating a score corresponding to the probability that the calculated still image recognition score occurs as the integrated score.
PCT/JP2009/050126 2008-01-31 2009-01-08 Object recognition system, object recognition method, and object recognition program WO2009096208A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2008-021832 2008-01-31
JP2008021832 2008-01-31

Publications (1)

Publication Number Publication Date
WO2009096208A1 true WO2009096208A1 (en) 2009-08-06

Family

ID=40912561

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2009/050126 WO2009096208A1 (en) 2008-01-31 2009-01-08 Object recognition system, object recognition method, and object recognition program

Country Status (1)

Country Link
WO (1) WO2009096208A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2012230671A (en) * 2011-04-22 2012-11-22 Mitsubishi Electric Corp Method for classifying objects in scene
JP5644773B2 (en) * 2009-11-25 2014-12-24 日本電気株式会社 Apparatus and method for collating face images
WO2020084684A1 (en) * 2018-10-23 2020-04-30 日本電気株式会社 Image recognition system, image recognition method, and image recognition program
WO2020194497A1 (en) * 2019-03-26 2020-10-01 日本電気株式会社 Information processing device, personal identification device, information processing method, and storage medium


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0620049A (en) * 1992-06-23 1994-01-28 Japan Radio Co Ltd Intruder identification system
JPH09330415A (en) * 1996-06-10 1997-12-22 Hitachi Ltd Picture monitoring method and system therefor
JP2004118359A (en) * 2002-09-24 2004-04-15 Toshiba Corp Figure recognizing device, figure recognizing method and passing controller
JP2004258931A (en) * 2003-02-25 2004-09-16 Matsushita Electric Works Ltd Image processing method, image processor, and image processing program
JP2005354578A (en) * 2004-06-14 2005-12-22 Denso Corp Object detection/tracking device
JP2006271657A (en) * 2005-03-29 2006-10-12 Namco Bandai Games Inc Program, information storage medium, and image pickup and display device

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5644773B2 (en) * 2009-11-25 2014-12-24 日本電気株式会社 Apparatus and method for collating face images
JP2012230671A (en) * 2011-04-22 2012-11-22 Mitsubishi Electric Corp Method for classifying objects in scene
WO2020084684A1 (en) * 2018-10-23 2020-04-30 日本電気株式会社 Image recognition system, image recognition method, and image recognition program
JPWO2020084684A1 (en) * 2018-10-23 2021-09-02 日本電気株式会社 Image recognition system, image recognition method and image recognition program
WO2020194497A1 (en) * 2019-03-26 2020-10-01 日本電気株式会社 Information processing device, personal identification device, information processing method, and storage medium
JPWO2020194497A1 (en) * 2019-03-26 2021-12-02 日本電気株式会社 Information processing device, personal identification device, information processing method and storage medium
JP7248102B2 (en) 2019-03-26 2023-03-29 日本電気株式会社 Information processing device, personal identification device, information processing method and storage medium

Similar Documents

Publication Publication Date Title
US8989442B2 (en) Robust feature fusion for multi-view object tracking
US8218819B2 (en) Foreground object detection in a video surveillance system
JP5010905B2 (en) Face recognition device
CN111539265B (en) Method for detecting abnormal behavior in elevator car
JP4858612B2 (en) Object recognition system, object recognition method, and object recognition program
US20070230797A1 (en) Method, apparatus, and program for detecting sightlines
CN109033955B (en) Face tracking method and system
JP6185919B2 (en) Method and system for improving person counting by fusing human detection modality results
KR101558547B1 (en) Age Cognition Method that is powerful to change of Face Pose and System thereof
KR20220063256A (en) Method and device for controlling the cabin environment
JP2012190159A (en) Information processing device, information processing method, and program
JP7392488B2 (en) Recognition method, device, and image processing device for false detection of remains
WO2009096208A1 (en) Object recognition system, object recognition method, and object recognition program
US11494906B2 (en) Object detection device, object detection method, and program
WO2015037973A1 (en) A face identification method
Zhang et al. A novel efficient method for abnormal face detection in ATM
Guo et al. A fast algorithm face detection and head pose estimation for driver assistant system
JP2007510994A (en) Object tracking in video images
JP4455980B2 (en) Moving image processing method, moving image processing apparatus, moving image processing program, and recording medium recording the program
JP2018036870A (en) Image processing device, and program
CN115720664A (en) Object position estimating apparatus, object position estimating method, and recording medium
CN112541425B (en) Emotion detection method, emotion detection device, emotion detection medium and electronic equipment
US20230177716A1 (en) Information processing device, non-transitory computer-readable storage medium, and information processing method
Arunachalam et al. Automatic fast video object detection and tracking on video surveillance system
WO2011092848A1 (en) Object detection device and face detection device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 09705708

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 09705708

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP