WO2023176562A1 - Information processing apparatus, information processing method, and information processing program - Google Patents

Information processing apparatus, information processing method, and information processing program Download PDF

Info

Publication number
WO2023176562A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
information processing
extraction
tracking target
statistical information
Prior art date
Application number
PCT/JP2023/008411
Other languages
French (fr)
Japanese (ja)
Inventor
雄二 永松
Original Assignee
ソニーグループ株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ソニーグループ株式会社 (Sony Group Corporation)
Publication of WO2023176562A1

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60Control of cameras or camera modules

Definitions

  • The present disclosure relates to an information processing device, an information processing method, and an information processing program.
  • A technique for recognizing surrounding people and objects is disclosed in Patent Document 1, for example.
  • In Patent Document 1, a rectangular filter is used when extracting a person or object from an acquired surrounding image.
  • When a rectangular filter is used, however, a large amount of background image that becomes noise is included, and, for example, the accuracy of identifying the same tracking target between frames (tracking accuracy) may be reduced. It is therefore desirable to provide an information processing device, an information processing method, and an information processing program that can keep noise low.
  • An information processing device includes an object detection section and an object image extraction section.
  • The object detection unit generates statistical information of a pixel region of a tracking target included in an input image.
  • The object image extraction unit determines an extraction region of the tracking target based on the statistical information, and extracts an object image from the input image as an image of the extraction region.
  • An information processing method includes the following two steps: (A1) generating statistical information of a pixel region of a tracking target included in an input image; and (A2) determining an extraction region of the tracking target based on the statistical information, and extracting an object image from the input image as an image of the extraction region.
  • An information processing program causes a computer to execute the following two steps:
  • (B1) generating statistical information of a pixel region of a tracking target included in an input image; and
  • (B2) determining an extraction region of the tracking target based on the statistical information, and extracting an object image from the input image as an image of the extraction region.
  • In the information processing device, information processing method, and information processing program, an extraction region of a tracking target is determined based on statistical information of a pixel region of the tracking target included in an input image, and an object image is extracted from the input image as an image of the extraction region.
  • By extracting the image of the tracking target based on its statistical information, the proportion of the background image included in the object image can be kept lower than when the image of the tracking target is extracted using a rectangular filter.
  • FIG. 1 is a diagram illustrating an example of functional blocks of an information processing system according to a first embodiment of the present disclosure.
  • FIG. 2 is a diagram illustrating a method for determining the same object using two object images.
  • FIG. 3 is a diagram illustrating a method of acquiring an object image from an input image.
  • FIG. 4 is a diagram illustrating an example of information processing in the information processing system.
  • FIG. 5 is a diagram showing an example in which an offset is provided in the extraction area.
  • FIG. 6 is a diagram showing an example of the t distribution.
  • FIG. 7 is a diagram illustrating a method for determining identical objects using a histogram.
  • FIG. 8 is a diagram illustrating a modified example of the functional blocks of the information processing system in FIG. 1.
  • FIG. 9 is a diagram illustrating an example of functional blocks of an information processing system according to the second embodiment of the present disclosure.
  • In such a method, a pixel region corresponding to a tracking target is extracted for each frame, and the same tracking target is identified between frames using the degree of matching of the pixel regions as an index.
  • When extracting the pixel region of the tracking target, a method is known in which a rectangular or specifically shaped filter is swept over the image.
  • Such a filter is sometimes called a kernel.
  • FIG. 1 shows an example of functional blocks of an information processing system 1.
  • The information processing system 1 includes a sensor device section 10, an object detection section 20, a storage section 30, an object image extraction section 40, and an object tracking section 50.
  • The object detection section 20, the object image extraction section 40, and the object tracking section 50 include, for example, a CPU (Central Processing Unit) and a GPU (Graphics Processing Unit).
  • The object detection section 20, the object image extraction section 40, and the object tracking section 50 execute a series of procedures described in a program 33 (described later) by loading various programs stored in the storage section 30 into the CPU.
  • Alternatively, the object detection section 20, the object image extraction section 40, and the object tracking section 50 may be configured with an MPU (Micro-Processing Unit) that executes the respective functions.
  • The sensor device section 10 includes, for example, a sensor element that recognizes the external environment and acquires environmental data corresponding to the recognized external environment.
  • The sensor element outputs the acquired environmental data to the object detection section 20.
  • The sensor element is, for example, an RGB camera, an RGB-D camera, a depth sensor, an infrared sensor, or an event camera.
  • The RGB camera is, for example, a monocular visible-light image sensor that outputs an RGB image obtained by receiving visible light and converting it into an electrical signal.
  • The RGB-D camera is, for example, a binocular visible-light image sensor, and outputs an RGB-D image (an RGB image and a distance image obtained from parallax).
  • The depth sensor is, for example, a ToF (Time of Flight) sensor or a LiDAR (Laser Imaging Detection and Ranging) sensor, and outputs a distance image obtained by measuring scattered light in response to pulsed laser irradiation.
  • The infrared sensor outputs an infrared image obtained by, for example, receiving infrared light and converting it into an electrical signal.
  • The event camera is, for example, a monocular visible-light image sensor, and outputs the difference between RGB images of successive frames (a difference image).
  • The sensor device section 10 outputs various images obtained from the external environment (for example, an RGB image, an RGB-D image, a distance image, an infrared image, or a difference image) as an input image Iin (see FIG. 2(A)).
  • The object detection unit 20 generates statistical information of the pixel region of the tracking target included in the input image Iin obtained by the sensor device unit 10, together with type information of the tracking target.
  • The object detection unit 20 stores the generated statistical information and type information, and the input image Iin, in the storage unit 30.
  • The statistical information generated by the object detection unit 20 includes the average position (μx, μy) and the variance-covariance values (ρxx, ρxy, ρyy) of the pixel region of the tracking target (see FIG. 2(B)).
  • The statistical information generated by the object detection section 20 is stored as the statistical information 31 in the storage section 30.
  • The average position (μx, μy) is, for example, the two-dimensional coordinate corresponding to the center position in the X-axis direction and the center position in the Y-axis direction of the pixel region of the tracking target included in the input image Iin.
  • The variance-covariance values (ρxx, ρxy, ρyy) are the variance-covariance values of the pixel region of the tracking target included in the input image Iin.
  • ρxx is the (1, 1) element of the variance-covariance matrix.
  • ρyy is the (2, 2) element of the variance-covariance matrix.
  • ρxy is the (2, 1) and (1, 2) element of the variance-covariance matrix.
  • The type information generated by the object detection unit 20 includes, for example, the name of the tracked object, such as a person or a car.
  • The name of the tracked object roughly indicates characteristics such as the shape and size of the tracked object.
  • The type information generated by the object detection section 20 is stored as the type information 32 in the storage section 30.
  • The object detection unit 20 has a machine learning model such as a neural network.
  • This neural network is, for example, a learning model trained using, as teaching data, a learning image (an image containing a tracking target), the statistical information (average position (μx, μy) and variance-covariance values (ρxx, ρxy, ρyy)) of the pixel region of the tracking target included in the learning image, and the type information of the tracking target included in the learning image.
  • When the input image Iin is input, this neural network outputs the statistical information (average position (μx, μy) and variance-covariance values (ρxx, ρxy, ρyy)) of the pixel region of the tracking target included in the input image Iin and the type information of that tracking target.
  • The object detection unit 20 uses the neural network to output, from the input image Iin, the statistical information (average position (μx, μy) and variance-covariance values (ρxx, ρxy, ρyy)) and the type information of the tracking target.
  • The storage unit 30 is, for example, a recording medium such as a semiconductor memory or a hard disk.
  • The storage unit 30 stores various programs (for example, the program 33).
  • The program 33 describes a series of procedures for realizing the respective functions of the object detection section 20, the object image extraction section 40, and the object tracking section 50.
  • The storage unit 30 also stores various data generated by executing the program 33 (for example, the input image Iin, the object image Iob, the statistical information 31, and the type information 32).
  • The object image extraction unit 40 determines the extraction area PA of the tracking target based on the statistical information generated by the object detection unit 20 and the type information of the tracking target (see FIG. 2(C)). Specifically, the object image extraction unit 40 determines the extraction area PA based on the average position (μx, μy), the variance-covariance values (ρxx, ρxy, ρyy), and the tracking target type information.
  • The extraction area PA has an elliptical shape.
  • The object image extraction unit 40 derives the radii of the ellipse based on the variance-covariance values (ρxx, ρxy, ρyy), and corrects the derived radii based on the tracking target type information.
  • The value derived in this way is a Mahalanobis distance.
  • The ellipse whose radius is the Mahalanobis distance has a spread corresponding to the pixel distribution of the tracking target (see FIG. 2(C)). Therefore, by using such an elliptical filter, it is possible to obtain an image (object image Iob) in which the background image that becomes noise is suppressed (see FIG. 2(D)).
  • For example, when the tracking target is a car, the car has an overall rounded, egg-like shape, so the car can be surrounded by an ellipse regardless of the direction from which it is viewed. Therefore, by determining an ellipse with a Mahalanobis distance of 3 as the extraction area PA, approximately 99.7% of the pixel region of the tracking target can be covered by the extraction area PA while suppressing the mixing-in of noise.
  • On the other hand, when the tracking target is a person, if an ellipse with a Mahalanobis distance of 3 is determined as the extraction area PA, the extraction area PA also covers the ends of the person's limbs, so a large amount of noise may be mixed in and hinder target tracking. By changing the Mahalanobis-distance setting according to the type of the tracked object, optimal tracking can be performed for each type of target.
  • For example, when the tracking target type is a person, the object image extraction unit 40 corrects the ellipse radii derived from the variance-covariance values (ρxx, ρxy, ρyy) to be smaller.
  • When the tracking target type is a passenger car, the object image extraction unit 40 corrects the ellipse radii derived from the variance-covariance values (ρxx, ρxy, ρyy) to be larger.
  • The object image extraction unit 40 outputs the values derived in this way as the ellipse radii (rx, ry) of the extraction area PA.
  • The derived ellipse radii (rx, ry) are stored, together with the average position (μx, μy) and the variance-covariance values (ρxx, ρxy, ρyy), as the statistical information 31 in the storage unit 30 in association with the identifier of the tracked object.
  • The object image extraction unit 40 further extracts an object image Iob from the input image Iin as the image of the extraction area PA (see FIG. 2(D)). That is, the object image extraction unit 40 extracts the object image Iob from the input image Iin based on the average position (μx, μy) and the variance-covariance values (ρxx, ρxy, ρyy).
  • The object tracking unit 50 tracks the tracking target included in the input image Iin. To perform such tracking, the object tracking unit 50 identifies the same tracking target between frames. Specifically, the object tracking unit 50 determines whether the same tracking target is included in a plurality of input images Iin obtained at different times.
  • The object tracking unit 50 determines whether the two input images Iin(t) and Iin(t+1) include the same tracking target based on the degree of matching. If the same tracking target is included in the two input images Iin(t) and Iin(t+1), the object tracking unit 50 outputs, for example, the object image Iob(t+1) containing the tracking target and the average position (μx(t+1), μy(t+1)) of the pixel region of the tracking target to the outside.
  • FIG. 3 shows an example of information processing in the information processing system 1.
  • First, the sensor device section 10 acquires the input image Iin (step S101).
  • Next, the object detection unit 20 generates statistical information of the pixel region of the tracking target included in the input image Iin obtained by the sensor device unit 10, together with type information of the tracking target (step S102).
  • Next, the object image extraction unit 40 determines the extraction area PA of the pixels of the tracking target based on the statistical information generated by the object detection unit 20 and the type information of the tracking target (step S103).
  • Next, the object image extraction unit 40 extracts the object image Iob from the input image Iin as the image of the extraction area PA (step S104).
  • Next, the object tracking unit 50 identifies the same tracking target between frames. Specifically, the object tracking unit 50 determines whether the same tracking target is included in a plurality of input images Iin obtained at different times (step S105). If the same tracking target is included in the plurality of input images Iin (step S105; Y), the object tracking unit 50 tracks the tracking target (step S106); for example, it outputs the object image Iob including the tracking target and the average position (μx, μy) of the pixel region of the tracking target to the outside. On the other hand, if the same tracking target is not included in the plurality of input images Iin (step S105; N), the object tracking unit 50 ends tracking of the tracking target (step S107); for example, it outputs an error code to the outside.
  • In the present embodiment, the extraction area PA of the tracking target is determined based on the statistical information of the pixel region of the tracking target included in the input image Iin, and the object image Iob is extracted from the input image Iin as the image of the extraction area PA.
  • By extracting the image of the tracking target based on its statistical information, the proportion of the background image included in the object image Iob is kept lower than when the region of the tracking target is extracted using a rectangular filter. As a result, noise can be kept low.
  • The radii of the extraction area PA (the ellipse radii (rx, ry)) are determined based on the variance-covariance values (ρxx, ρxy, ρyy), and the extraction area PA is determined based on the average position (μx, μy) and the ellipse radii (rx, ry).
  • The extraction area PA is also determined based on the statistical information of the pixel region of the tracking target and the type information of the tracking target.
  • As a result, the ellipse radii (rx, ry) can be adjusted depending on the type of the tracking target, and the proportion of the background image included in the object image Iob can be effectively suppressed. As a result, noise can be kept low.
  • Statistical information and tracking target type information are generated from the input image Iin using a neural network.
  • As a result, the amount of calculation can be kept low compared to the case where a filter is swept over the image.
  • In the above embodiment, the object image extraction unit 40 may determine not only the radii of the extraction area PA but also an offset α of the average position (μx, μy).
  • When the tracking target is a person, for example, facial information can be important information for tracking.
  • In that case, the entire face can be included in the object image Iob by shifting the extraction area PA upward from the average position (μx, μy) by +α.
  • Since the position of the face can be predicted from the size of the ellipse radii (rx, ry), the value of α can be determined based on the ellipse radii (rx, ry).
  • In other words, the object image extraction unit 40 may determine the offset α of the average position (μx, μy) based on the size of the ellipse radii (rx, ry). This makes it possible to perform tracking using a face image.
  • In the above embodiment, the object detection unit 20 may derive correction values (ρxx', ρxy', ρyy') by correcting the generated variance-covariance values (ρxx, ρxy, ρyy) using the t-distribution (a sketch of one such correction appears after this list).
  • The Mahalanobis distance represents the spread of the data when the pixel distribution of the tracking target is assumed to be a normal distribution.
  • For a tracking target with a small pixel region, however, the ellipse radii (rx, ry) may be derived using a distribution with a larger variance than the normal distribution. Specifically, for a tracking target with a small pixel region, the variance-covariance values (ρxx, ρxy, ρyy) obtained by assuming that the pixel distribution of the tracking target is a normal distribution are corrected using the t-distribution,
  • and the ellipse radii (rx, ry) may be derived using the correction values (ρxx', ρxy', ρyy') obtained in this way.
  • The t-distribution is, for example, a probability density function as shown in FIG. 6.
  • This probability density function is a conditional probability density function in which the average position (μx, μy) and the variance-covariance values (ρxx, ρxy, ρyy), which are the parameters of a normal distribution, are known.
  • This probability density function also has the property that it approaches a normal distribution as the degrees of freedom ν approach infinity.
  • When the degrees of freedom ν are small, that is, when the number of pixels included in the pixel region of the tracking target is small, the pixel distribution of the tracking target is modeled as a flatter, more spread-out distribution than the normal distribution.
  • In the above embodiment, the object tracking unit 50 may generate a histogram Hg for each object image Iob and, based on the generated histograms Hg, determine whether the same tracking target is included in a plurality of input images Iin obtained at different times, for example as shown in FIG. 7.
  • The object tracking unit 50 compares the average position (μx(t), μy(t)) and ellipse radii (rx(t), ry(t)) at time t with the average position (μx(t+1), μy(t+1)) and ellipse radii (rx(t+1), ry(t+1)) at time t+1.
  • In addition, the object tracking unit 50 compares the histogram Hg(t) at time t with the histogram Hg(t+1) at time t+1, for example as shown in FIG. 7.
  • The object tracking unit 50 then determines whether the two input images Iin(t) and Iin(t+1) include the same tracking target based on the degree of matching. If the same tracking target is included in the two input images Iin(t) and Iin(t+1), the object tracking unit 50 outputs, for example, the object image Iob(t+1) containing the tracking target and the average position (μx(t+1), μy(t+1)) of the pixel region of the tracking target to the outside. By using the histograms Hg for the determination in this way, the determination can be performed with higher accuracy.
  • The information processing system 1 may further include a frame interpolation unit 60, for example as shown in FIG. 8.
  • The frame interpolation unit 60 includes, for example, a CPU and a GPU.
  • The frame interpolation unit 60 executes a series of procedures described in the program 33 by loading various programs (for example, the program 33) stored in the storage unit 30 into the CPU.
  • Alternatively, the frame interpolation unit 60 may be configured with an MPU that executes its functions.
  • The object detection unit 20 generates statistical information at a predetermined operating frequency.
  • The frame interpolation unit 60 performs frame interpolation of the statistical information generated by the object detection unit 20 using a Kalman filter (a sketch of such an interpolation appears after this list).
  • When the statistical information generated by the object detection section 20 is input, the Kalman filter generates estimated values of the statistical information at a frequency higher than the operating frequency of the object detection section 20, based on the input statistical information.
  • The frame interpolation unit 60 outputs the estimated values of the statistical information generated using the Kalman filter to the object image extraction unit 40.
  • The object image extraction unit 40 determines the extraction area PA of the tracking target based on the estimated values of the statistical information generated by the frame interpolation unit 60 and the tracking target type information generated by the object detection unit 20.
  • In this way, the operating frequency of the neural network can be reduced to 1/N.
  • As a result, the amount of calculation of the neural network can be reduced.
  • FIG. 9 shows an example of functional blocks of the information processing system 2.
  • The information processing system 2 includes an image DB (database) generation unit 70 and an image DB 80 in place of the object tracking unit 50 of the information processing system 1 according to the above embodiment and its modifications A to D.
  • The image DB generation unit 70 stores the generated object image Iob in the image DB 80 every time an object image Iob is generated by the object image extraction unit 40.
  • The image DB 80 therefore stores a large number of object images Iob.
  • The image DB 80 can be used, for example, as a source of learning images for a neural network (images including a tracking target).
  • The object detection section 20, the storage section 30, the object image extraction section 40, and the object tracking section 50 may all be installed in a common information processing device.
  • The sensor device section 10, the object detection section 20, the storage section 30, the object image extraction section 40, and the object tracking section 50 may all be installed in a common information processing device.
  • The object detection section 20, the storage section 30, the object image extraction section 40, the image DB generation section 70, and the image DB 80 may all be installed in a common information processing device.
  • The sensor device section 10, the object detection section 20, the storage section 30, the object image extraction section 40, the image DB generation section 70, and the image DB 80 may all be installed in a common information processing device.
  • Note that the present disclosure can also take the following configurations.
  • (1) An information processing device including: an object detection unit that generates statistical information of a pixel region of a tracking target included in an input image; and an object image extraction unit that determines an extraction region of the tracking target based on the statistical information and extracts an object image from the input image as an image of the extraction region.
  • (2) The information processing device according to (1), in which the statistical information includes an average position and a variance-covariance value of the pixel region of the tracking target, the extraction region has an elliptical shape, and the object image extraction unit determines a radius of the extraction region based on the variance-covariance value and determines the extraction region based on the average position and the radius.
  • (3) The information processing device according to (2), in which the object detection unit derives a correction value by correcting the generated variance-covariance value using a t-distribution, and the object image extraction unit determines the radius of the extraction region based on the correction value.
  • (4) The information processing device according to (1), in which the object detection unit generates the statistical information and type information of the tracking target from the input image, and the object image extraction unit determines the extraction region based on the statistical information and the type information.
  • (5) The information processing device according to (4), in which the statistical information includes an average position and a variance-covariance value of the pixel distribution of the tracking target, the extraction region has an elliptical shape, and the object image extraction unit determines a radius of the extraction region and an offset of the average position based on the variance-covariance value and the type information, and determines the extraction region based on the average position, the radius, and the offset.
  • (6) The information processing device according to (5), in which the object detection unit derives a correction value by correcting the generated variance-covariance value using a t-distribution, and the object image extraction unit determines the radius of the extraction region and the offset of the average position based on the correction value and the type information.
  • (7) The information processing device according to any one of (1) to (3), in which the object detection unit outputs the statistical information from the input image using a neural network.
  • (8) The information processing device according to any one of (4) to (6), in which the object detection unit outputs the statistical information and the type information from the input image using a neural network.
  • (9) The information processing device according to any one of (1) to (8), in which the object detection unit generates the statistical information at a predetermined operating frequency, the information processing device further including a frame interpolation unit that performs frame interpolation of the statistical information generated by the object detection unit using a Kalman filter.
  • (10) The information processing device according to any one of (1) to (9), further including an object tracking unit that determines whether the same tracking target is included in a plurality of input images obtained at different times.
  • An information processing method including: generating statistical information of a pixel region of a tracking target included in an input image; and determining an extraction region of the tracking target based on the statistical information and extracting an object image from the input image as an image of the extraction region.
  • An information processing program that causes a computer to execute: generating statistical information of a pixel region of a tracking target included in an input image; and determining an extraction region of the tracking target based on the statistical information and extracting an object image from the input image as an image of the extraction region.
  • In the information processing device, the information processing method, and the information processing program according to an embodiment of the present disclosure, an extraction region of a tracking target is determined based on statistical information of a pixel region of the tracking target included in an input image, and an object image is extracted from the input image as an image of the extraction region.
  • By extracting the image of the tracking target based on its statistical information, the proportion of the background image included in the object image can be kept lower than when the image of the tracking target is extracted using a rectangular filter. As a result, noise can be kept low.
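The t-distribution correction mentioned above (modification B) can be sketched as follows. This is one possible reading, not the patent's formula: the variance-covariance values are scaled by ν/(ν−2), which inflates them for small pixel regions and tends to 1 as the degrees of freedom grow, matching the stated property that the t-distribution approaches the normal distribution.

```python
import numpy as np

def t_corrected_covariance(cov: np.ndarray, n_pixels: int) -> np.ndarray:
    """Inflate a 2x2 variance-covariance matrix for a small pixel region.

    cov: variance-covariance matrix [[rho_xx, rho_xy], [rho_xy, rho_yy]].
    n_pixels: number of pixels in the tracking-target region.
    """
    nu = max(n_pixels - 1, 3)           # degrees of freedom, kept above 2
    # A t-distribution with nu degrees of freedom has nu / (nu - 2) times the
    # variance of the underlying normal; the factor tends to 1 as nu grows.
    return cov * (nu / (nu - 2.0))
```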
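The Kalman-filter frame interpolation mentioned above (modification D) can likewise be sketched. The patent states only that a Kalman filter produces estimates of the statistical information at a higher rate than the object detection unit 20 operates; the constant-velocity model, the restriction to the average position, and the noise parameters below are assumptions for illustration.

```python
import numpy as np

class MeanPositionKalman:
    """Constant-velocity Kalman filter over the average position (mu_x, mu_y).

    Call update() with each detector output first, then predict() for the
    interpolated frames between detector outputs.
    """

    def __init__(self, process_noise: float = 1e-2, measurement_noise: float = 1.0):
        self.x = None                                   # state [x, y, vx, vy]
        self.P = np.eye(4)                              # state covariance
        self.Q = process_noise * np.eye(4)              # process noise
        self.R = measurement_noise * np.eye(2)          # measurement noise
        self.H = np.array([[1.0, 0.0, 0.0, 0.0],
                           [0.0, 1.0, 0.0, 0.0]])       # only the position is observed

    def update(self, mu):
        """Fold in one detector output (mu_x, mu_y) at the detector's rate."""
        z = np.asarray(mu, dtype=float)
        if self.x is None:
            self.x = np.array([z[0], z[1], 0.0, 0.0])
            return
        y = z - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P

    def predict(self, dt: float = 1.0):
        """Advance the state by dt frames and return the interpolated position."""
        F = np.array([[1.0, 0.0, dt, 0.0],
                      [0.0, 1.0, 0.0, dt],
                      [0.0, 0.0, 1.0, 0.0],
                      [0.0, 0.0, 0.0, 1.0]])
        self.x = F @ self.x
        self.P = F @ self.P @ F.T + self.Q
        return self.x[:2]
```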

Abstract

An information processing apparatus according to an embodiment of the present disclosure comprises an object detecting unit and an object image extracting unit. The object detecting unit generates statistical information relating to a pixel region to be tracked, included in an input image. The object image extracting unit determines an extraction region to be tracked, on the basis of the statistical information, and extracts an object image, as an image of the extraction region, from the input image.

Description

Information processing device, information processing method, and information processing program
The present disclosure relates to an information processing device, an information processing method, and an information processing program.
In recent years, there has been an increasing need for mobile bodies, such as robots, that recognize the external environment and move autonomously according to the recognized environment. A technique for recognizing surrounding people and objects is disclosed in Patent Document 1, for example.
Patent Document 1: Japanese Patent Application Publication No. 2010-220122
In Patent Document 1, a rectangular filter is used when extracting a person or an object from an acquired surrounding image. However, when a rectangular filter is used, a large amount of background image that becomes noise is included, and, for example, the accuracy of identifying the same tracking target between frames (tracking accuracy) may be reduced. It is therefore desirable to provide an information processing device, an information processing method, and an information processing program that can keep noise low.
An information processing device according to an embodiment of the present disclosure includes an object detection section and an object image extraction section. The object detection section generates statistical information of a pixel region of a tracking target included in an input image. The object image extraction section determines an extraction region of the tracking target based on the statistical information, and extracts an object image from the input image as an image of the extraction region.
An information processing method according to an embodiment of the present disclosure includes the following two steps:
(A1) generating statistical information of a pixel region of a tracking target included in an input image; and
(A2) determining an extraction region of the tracking target based on the statistical information, and extracting an object image from the input image as an image of the extraction region.
An information processing program according to an embodiment of the present disclosure causes a computer to execute the following two steps:
(B1) generating statistical information of a pixel region of a tracking target included in an input image; and
(B2) determining an extraction region of the tracking target based on the statistical information, and extracting an object image from the input image as an image of the extraction region.
In the information processing device, information processing method, and information processing program according to an embodiment of the present disclosure, an extraction region of a tracking target is determined based on statistical information of a pixel region of the tracking target included in an input image, and an object image is extracted from the input image as an image of the extraction region. By extracting the image of the tracking target based on its statistical information, the proportion of the background image included in the object image can be kept lower than when the image of the tracking target is extracted using a rectangular filter.
FIG. 1 is a diagram illustrating an example of the functional blocks of an information processing system according to a first embodiment of the present disclosure.
FIG. 2 is a diagram illustrating a method for determining the same object using two object images.
FIG. 3 is a diagram illustrating a method of acquiring an object image from an input image.
FIG. 4 is a diagram illustrating an example of information processing in the information processing system.
FIG. 5 is a diagram showing an example in which an offset is provided in the extraction area.
FIG. 6 is a diagram showing an example of the t-distribution.
FIG. 7 is a diagram illustrating a method for determining identical objects using a histogram.
FIG. 8 is a diagram illustrating a modified example of the functional blocks of the information processing system of FIG. 1.
FIG. 9 is a diagram illustrating an example of the functional blocks of an information processing system according to a second embodiment of the present disclosure.
Hereinafter, embodiments for carrying out the present disclosure will be described in detail with reference to the drawings. The description is given in the following order.
1. Background
2. First embodiment (FIGS. 1 to 4)
3. Modifications (FIGS. 5 to 8)
4. Second embodiment (FIG. 9)
<1. Background>
There is a growing need for autonomous mobile robots that travel through offices, towns, and the like. By tracking the people and objects around such a robot, actions such as collision avoidance based on behavior prediction and following of a tracked target become possible. Since the robot is battery-driven, a tracking method that is highly accurate yet consumes little power is required.
Conventionally, methods of acquiring surrounding images and tracking surrounding people and objects have been known. In such a method, a pixel region corresponding to a tracking target is extracted for each frame, and the same tracking target is identified between frames using the degree of matching of the pixel regions as an index. When the pixel region of the tracking target is extracted, a method is known in which a rectangular or specifically shaped filter is swept over the image. Such a filter is sometimes called a kernel.
However, this method has various problems. First, since the size of the tracking target in the image is unknown, filters of various sizes must be swept over the image, which increases the amount of calculation. Furthermore, when partial regions such as hands and feet are also to be detected, filters corresponding to them must be swept as well. In addition, since the number of people or objects in the image is unknown, extracting only the best-matching pixel region makes it difficult to detect multiple people, while attempting to detect multiple people by setting a matching threshold may cause the same person to be detected multiple times.
It is therefore conceivable to extract the pixel region of the tracking target using a neural network instead of a sweep (see, for example, Patent Document 1). In this case, no sweep is needed, so the amount of calculation can be reduced. Even when the number of people or objects in the image is unknown, the tracking target can be identified with high accuracy; partial regions such as hands and feet can also be detected, and occlusion can be handled. However, when a rectangular filter surrounding the tracking target is used in neural-network-based extraction of the tracking target, a large amount of background image that becomes noise is included, which may reduce pattern-matching accuracy.
In neural-network-based extraction of the tracking target, if a filter with a specific shape matched to, for example, the shape of a person is used, almost no background image that becomes noise is included, and pattern-matching accuracy may improve. However, since a neural network performs inference on a pixel-by-pixel basis, using such a specifically shaped filter requires a network with a large amount of calculation. Thus, with conventional methods it is difficult to keep both the amount of calculation and the amount of noise low. A scheme that can keep both the amount of calculation and the amount of noise low is therefore described below.
<2. First embodiment>
[Configuration]
An information processing system 1 according to the first embodiment of the present disclosure will be described. FIG. 1 shows an example of the functional blocks of the information processing system 1. As shown in FIG. 1, the information processing system 1 includes, for example, a sensor device section 10, an object detection section 20, a storage section 30, an object image extraction section 40, and an object tracking section 50.
The object detection section 20, the object image extraction section 40, and the object tracking section 50 include, for example, a CPU (Central Processing Unit) and a GPU (Graphics Processing Unit). They execute a series of procedures described in a program 33 (described later) by loading various programs stored in the storage section 30 into the CPU. Note that the object detection section 20, the object image extraction section 40, and the object tracking section 50 may instead be configured with an MPU (Micro-Processing Unit) that executes the respective functions.
The sensor device section 10 includes, for example, a sensor element that recognizes the external environment and acquires environmental data corresponding to the recognized external environment. The sensor element outputs the acquired environmental data to the object detection section 20. The sensor element is, for example, an RGB camera, an RGB-D camera, a depth sensor, an infrared sensor, or an event camera.
The RGB camera is, for example, a monocular visible-light image sensor that outputs an RGB image obtained by receiving visible light and converting it into an electrical signal. The RGB-D camera is, for example, a binocular visible-light image sensor, and outputs an RGB-D image (an RGB image and a distance image obtained from parallax). The depth sensor is, for example, a ToF (Time of Flight) sensor or a LiDAR (Laser Imaging Detection and Ranging) sensor, and outputs a distance image obtained by measuring scattered light in response to pulsed laser irradiation. The infrared sensor outputs an infrared image obtained by, for example, receiving infrared light and converting it into an electrical signal. The event camera is, for example, a monocular visible-light image sensor, and outputs the difference between RGB images of successive frames (a difference image). The sensor device section 10 outputs various images obtained from the external environment (for example, an RGB image, an RGB-D image, a distance image, an infrared image, or a difference image) as an input image Iin (see FIG. 2(A)).
The object detection unit 20 generates statistical information of the pixel region of the tracking target included in the input image Iin obtained by the sensor device section 10, together with type information of the tracking target. The object detection unit 20 stores the generated statistical information and type information, and the input image Iin, in the storage unit 30. The statistical information generated by the object detection unit 20 includes the average position (μx, μy) and the variance-covariance values (ρxx, ρxy, ρyy) of the pixel region of the tracking target (see FIG. 2(B)). The statistical information generated by the object detection unit 20 is stored as the statistical information 31 in the storage unit 30.
The average position (μx, μy) is, for example, the two-dimensional coordinate corresponding to the center position in the X-axis direction and the center position in the Y-axis direction of the pixel region of the tracking target included in the input image Iin. The variance-covariance values (ρxx, ρxy, ρyy) are the variance-covariance values of that pixel region: ρxx is the (1, 1) element, ρyy the (2, 2) element, and ρxy the (2, 1) and (1, 2) elements of the variance-covariance matrix.
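Purely for illustration (the patent does not specify how these statistics are computed, and in the embodiment they are produced by the neural network described next), the quantities above could be obtained from a hypothetical binary mask of the tracking-target pixels as follows:

```python
import numpy as np

def region_statistics(mask: np.ndarray):
    """mask: 2D boolean array, True where a pixel belongs to the tracking target."""
    ys, xs = np.nonzero(mask)                     # coordinates of the target pixels
    mu_x, mu_y = xs.mean(), ys.mean()             # average position (mu_x, mu_y)
    cov = np.cov(np.stack([xs, ys]))              # 2x2 variance-covariance matrix
    rho_xx, rho_xy, rho_yy = cov[0, 0], cov[0, 1], cov[1, 1]
    return (mu_x, mu_y), (rho_xx, rho_xy, rho_yy)
```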
The type information generated by the object detection unit 20 includes, for example, the name of the tracked object, such as a person or a car. The name of the tracked object roughly indicates characteristics such as the shape and size of the tracked object. The type information generated by the object detection unit 20 is stored as the type information 32 in the storage unit 30.
The object detection unit 20 has a machine learning model such as a neural network. This neural network is, for example, a learning model trained using, as teaching data, learning images (images containing a tracking target), the statistical information (average position (μx, μy) and variance-covariance values (ρxx, ρxy, ρyy)) of the pixel region of the tracking target included in each learning image, and the type information of the tracking target included in each learning image. When the input image Iin is input, this neural network outputs the statistical information (average position (μx, μy) and variance-covariance values (ρxx, ρxy, ρyy)) of the pixel region of the tracking target included in the input image Iin and the type information of that tracking target. The object detection unit 20 uses the neural network to output, from the input image Iin, the statistical information (average position (μx, μy) and variance-covariance values (ρxx, ρxy, ρyy)) and the type information of the tracking target.
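The patent does not disclose a network architecture. Purely as a sketch of the interface, a model could regress the five statistical values and score the target type per object slot; the PyTorch layers, the slot-based output, and all sizes below are assumptions and are not the patent's network.

```python
import torch
import torch.nn as nn

class StatDetector(nn.Module):
    """Toy head that outputs (mu_x, mu_y, rho_xx, rho_xy, rho_yy) and type scores."""

    def __init__(self, feat_ch: int = 64, num_types: int = 10, num_slots: int = 8):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, feat_ch, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(feat_ch, feat_ch, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.stats_head = nn.Linear(feat_ch, num_slots * 5)           # 5 statistics per slot
        self.type_head = nn.Linear(feat_ch, num_slots * num_types)    # type scores per slot
        self.num_slots = num_slots
        self.num_types = num_types

    def forward(self, x: torch.Tensor):
        f = self.backbone(x).flatten(1)
        stats = self.stats_head(f).view(-1, self.num_slots, 5)
        types = self.type_head(f).view(-1, self.num_slots, self.num_types)
        return stats, types
```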
The storage unit 30 is, for example, a recording medium such as a semiconductor memory or a hard disk. The storage unit 30 stores various programs (for example, the program 33). The program 33 describes a series of procedures for realizing the respective functions of the object detection section 20, the object image extraction section 40, and the object tracking section 50. The storage unit 30 also stores various data generated by executing the program 33 (for example, the input image Iin, the object image Iob, the statistical information 31, and the type information 32).
The object image extraction unit 40 determines the extraction area PA of the tracking target based on the statistical information generated by the object detection unit 20 and the type information of the tracking target (see FIG. 2(C)). Specifically, the object image extraction unit 40 determines the extraction area PA based on the average position (μx, μy), the variance-covariance values (ρxx, ρxy, ρyy), and the tracking target type information. The extraction area PA has an elliptical shape. The object image extraction unit 40 derives the radii of the ellipse based on the variance-covariance values (ρxx, ρxy, ρyy), and corrects the derived radii based on the tracking target type information. The values derived in this way (the radii (rx, ry)) correspond to a Mahalanobis distance. The ellipse whose radius is the Mahalanobis distance has a spread corresponding to the pixel distribution of the tracking target (see FIG. 2(C)). Therefore, by using such an elliptical filter, it is possible to obtain an image (object image Iob) in which the background image that becomes noise is suppressed (see FIG. 2(D)).
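A minimal sketch of such an elliptical filter, assuming the statistics above and a boundary at a fixed Mahalanobis distance k (the pixel-grid implementation is an assumption; the patent does not give one):

```python
import numpy as np

def elliptical_mask(shape, mu, cov, k=3.0):
    """True inside the ellipse of Mahalanobis distance k around the average position.

    shape: (height, width) of the input image Iin.
    mu:    average position (mu_x, mu_y).
    cov:   2x2 variance-covariance matrix [[rho_xx, rho_xy], [rho_xy, rho_yy]].
    """
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    d = np.stack([xs - mu[0], ys - mu[1]], axis=-1)                    # offsets from the mean
    m2 = np.einsum('...i,ij,...j->...', d, np.linalg.inv(cov), d)      # squared Mahalanobis distance
    return m2 <= k * k
```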
For example, when the tracking target is a car, the car has an overall rounded, egg-like shape, so the car to be tracked can be surrounded by an ellipse regardless of the direction from which it is viewed. Therefore, by determining an ellipse with a Mahalanobis distance of 3 as the extraction area PA, approximately 99.7% of the pixel region of the tracking target can be covered by the extraction area PA while suppressing the mixing-in of noise. On the other hand, when the tracking target is a person, if an ellipse with a Mahalanobis distance of 3 is determined as the extraction area PA, the extraction area PA also covers the ends of the person's limbs, so a large amount of noise may be mixed in and hinder target tracking. In this way, by changing the Mahalanobis-distance setting according to the type of the tracked object, optimal tracking can be performed for each type of target.
For example, when the tracking target type information is a person, the object image extraction unit 40 corrects the ellipse radii derived from the variance-covariance values (ρxx, ρxy, ρyy) to be smaller. When the tracking target type information is a passenger car, the object image extraction unit 40 corrects the ellipse radii derived from the variance-covariance values (ρxx, ρxy, ρyy) to be larger. The object image extraction unit 40 outputs the values derived in this way as the ellipse radii (rx, ry) of the extraction area PA. The derived ellipse radii (rx, ry) are stored, together with the average position (μx, μy) and the variance-covariance values (ρxx, ρxy, ρyy), as the statistical information 31 in the storage unit 30 in association with the identifier of the tracked object.
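The concrete per-type settings are not given in the patent; the mapping below is a hypothetical placeholder that only illustrates the described direction of the correction (a smaller distance for a person, a larger one for a car):

```python
# Hypothetical per-type Mahalanobis-distance settings (illustrative values only).
TYPE_TO_MAHALANOBIS = {"person": 2.0, "car": 3.0}

def mahalanobis_k(type_name: str, default: float = 3.0) -> float:
    """Return the Mahalanobis-distance setting for a tracking-target type."""
    return TYPE_TO_MAHALANOBIS.get(type_name, default)
```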
The object image extraction unit 40 further extracts an object image Iob from the input image Iin as the image of the extraction area PA (see FIG. 2(D)). That is, the object image extraction unit 40 extracts the object image Iob from the input image Iin based on the average position (μx, μy) and the variance-covariance values (ρxx, ρxy, ρyy).
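Extraction of the object image Iob as the pixels inside the extraction area PA could then look like the following sketch (background pixels are simply zeroed; the patent does not specify how pixels outside PA are treated):

```python
import numpy as np

def extract_object_image(iin: np.ndarray, pa_mask: np.ndarray) -> np.ndarray:
    """Keep only the pixels of Iin that fall inside the extraction area PA."""
    iob = iin.copy()
    iob[~pa_mask] = 0
    return iob
```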
The object tracking unit 50 tracks the tracking target included in the input image Iin. To perform such tracking, the object tracking unit 50 identifies the same tracking target between frames. Specifically, the object tracking unit 50 determines whether the same tracking target is included in a plurality of input images Iin obtained at different times.
For example, as shown in FIG. 3, the object tracking unit 50 compares the average position (μx(t), μy(t)) and the ellipse radii (rx(t), ry(t)) obtained from the input image Iin(t) at time t with the average position (μx(t+1), μy(t+1)) and the ellipse radii (rx(t+1), ry(t+1)) obtained from the input image Iin(t+1) at time t+1. The object tracking unit 50 then determines whether the two input images Iin(t) and Iin(t+1) include the same tracking target based on the degree of matching. If the same tracking target is included in the two input images Iin(t) and Iin(t+1), the object tracking unit 50 outputs, for example, the object image Iob(t+1) containing the tracking target and the average position (μx(t+1), μy(t+1)) of the pixel region of the tracking target to the outside.
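The patent states only that the degree of matching of these quantities is used; the thresholded comparison below is one possible sketch, and both tolerance values are assumptions:

```python
def same_target(mu_t, r_t, mu_t1, r_t1, pos_tol=20.0, radius_tol=0.3):
    """Crude agreement check between frames t and t+1 for one tracked object."""
    d_pos = ((mu_t1[0] - mu_t[0]) ** 2 + (mu_t1[1] - mu_t[1]) ** 2) ** 0.5
    d_rad = max(abs(r_t1[0] - r_t[0]) / max(r_t[0], 1e-6),
                abs(r_t1[1] - r_t[1]) / max(r_t[1], 1e-6))
    return d_pos <= pos_tol and d_rad <= radius_tol
```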
[Operation]
Next, information processing in the information processing system 1 will be described. FIG. 3 shows an example of the information processing in the information processing system 1.
First, the sensor device section 10 acquires the input image Iin (step S101). Next, the object detection unit 20 generates statistical information of the pixel region of the tracking target included in the input image Iin obtained by the sensor device section 10, together with type information of the tracking target (step S102). Next, the object image extraction unit 40 determines the extraction area PA of the pixels of the tracking target based on the statistical information generated by the object detection unit 20 and the type information of the tracking target (step S103). Next, the object image extraction unit 40 extracts the object image Iob from the input image Iin as the image of the extraction area PA (step S104).
Next, the object tracking unit 50 identifies the same tracking target between frames. Specifically, the object tracking unit 50 determines whether the same tracking target is included in a plurality of input images Iin obtained at different times (step S105). If the same tracking target is included in the plurality of input images Iin (step S105; Y), the object tracking unit 50 tracks the tracking target (step S106); for example, it outputs the object image Iob including the tracking target and the average position (μx, μy) of the pixel region of the tracking target to the outside. On the other hand, if the same tracking target is not included in the plurality of input images Iin (step S105; N), the object tracking unit 50 ends tracking of the tracking target (step S107); for example, it outputs an error code to the outside.
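Putting steps S101 to S107 together, a single-target sketch of the loop might look as follows. It assumes the helper functions from the earlier sketches (`mahalanobis_k`, `elliptical_mask`, `extract_object_image`, `same_target`) and approximates the ellipse radii from the diagonal of the covariance, which is a simplification rather than the patent's procedure.

```python
def tracking_loop(sensor, detector, max_frames=1000):
    """sensor() returns Iin; detector(Iin) returns ((mu, cov), type_name)."""
    prev = None
    for _ in range(max_frames):
        iin = sensor()                                         # S101: acquire input image
        (mu, cov), type_name = detector(iin)                   # S102: statistics and type
        k = mahalanobis_k(type_name)                           # S103: extraction area PA
        pa = elliptical_mask(iin.shape[:2], mu, cov, k)
        iob = extract_object_image(iin, pa)                    # S104: object image Iob
        r = (k * cov[0][0] ** 0.5, k * cov[1][1] ** 0.5)       # rough ellipse radii (rx, ry)
        if prev is None or same_target(prev["mu"], prev["r"], mu, r):
            prev = {"mu": mu, "r": r}                          # S105; Y -> S106: keep tracking
            yield iob, mu
        else:
            break                                              # S105; N -> S107: end tracking
```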
[Effects]
Next, the effects of the information processing system 1 will be described.
In the present embodiment, the extraction region PA of the tracking target is determined based on the statistical information of the pixel region of the tracking target included in the input image Iin, and the object image Iob is extracted from the input image Iin as an image of the extraction region PA. By extracting the image of the tracking target based on its statistical information in this way, the proportion of background image included in the object image Iob is reduced compared with the case where the region of the tracking target is extracted using a rectangular filter. As a result, noise can be kept low.
Furthermore, in the present embodiment, the radius of the extraction region PA (the ellipse radius (rx, ry)) is determined based on the variance-covariance values (ρxx, ρxy, ρyy), and the extraction region PA is determined based on the average position (μx, μy) and the ellipse radius (rx, ry). By determining the extraction region PA based on the variance-covariance values of the pixel region of the tracking target in this way, the proportion of background image included in the object image Iob is reduced compared with the case where a rectangular filter is used. As a result, noise can be kept low.
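As a purely illustrative sketch, an elliptical extraction region derived from the average position and variance-covariance values could be realized as a pixel mask as follows; the scale factor k that turns the statistics into ellipse radii is an assumption and is not fixed by the present disclosure.

```python
import numpy as np

def elliptical_mask(shape, mu, cov, k=2.0):
    """Boolean mask of an elliptical extraction region PA.

    mu  = (μx, μy): average position of the tracking-target pixels
    cov = [[ρxx, ρxy], [ρxy, ρyy]]: variance-covariance matrix
    k   : assumed scale factor; pixels within Mahalanobis distance k are kept.
    """
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    d = np.stack([xs - mu[0], ys - mu[1]], axis=-1)
    inv_cov = np.linalg.inv(np.asarray(cov, dtype=float))
    # Squared Mahalanobis distance of every pixel from the average position.
    m2 = np.einsum('...i,ij,...j->...', d, inv_cov, d)
    return m2 <= k ** 2

# Object image Iob: input pixels inside the mask, background suppressed.
# iob = np.where(elliptical_mask(iin.shape[:2], (mu_x, mu_y), cov)[..., None], iin, 0)
```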
Furthermore, in the present embodiment, the extraction region PA is determined based on the statistical information of the pixel region of the tracking target and the type information of the tracking target. This makes it possible, for example, to adjust the ellipse radius (rx, ry) according to the type of the tracking target, so that the proportion of background image included in the object image Iob can be effectively suppressed. As a result, noise can be kept low.
Furthermore, in the present embodiment, the statistical information and the type information of the tracking target are generated from the input image Iin using a neural network. This keeps the amount of computation low compared with the case where a filter is swept over the image.
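The present disclosure does not prescribe any particular network architecture; only as an assumed illustration, a detection head could regress the five statistics and per-class type scores from a backbone feature vector, for example (the use of PyTorch here is itself an illustrative choice):

```python
import torch
import torch.nn as nn

class DetectionHead(nn.Module):
    """Assumed example head: maps a backbone feature vector to the
    statistics (μx, μy, ρxx, ρxy, ρyy) and per-class type scores."""
    def __init__(self, in_features: int, num_classes: int):
        super().__init__()
        self.stats = nn.Linear(in_features, 5)           # μx, μy, ρxx, ρxy, ρyy
        self.kind = nn.Linear(in_features, num_classes)  # type information

    def forward(self, feat: torch.Tensor):
        return self.stats(feat), self.kind(feat).softmax(dim=-1)
```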
Furthermore, in the present embodiment, it is determined whether the same tracking target is included in a plurality of input images Iin obtained at different times. Accordingly, when the same tracking target is identified between frames, the tracking target included in the input image Iin can be tracked.
<2. Modifications>
Modifications of the information processing system 1 according to the above embodiment will be described below. In the following modifications, components common to the above embodiment are denoted by the same reference numerals.
[Modification A]
In the above embodiment, the object image extraction unit 40 may determine not only the radius of the extraction region PA but also an offset α of the average position (μx, μy), based on the variance-covariance values (ρxx, ρxy, ρyy) and the type information of the tracking target.
For example, when the tracking target is a person, facial information can be important for tracking. For example, when a pedestrian in a street is the tracking target, the entire face can be included in the object image Iob by moving the extraction region PA to a position +α above the average position (μx, μy). In addition, since the position of the face can be predicted from the size of the ellipse radius (rx, ry), the value of α can be determined based on the ellipse radius (rx, ry). Therefore, when the type information of the tracking target indicates a person, the object image extraction unit 40 may determine the offset α of the average position (μx, μy) based on the size of the ellipse radius (rx, ry). This makes it possible to perform tracking or the like using the face image.
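A minimal sketch of such a type-dependent offset is shown below, purely as an assumption; the proportionality constant between the vertical ellipse radius and α is not given in the present disclosure and is chosen here only for illustration.

```python
def offset_for_type(obj_type: str, rx: float, ry: float) -> tuple[float, float]:
    """Assumed offset α of the average position, depending on the target type."""
    if obj_type == "person":
        # Shift the extraction region upward by a fraction of the vertical
        # radius so that the whole face falls inside the object image Iob.
        return (0.0, -0.5 * ry)   # image y-axis grows downward, so "up" is negative
    return (0.0, 0.0)

mu_x, mu_y = 120.0, 200.0
dx, dy = offset_for_type("person", rx=30.0, ry=80.0)
center = (mu_x + dx, mu_y + dy)   # center of the shifted extraction region PA
```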
[Modification B]
In the above embodiment and Modification A, the object detection unit 20 may derive corrected values (ρxx', ρxy', ρyy') by correcting the generated variance-covariance values (ρxx, ρxy, ρyy) using a t-distribution.
The Mahalanobis distance describes the distribution of the data on the assumption that the pixel distribution of the tracking target is a normal distribution. In a neural network, the inference accuracy tends to improve as the size of the pixel region of the tracking target increases, and conversely the accuracy decreases as the size decreases. Therefore, for a tracking target whose pixel region is small, the ellipse radius (rx, ry) may be derived using a distribution whose variance is larger than that of the normal distribution. Specifically, for a tracking target whose pixel region is small, the variance-covariance values (ρxx, ρxy, ρyy) obtained on the assumption that the pixel distribution of the tracking target is a normal distribution may be corrected using a t-distribution, and the ellipse radius (rx, ry) may be derived using the corrected values (ρxx', ρxy', ρyy') obtained thereby.
Here, the t-distribution is, for example, a probability density function as shown in FIG. 6. This probability density function is a conditional probability density function in which the average position (μx, μy) and the variance-covariance values (ρxx, ρxy, ρyy), which are the parameters of the normal distribution, are assumed to be known. This probability density function also has the property of approaching the normal distribution as the degree of freedom ν approaches infinity. When the degree of freedom ν is small, that is, when the number of pixels included in the pixel region of the tracking target is small, the pixel distribution of the tracking target becomes a gentle distribution, like a flattened normal distribution.
In this modification, such a characteristic of the t-distribution is used to correct the variance-covariance values (ρxx, ρxy, ρyy), obtained under the normal-distribution assumption, when the number of pixels included in the pixel region of the tracking target is small. As a result, a decrease in tracking accuracy can be suppressed even when the pixel region of the tracking target contains only a small number of pixels.
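The exact correction formula is not specified in the present disclosure; the following sketch merely assumes the standard relation that a t-distribution with ν degrees of freedom (ν > 2) has covariance ν/(ν−2) times its scale matrix, and inflates the variance-covariance values accordingly when the pixel count is small.

```python
import numpy as np

def corrected_covariance(cov, num_pixels, min_dof=3):
    """Assumed t-distribution based inflation of (ρxx, ρxy, ρyy).

    For ν > 2, a t-distribution has covariance ν / (ν - 2) times its scale
    matrix, so regions with few pixels get a correspondingly wider ellipse.
    """
    nu = max(num_pixels - 1, min_dof)            # assumed choice of ν
    factor = nu / (nu - 2.0)
    return np.asarray(cov, dtype=float) * factor

cov = [[25.0, 3.0], [3.0, 64.0]]                 # (ρxx, ρxy, ρyy) arranged as a matrix
print(corrected_covariance(cov, num_pixels=10))      # noticeably inflated
print(corrected_covariance(cov, num_pixels=10_000))  # almost unchanged
```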
[Modification C]
In the above embodiment and Modifications A and B, the object tracking unit 50 may generate a histogram Hg for each object image Iob, for example as shown in FIG. 7, and may determine, based on the generated histograms Hg, whether the same tracking target is included in a plurality of input images Iin obtained at different times.
For example, as shown in FIG. 7, the object tracking unit 50 compares the average position (μx(t), μy(t)) and the ellipse radius (rx(t), ry(t)) at time t with the average position (μx(t+1), μy(t+1)) and the ellipse radius (rx(t+1), ry(t+1)) at time t+1. Furthermore, as shown in FIG. 7, the object tracking unit 50 compares the histogram Hg(t) at time t with the histogram Hg(t+1) at time t+1. Based on these degrees of agreement, the object tracking unit 50 determines whether the two input images Iin(t) and Iin(t+1) contain the same tracking target. If the two input images Iin(t) and Iin(t+1) contain the same tracking target, the object tracking unit 50 outputs, for example, the object image Iob(t+1) containing that tracking target and the average position (μx(t+1), μy(t+1)) of the pixel region of the tracking target to the outside. Using the histograms Hg in this way makes it possible to perform the above determination with higher accuracy.
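As a hedged illustration, a color histogram per object image and a simple intersection score could be computed as follows; the bin count and the use of histogram intersection are assumptions, since the disclosure does not specify how Hg is built or compared.

```python
import numpy as np

def object_histogram(iob, bins=16):
    """Assumed histogram Hg of an object image Iob (per-channel counts, normalized)."""
    channels = [np.histogram(iob[..., c], bins=bins, range=(0, 256))[0]
                for c in range(iob.shape[-1])]
    hg = np.concatenate(channels).astype(float)
    return hg / max(hg.sum(), 1.0)

def histogram_similarity(hg_t, hg_t1):
    """Histogram intersection: 1.0 for identical distributions, 0.0 for disjoint ones."""
    return float(np.minimum(hg_t, hg_t1).sum())

# The similarity can be combined with the position/radius agreement when
# deciding whether Iin(t) and Iin(t+1) contain the same tracking target.
```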
[Modification D]
In the above embodiment and Modifications A to C, the information processing system 1 may further include a frame interpolation unit 60, for example as shown in FIG. 8. The frame interpolation unit 60 includes, for example, a CPU and a GPU. The frame interpolation unit 60 executes a series of procedures described in the program 33 by, for example, loading the various programs (for example, the program 33) stored in the storage unit 30 into the CPU. Note that the frame interpolation unit 60 may instead be configured with an MPU that executes its functions.
Here, it is assumed that the object detection unit 20 generates the statistical information at a predetermined operating frequency. In this case, the frame interpolation unit 60 performs frame interpolation of the statistical information generated by the object detection unit 20 using a Kalman filter. When the statistical information generated by the object detection unit 20 is input, the Kalman filter generates estimated values of the statistical information, based on the input statistical information, at a frequency higher than the operating frequency of the object detection unit 20. The estimated values of the statistical information generated using the Kalman filter are output to the object image extraction unit 40. The object image extraction unit 40 determines the extraction region PA of the tracking target based on the estimated values of the statistical information generated by the frame interpolation unit 60 and the type information of the tracking target generated by the object detection unit 20.
Accordingly, by operating the frame interpolation unit 60 at N times the operating frequency of the object detection unit 20, the operating frequency of the neural network can be reduced to 1/N. As a result, the amount of computation of the neural network can be reduced.
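Only as an assumed sketch (the disclosure does not specify the state model), a constant-velocity Kalman filter over the average position could interpolate detections at N times the detector rate as follows.

```python
import numpy as np

class PositionKalman:
    """Assumed constant-velocity Kalman filter over (μx, μy) for frame interpolation."""
    def __init__(self, dt, q=1e-2, r=1.0):
        self.F = np.array([[1, 0, dt, 0], [0, 1, 0, dt],
                           [0, 0, 1, 0], [0, 0, 0, 1]], dtype=float)
        self.H = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], dtype=float)
        self.Q, self.R = q * np.eye(4), r * np.eye(2)
        self.x, self.P = np.zeros(4), np.eye(4)

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]                       # interpolated (μx, μy)

    def update(self, mu):                       # called when new statistics arrive
        y = np.asarray(mu) - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P

# With N = 4, predict() runs every frame while update() runs only on every
# fourth frame, when the object detection unit 20 produces new statistics.
```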
<3. Second embodiment>
Next, an information processing system 2 according to a second embodiment of the present disclosure will be described. FIG. 9 shows an example of functional blocks of the information processing system 2. As shown in FIG. 9, the information processing system 2 corresponds, for example, to the information processing system 1 according to the above embodiment and its Modifications A to D in which an image DB (DataBase) generation unit 70 and an image DB 80 are provided in place of the object tracking unit 50.
Each time the object image extraction unit 40 generates an object image Iob, the image DB generation unit 70 stores the generated object image Iob in the image DB 80. The image DB 80 thus stores a large number of object images Iob. The image DB 80 can be used, for example, as a set of training images for a neural network (images including the tracking target).
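A minimal sketch of this accumulation step is given below; storing each object image as a file in a directory is an assumption made purely for illustration, not the disclosed implementation.

```python
from pathlib import Path
import numpy as np

class ImageDBGenerator:
    """Assumed image DB generation unit: stores every extracted Iob for later training."""
    def __init__(self, db_dir="image_db"):
        self.db_dir = Path(db_dir)
        self.db_dir.mkdir(parents=True, exist_ok=True)
        self.count = 0

    def store(self, iob: np.ndarray) -> Path:
        path = self.db_dir / f"iob_{self.count:08d}.npy"
        np.save(path, iob)                      # one entry per extracted object image
        self.count += 1
        return path
```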
Although the present disclosure has been described above with reference to a plurality of embodiments and modifications thereof, the present disclosure is not limited to the above embodiments and the like, and various modifications are possible.
For example, in the above embodiments and the like, the object detection unit 20, the storage unit 30, the object image extraction unit 40, and the object tracking unit 50 may all be mounted in a common information processing device. Furthermore, for example, in the above embodiments and the like, the sensor device unit 10, the object detection unit 20, the storage unit 30, the object image extraction unit 40, and the object tracking unit 50 may all be mounted in a common information processing device.
Also, for example, in the above embodiments and the like, the object detection unit 20, the storage unit 30, the object image extraction unit 40, the image DB generation unit 70, and the image DB 80 may all be mounted in a common information processing device. Furthermore, for example, in the above embodiments and the like, the sensor device unit 10, the object detection unit 20, the storage unit 30, the object image extraction unit 40, the image DB generation unit 70, and the image DB 80 may all be mounted in a common information processing device.
Note that the effects described in this specification are merely examples. The effects of the present disclosure are not limited to the effects described herein, and the present disclosure may have effects other than those described herein.
Also, for example, the present disclosure may have the following configurations.
(1)
 An information processing device including:
 an object detection unit that generates statistical information of a pixel region of a tracking target included in an input image; and
 an object image extraction unit that determines an extraction region of the tracking target based on the statistical information and extracts, from the input image, an object image as an image of the extraction region.
(2)
 The information processing device according to (1), wherein
 the statistical information includes an average position and variance-covariance values of the pixel region of the tracking target,
 the extraction region has an elliptical shape, and
 the object image extraction unit determines a radius of the extraction region based on the variance-covariance values, and determines the extraction region based on the average position and the radius.
(3)
 The information processing device according to (2), wherein
 the object detection unit derives corrected values by correcting the generated variance-covariance values using a t-distribution, and
 the object image extraction unit determines the radius of the extraction region based on the corrected values.
(4)
 The information processing device according to (1), wherein
 the object detection unit generates, from the input image, the statistical information and type information of the tracking target, and
 the object image extraction unit determines the extraction region based on the statistical information and the type information.
(5)
 The information processing device according to (4), wherein
 the statistical information includes an average position and variance-covariance values of the pixel distribution of the tracking target,
 the extraction region has an elliptical shape, and
 the object image extraction unit determines a radius of the extraction region and an offset of the average position based on the variance-covariance values and the type information, and determines the extraction region based on the average position, the radius, and the offset.
(6)
 The information processing device according to (5), wherein
 the object detection unit derives corrected values by correcting the generated variance-covariance values using a t-distribution, and
 the object image extraction unit determines the radius of the extraction region and the offset of the average position based on the corrected values and the type information.
(7)
 The information processing device according to any one of (1) to (3), wherein the object detection unit outputs the statistical information from the input image using a neural network.
(8)
 The information processing device according to any one of (4) to (6), wherein the object detection unit outputs the statistical information and the type information from the input image using a neural network.
(9)
 The information processing device according to any one of (1) to (8), wherein
 the object detection unit generates the statistical information at a predetermined operating frequency, and
 the information processing device further includes a frame interpolation unit that performs frame interpolation of the statistical information generated by the object detection unit using a Kalman filter.
(10)
 The information processing device according to any one of (1) to (9), further including an object tracking unit that determines whether the same tracking target is included in a plurality of the input images obtained at different times.
(11)
 The information processing device according to (10), wherein the object tracking unit generates a histogram for each of the object images and performs the determination based on the generated histograms.
(12)
 An information processing method including:
 generating statistical information of a pixel region of a tracking target included in an input image; and
 determining an extraction region of the tracking target based on the statistical information, and extracting, from the input image, an object image as an image of the extraction region.
(13)
 An information processing program that causes a computer to execute:
 generating statistical information of a pixel region of a tracking target included in an input image; and
 determining an extraction region of the tracking target based on the statistical information, and extracting, from the input image, an object image as an image of the extraction region.
In an information processing device, an information processing method, and an information processing program according to an embodiment of the present disclosure, an extraction region of a tracking target is determined based on statistical information of a pixel region of the tracking target included in an input image, and an object image is extracted from the input image as an image of the extraction region. By extracting the image of the tracking target based on its statistical information in this way, the proportion of background image included in the object image can be reduced compared with the case where the image of the tracking target is extracted using a rectangular filter. As a result, noise can be kept low.
This application claims priority based on Japanese Patent Application No. 2022-042856 filed with the Japan Patent Office on March 17, 2022, the entire contents of which are incorporated herein by reference.
It should be understood by those skilled in the art that various modifications, combinations, sub-combinations, and alterations may occur depending on design requirements and other factors insofar as they are within the scope of the appended claims or the equivalents thereof.

Claims (13)

1. An information processing apparatus comprising:
    an object detection unit that generates statistical information of a pixel region of a tracking target included in an input image; and
    an object image extraction unit that determines an extraction region of the tracking target based on the statistical information and extracts, from the input image, an object image as an image of the extraction region.
2. The information processing apparatus according to claim 1, wherein
    the statistical information includes an average position and variance-covariance values of the pixel region of the tracking target,
    the extraction region has an elliptical shape, and
    the object image extraction unit determines a radius of the extraction region based on the variance-covariance values, and determines the extraction region based on the average position and the radius.
3. The information processing apparatus according to claim 2, wherein
    the object detection unit derives corrected values by correcting the generated variance-covariance values using a t-distribution, and
    the object image extraction unit determines the radius of the extraction region based on the corrected values.
4. The information processing apparatus according to claim 1, wherein
    the object detection unit generates, from the input image, the statistical information and type information of the tracking target, and
    the object image extraction unit determines the extraction region based on the statistical information and the type information.
5. The information processing apparatus according to claim 4, wherein
    the statistical information includes an average position and variance-covariance values of the pixel distribution of the tracking target,
    the extraction region has an elliptical shape, and
    the object image extraction unit determines a radius of the extraction region and an offset of the average position based on the variance-covariance values and the type information, and determines the extraction region based on the average position, the radius, and the offset.
6. The information processing apparatus according to claim 5, wherein
    the object detection unit derives corrected values by correcting the generated variance-covariance values using a t-distribution, and
    the object image extraction unit determines the radius of the extraction region and the offset of the average position based on the corrected values and the type information.
7. The information processing apparatus according to claim 1, wherein the object detection unit outputs the statistical information from the input image using a neural network.
8. The information processing apparatus according to claim 4, wherein the object detection unit outputs the statistical information and the type information from the input image using a neural network.
9. The information processing apparatus according to claim 1, wherein
    the object detection unit generates the statistical information at a predetermined operating frequency, and
    the information processing apparatus further comprises a frame interpolation unit that performs frame interpolation of the statistical information generated by the object detection unit using a Kalman filter.
10. The information processing apparatus according to claim 1, further comprising an object tracking unit that determines whether the same tracking target is included in a plurality of the input images obtained at different times.
11. The information processing apparatus according to claim 10, wherein the object tracking unit generates a histogram for each of the object images and performs the determination based on the generated histograms.
12. An information processing method comprising:
    generating statistical information of a pixel region of a tracking target included in an input image; and
    determining an extraction region of the tracking target based on the statistical information, and extracting, from the input image, an object image as an image of the extraction region.
13. An information processing program that causes a computer to execute:
    generating statistical information of a pixel region of a tracking target included in an input image; and
    determining an extraction region of the tracking target based on the statistical information, and extracting, from the input image, an object image as an image of the extraction region.
PCT/JP2023/008411 2022-03-17 2023-03-06 Information processing apparatus, information processing method, and information processing program WO2023176562A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2022-042856 2022-03-17
JP2022042856 2022-03-17

Publications (1)

Publication Number Publication Date
WO2023176562A1 true WO2023176562A1 (en) 2023-09-21

Family

ID=88023031

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2023/008411 WO2023176562A1 (en) 2022-03-17 2023-03-06 Information processing apparatus, information processing method, and information processing program

Country Status (1)

Country Link
WO (1) WO2023176562A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005122617A (en) * 2003-10-20 2005-05-12 Advanced Telecommunication Research Institute International Real-time object detection and recognition system, and computer-executable program
US20060187305A1 (en) * 2002-07-01 2006-08-24 Trivedi Mohan M Digital processing of video images
JP2018084802A (en) * 2016-11-11 2018-05-31 株式会社東芝 Imaging device, imaging system, and distance information acquisition method
JP2018101165A (en) * 2015-04-27 2018-06-28 国立大学法人 奈良先端科学技術大学院大学 Color image processing method, color image processing program, object recognition method and apparatus

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060187305A1 (en) * 2002-07-01 2006-08-24 Trivedi Mohan M Digital processing of video images
JP2005122617A (en) * 2003-10-20 2005-05-12 Advanced Telecommunication Research Institute International Real-time object detection and recognition system, and computer-executable program
JP2018101165A (en) * 2015-04-27 2018-06-28 国立大学法人 奈良先端科学技術大学院大学 Color image processing method, color image processing program, object recognition method and apparatus
JP2018084802A (en) * 2016-11-11 2018-05-31 株式会社東芝 Imaging device, imaging system, and distance information acquisition method

Similar Documents

Publication Publication Date Title
US10216979B2 (en) Image processing apparatus, image processing method, and storage medium to detect parts of an object
US7912253B2 (en) Object recognition method and apparatus therefor
JP4295799B2 (en) Human posture estimation with data-driven probability propagation
JP4459137B2 (en) Image processing apparatus and method
US9378422B2 (en) Image processing apparatus, image processing method, and storage medium
US9294665B2 (en) Feature extraction apparatus, feature extraction program, and image processing apparatus
EP2339507B1 (en) Head detection and localisation method
Tavakkoli et al. Non-parametric statistical background modeling for efficient foreground region detection
US9082000B2 (en) Image processing device and image processing method
JP7272024B2 (en) Object tracking device, monitoring system and object tracking method
US9443137B2 (en) Apparatus and method for detecting body parts
US9305359B2 (en) Image processing method, image processing apparatus, and computer program product
CN110097050B (en) Pedestrian detection method, device, computer equipment and storage medium
KR20170056860A (en) Method of generating image and apparatus thereof
CN114022830A (en) Target determination method and target determination device
EP3772037A1 (en) Image processing apparatus, method of tracking a target object, and program
JP7086878B2 (en) Learning device, learning method, program and recognition device
KR20210133880A (en) Image depth determining method and living body identification method, circuit, device, and medium
JP2019536164A (en) Image processing apparatus, image processing method, and image processing program
CN104063709A (en) Line-of-sight Detection Apparatus, Method, Image Capturing Apparatus And Control Method
US11462052B2 (en) Image processing device, image processing method, and recording medium
JP2014021602A (en) Image processor and image processing method
WO2023176562A1 (en) Information processing apparatus, information processing method, and information processing program
JP6717049B2 (en) Image analysis apparatus, image analysis method and program
Osman et al. Improved skin detection based on dynamic threshold using multi-colour space

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23770518

Country of ref document: EP

Kind code of ref document: A1