WO2008047774A1

WO2008047774A1 - Moving image processing device, moving image processing method, and moving image processing program

Info

Publication number: WO2008047774A1
Application number: PCT/JP2007/070132
Authority: WO
Inventors: Nobuyuki Matsui; Naotake Kamiura; Teijiro Isokawa; Yuzo Ogawa; Akitsugu Ohtsuka; Kenji Iwatani
Original assignee: Toa Corporation
Priority date: 2006-10-17
Filing date: 2007-10-16
Publication date: 2008-04-24
Also published as: JP2008102589A

Abstract

A novel moving image processing device for detecting a moving object in a moving image by using a self-organized map. A composite video signal from a camera (20) is converted into color image data by an input converting section (50). The color image data is inputted into a feature extracting section (58) through an image dividing section (52) and a frame setting section (56). The feature extracting section (58) extracts an n-dimensional feature of each pixel constituted of color image data of one frame each time the color image data of the one frame is inputted. The extracted feature data is inputted into a control section (60). The control section (60) constitutes a self-organized map of block unit learning type together with a map (62) and identifies which of the moving object region and the background region each pixel constitutes. According to the identification results, an output converting section (70) generates a processed video signal so processed that only the moving object region is displayed.

Description

Specification

Moving image processing apparatus, moving image processing method, and moving image processing program

Technical field

[0001] The present invention relates to a moving image processing apparatus, a moving image processing method, and a moving image processing program, and in particular to a moving image processing apparatus, a moving image processing method, and a moving image processing program that detect a moving object in a moving image using a self-organizing map (SOM).

Background art

[0002] An SOM maps multidimensional data onto a two-dimensional map and is used, for example, to classify unknown data. One technology developed from the SOM is disclosed in Patent Document 1. According to this conventional technique, the plurality of cells constituting a map are handled in units of blocks, each block being an aggregate of cells; that is, learning is performed in units of blocks. Unknown data is then classified based on the vector data of each block. As a result, it is said that unknown data is learned and classified more accurately than with a general SOM, in which learning is performed on single cells and unknown data is classified based on the vector data of a single cell.

[0003] Further, Patent Document 2 discloses a technique in which a pseudo map is provided in addition to an unlearned map, which is the main map; learning based on learning data consisting of vector data is performed one item at a time on the pseudo map; and, after learning based on all of the learning data has been completed, the learning results of the pseudo map are reflected collectively in the unlearned map. According to this conventional technique, the vector data of each cell constituting the unlearned map does not change during learning on the pseudo map, so that classification of unknown data based on the vector data of each cell is always performed accurately. The conventional technique disclosed in Patent Document 2 is also applicable to the case where the cells constituting the map are handled in units of blocks, as in the conventional technique disclosed in Patent Document 1.

[0004] Patent Document 1: Japanese Unexamined Patent Publication No. 2006-53842

Patent Document 2: Japanese Unexamined Patent Publication No. 2006-79326

Disclosure of the invention

Problems to be solved by the invention

[0005] Incidentally, in a general SOM, various parameters must be set prior to learning. Moreover, the most important of these parameters, the learning coefficient and the size of the neighborhood, decrease monotonically as learning progresses. Therefore, when the learning data changes during learning or when new learning data is added, the learning coefficient and the size of the neighborhood cannot respond appropriately, with the inconvenience that accurate learning is not performed.

[0006] On the other hand, in the conventional techniques disclosed in Patent Documents 1 and 2, each cell is handled in units of blocks, so that monotonically decreasing parameters such as the above-described learning coefficient and neighborhood size are eliminated. Therefore, even when the learning data changes during learning or when new learning data is added, the change can be handled adequately. This suggests that these techniques could be applied to adopting the image data of each frame of a constantly changing moving image as learning data and identifying, in each frame, whether each pixel forms a moving object region or a non-moving object region, in other words, to detecting a moving object.

Therefore, an object of the present invention is to provide a novel moving image processing apparatus, moving image processing method, and moving image processing program capable of accurately detecting a moving object in a moving image using an SOM.

Means for solving the problem

In order to achieve this object, a moving image processing apparatus according to the present invention includes: extraction means to which image data for one frame constituting a moving image is input, the moving image including pixels that form a moving object region and pixels that form a non-moving object region, and which extracts n (n being a plural number) features of the image data for each pixel and generates n-dimensional first vector data; and a map including a plurality of two-dimensionally arranged neurons, each having n-dimensional second vector data and belonging to a class of either the moving object region or the non-moving object region. The apparatus further includes: search means for searching, for each pixel, among a plurality of blocks each composed of some mutually adjacent neurons, for a winner block whose third vector data, which is a statistic of the second vector data of the neurons constituting the block, corresponds to the first vector data; identification means for identifying, based on the class to which the neurons constituting the winner block belong, whether each pixel forms the moving object region or the non-moving object region; and updating means for updating, based on the identification result by the identification means and the first vector data of each pixel, the second vector data and the class of the neurons constituting the winner block corresponding to that pixel. After updating based on all the pixels has been performed by the updating means, image data for one new frame constituting the moving image is input to the extraction means.

That is, in the present invention, n features are extracted by the extraction means from the data of each pixel constituting the image of a given frame of the moving image. The extraction means then generates, for each pixel, n-dimensional first vector data representing the extracted n features. Meanwhile, each neuron constituting the map has n-dimensional second vector data and belongs to a class of either the moving object region or the non-moving object region. The search means forms a plurality of blocks, each a collection of some mutually adjacent neurons, and searches these blocks, for each pixel, for the winner block corresponding to that pixel. Specifically, for each pixel, the block whose third vector data corresponds most closely to the first vector data, more specifically, the block whose third vector data has the shortest Euclidean distance to the first vector data, is the winner block. The third vector data is a statistic of the second vector data of the neurons constituting each block, for example, an average value. Then, based on the class to which the neurons constituting the winner block belong, the identification means identifies whether each pixel forms the moving object region or the non-moving object region. Further, based on the identification result by the identification means and the first vector data of each pixel, the updating means updates, or so to speak learns, the second vector data and the class of each neuron constituting the winner block corresponding to that pixel. After learning based on all the pixels has been performed by the updating means, image data for one new frame is input to the extraction means. That is, every time the frame changes, the identification of each pixel and the learning based on each pixel after the identification are repeated.

[0010] It should be noted that the present invention may further include display means for displaying only the pixels identified as forming the moving object region by the identifying means. In this way, it is possible to extract only moving objects from the moving image and display them.

[0011] In addition, the search means may include: winner candidate search means for searching, for each pixel, among a plurality of blocks of the same size, for a winner candidate block whose third vector data corresponds to the first vector data; repeated execution means for causing the winner candidate search means to repeat its search for each pixel so that, within the winner candidate block found by the winner candidate search means, another winner candidate block of a smaller size is searched for in turn; and determining means for determining, for each pixel, as the winner block the one whose third vector data corresponds most closely to the first vector data among the plurality of winner candidate blocks found by the repeated searches. According to this configuration, a plurality of winner candidate blocks having different sizes are searched for sequentially by a so-called decision tree method, and the true winner block is determined from among them. Adopting the decision tree method in this way reduces the amount of calculation required to search for the winner block and reduces the burden on the search means. This is extremely effective for improving the processing speed of the entire moving image processing apparatus including the search means.

[0012] Further, the updating means may perform the updating based on all the pixels collectively, that is, as a batch, after all the pixels have been identified by the identification means. In this way, the amount of computation required for updating by the updating means is reduced, and the burden on the updating means is reduced. This is also extremely effective in improving the processing speed of the entire moving image processing apparatus including the updating means.

The image data in the present invention may include color information. In this case, it is desirable that the extraction means extract the color information as a feature of the image data. The color information mentioned here may be color space information according to the generally known RGB format, the YUV format, or the CMYK format used for printing.

[0014] Furthermore, the feature of each pixel may include the feature of neighboring pixels in the vicinity of the pixel, for example, peripheral pixels.

[0015] Further, the extracting means may handle a plurality of adjacent pixels as one pixel. In this way, the processing load of the entire moving image processing apparatus including the extracting means is reduced, and it is extremely effective in improving the processing speed of the entire moving image processing apparatus.

[0016] The moving image processing apparatus may further include frame setting means for setting a frame that includes the moving object region and a part of the non-moving object region, and the extraction means may handle only the pixels in the frame. This also reduces the processing load on the entire moving image processing apparatus including the extraction means, and is extremely effective in improving the processing speed of the entire moving image processing apparatus. In addition, the possibility that pixels forming the non-moving object region (especially the non-moving object region outside the frame) are mistakenly identified as forming the moving object region is reduced, and the influence of such pixels as noise is suppressed.

[0017] A moving image processing method according to the present invention includes: an extraction process of extracting, for each pixel, n (n being a plural number) features of image data for one frame constituting a moving image that includes pixels forming a moving object region and pixels forming a non-moving object region, and generating n-dimensional first vector data; a map forming process of forming a map in which a plurality of neurons, each having n-dimensional second vector data and belonging to a class of either the moving object region or the non-moving object region, are arranged two-dimensionally; a search process of searching, for each pixel, among a plurality of blocks each composed of some mutually adjacent neurons, for a winner block whose third vector data, which is a statistic of the second vector data of the neurons constituting the block, corresponds to the first vector data; an identification process of identifying, based on the class to which the neurons constituting the winner block belong, whether each pixel forms the moving object region or the non-moving object region; and an update process of updating, based on the identification result in the identification process and the first vector data of each pixel, the second vector data and the class of the neurons constituting the winner block corresponding to that pixel. After the update based on all the pixels in the update process, image data for one new frame becomes the target of processing in the extraction process.

[0018] A moving image processing program according to the present invention causes a computer to execute: an extraction procedure of extracting, for each pixel, n (n being a plural number) features of image data for one frame constituting a moving image that includes pixels forming a moving object region and pixels forming a non-moving object region, and generating n-dimensional first vector data; a map forming procedure of forming a map in which a plurality of neurons, each having n-dimensional second vector data and belonging to a class of either the moving object region or the non-moving object region, are arranged two-dimensionally; a search procedure of searching, for each pixel, among a plurality of blocks each composed of some mutually adjacent neurons, for a winner block whose third vector data, which is a statistic of the second vector data of the neurons constituting the block, corresponds to the first vector data; an identification procedure of identifying, based on the class to which the neurons constituting the winner block belong, whether each pixel forms the moving object region or the non-moving object region; and an update procedure of updating, based on the identification result by the identification procedure and the first vector data of each pixel, the second vector data and the class of the neurons constituting the winner block corresponding to that pixel. After the update based on all the pixels by the update procedure, image data for one new frame becomes the target of processing by the extraction procedure.

Brief Description of Drawings

FIG. 1 is a diagram showing a schematic configuration of an embodiment of the present invention.

FIG. 2 is an illustrative view showing a relationship between an input image and an output image in the same embodiment.

FIG. 3 is a block diagram showing a detailed configuration of the moving image processing apparatus in FIG. 1.

FIG. 4 is an illustrative view for explaining the contents of processing by the image dividing unit in FIG. 3.

FIG. 5 is an illustrative view for explaining the contents of processing by the frame setting unit in FIG. 3;

FIG. 6 is an illustrative view conceptually showing the structure of the map in FIG. 3.

FIG. 7 is an illustrative view for illustrating the contents of processing by the control unit in FIG. 3.

FIG. 8 is an illustrative view conceptually showing a state where the map in FIG. 3 is classified.

FIG. 9 is an illustrative view showing an output image corresponding to FIG. 5.

FIG. 10 is an illustrative view showing an actual input image and output image in the same embodiment.

FIG. 11 is a flowchart showing an outline of an object detection task executed by the control unit in FIG. 3.

FIG. 12 is a flowchart following FIG. 11.

FIG. 13 is a flowchart showing details of a winner block search process in FIG. 11.

FIG. 14 is a flowchart showing details of the update preparation process in FIG.

BEST MODE FOR CARRYING OUT THE INVENTION

One embodiment of the present invention will be described with reference to FIGS.

As shown in FIG. 1, a moving image processing system 10 according to the present embodiment includes a color video camera (hereinafter simply referred to as a camera) 20, a moving image processing device 30, and a monitor 40.

[0022] The camera 20 is a so-called fixed type, and is fixed at an appropriate place by a fixing tool (not shown). When an optical image of the object scene is incident on the camera 20 through the lens 22, the camera 20 converts the incident optical image into a composite video signal that is an analog electric signal and outputs the composite video signal. The composite video signal output from the camera 20 is input to the moving image processing device 30. The moving image processing device 30 performs the following processing on the input composite video signal.

[0023] For example, it is assumed that an input image according to a composite video signal includes a moving object region 100 and a non-moving object region, that is, a background region 102, as shown in FIG. The moving image processing device 30 takes out only the moving object region 100 out of these, and generates a processed video signal processed so as to display an image obtained by taking out only the moving object region 100. This processed video signal is input to the monitor 40, whereby an image of only the moving object region 100 as shown in FIG. 2B is displayed on the display screen of the monitor 40.

As described above, the moving image processing device 30 has a function of automatically detecting the moving object region 100 in the moving image given from the camera 20 and displaying the moving object region 100 on the monitor 40. In order to realize this function, the moving image processing apparatus 30 is configured as shown in FIG. 3, for example.

[0025] As shown in the figure, the moving image processing device 30 includes an input conversion circuit 50, to which the composite video signal from the camera 20 is input. The input conversion circuit 50 converts the input composite video signal into a digital video signal conforming to the YUV format, that is, color image data.

The color image data converted by the input conversion circuit 50 is sequentially input to the image dividing unit 52 frame by frame. Each time color image data for one frame is input, the image dividing unit 52 divides the input image constituted by that color image data into units of a plurality of pixels in each of the horizontal and vertical directions, for example, a × a pixels (a being an integer of 2 or more). Specifically, when the number of pixels of the input image is H in the horizontal direction and V in the vertical direction, as shown in FIG. 4(a), the image dividing unit 52 divides the input image into H' (= H / a) sections in the horizontal direction and V' (= V / a) sections in the vertical direction, for a total of H' × V' small sections 110, 110, ..., as shown in FIG. 4. In the present embodiment, H × V is 640 × 480 and a × a is 4 × 4, so H' × V' is 160 × 120. That is, the input image is divided into a total of 19200 (= 160 × 120) small sections 110, 110, ....
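The division arithmetic can be checked with a short sketch. The function name `split_into_sections` and the nested-list image representation are assumptions made for illustration only; they do not appear in the specification.

```python
def split_into_sections(image, a):
    """Split a V x H image (list of rows) into (V/a) x (H/a) small sections
    of a x a pixels each.

    Returns a dict mapping (section_row, section_col) to the list of pixels
    in that section. Assumes H and V are both multiples of a.
    """
    v, h = len(image), len(image[0])
    if v % a or h % a:
        raise ValueError("image size must be a multiple of a")
    sections = {}
    for r in range(v // a):
        for c in range(h // a):
            sections[(r, c)] = [
                image[r * a + dr][c * a + dc]
                for dr in range(a)
                for dc in range(a)
            ]
    return sections


# 640 x 480 input with a = 4, as in the embodiment.
# Each dummy pixel simply stores its own (y, x) coordinates.
H, V, A = 640, 480, 4
image = [[(y, x) for x in range(H)] for y in range(V)]
sections = split_into_sections(image, A)
print(len(sections))          # number of small sections
print(len(sections[(0, 0)]))  # pixels per small section
```

With the embodiment's values, the first printed number is the section count H' × V' and the second is a × a.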

The color image data after the division processing by the image dividing unit 52 is sequentially input, one frame at a time, to the initial detection unit 54 and the frame setting unit 56. Of these, the initial detection unit 54 detects the moving object region 100 when it first appears, using a generally known image processing method such as the frame difference method. When the moving object region 100 is detected by the initial detection unit 54, position (coordinate) data on the image of the pixels representing the moving object region 100, strictly speaking, of the small sections 110, 110, ... representing it, is input to the frame setting unit 56.

[0028] When the above-described position data is input from the initial detection unit 54, the frame setting unit 56 sets, based on this data, a rectangular frame 120 that encloses the moving object region 100 in the frame image at that time, as shown in FIG. 5. The position data of each of the small sections 110, 110, ... in the rectangular frame 120 and the YUV data of each pixel constituting each of these small sections 110, 110, ... are input to the feature extraction unit 58.
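The initial detection and frame setting steps might be sketched as follows. The specification only names the frame difference method, so the luminance threshold, the margin around the moving sections, and the function names `detect_moving_sections` and `bounding_frame` are all assumptions introduced here for illustration.

```python
def detect_moving_sections(prev_y, curr_y, a, threshold):
    """Frame difference on luminance: return the set of (row, col) small
    sections containing at least one pixel whose change exceeds threshold."""
    moving = set()
    for y, (prev_row, curr_row) in enumerate(zip(prev_y, curr_y)):
        for x, (p, c) in enumerate(zip(prev_row, curr_row)):
            if abs(c - p) > threshold:
                moving.add((y // a, x // a))
    return moving


def bounding_frame(moving, margin, n_rows, n_cols):
    """Rectangle (top, left, bottom, right), inclusive, in section units,
    enclosing all moving sections plus a margin, clipped to the image.
    The set of moving sections must not be empty."""
    rows = [r for r, _ in moving]
    cols = [c for _, c in moving]
    return (
        max(min(rows) - margin, 0),
        max(min(cols) - margin, 0),
        min(max(rows) + margin, n_rows - 1),
        min(max(cols) + margin, n_cols - 1),
    )


# Toy 8 x 8 luminance frames; a bright 2 x 2 patch appears in the second one.
prev_y = [[10] * 8 for _ in range(8)]
curr_y = [row[:] for row in prev_y]
for y in (4, 5):
    for x in (4, 5):
        curr_y[y][x] = 200

moving = detect_moving_sections(prev_y, curr_y, a=2, threshold=30)
frame = bounding_frame(moving, margin=1, n_rows=4, n_cols=4)
print(sorted(moving), frame)
```

The margin is one way to make the frame contain part of the background as well as the moving object; the specification does not state how much background the frame should include.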

[0029] The feature extraction unit 58 extracts the features of the overall YUV data of each of the small sections 110, 110, ... in the rectangular frame 120. Specifically, for each small section 110 in the rectangular frame 120, the feature extraction unit 58 extracts the Y data, U data, and V data of each of the 16 (= a × a) pixels constituting the small section 110, and obtains the average value and the variance value of each of the Y data, U data, and V data. As a result, a total of six types of features, namely the average value and the variance value of each of the Y data, U data, and V data, are extracted for each small section 110. In addition, the feature extraction unit 58 groups each small section 110 in the rectangular frame 120 together with the eight small sections 110, 110, ... surrounding it, for a total of nine small sections, and extracts the Y data, U data, and V data of each pixel constituting these nine small sections 110, 110, .... It then obtains the average value and the variance value of each of the extracted Y data, U data, and V data, and adds these as further features of the central small section (in other words, the small section of interest) 110. In other words, a total of 12 types of features are extracted for each small section 110. The feature extraction unit 58 then generates, as first vector data representing these 12 types of features, 12-dimensional feature data X[t, g] = {x_1[t, g], x_2[t, g], ..., x_i[t, g], ..., x_n[t, g]}, where t is an index representing the frame number (discrete time), g is an index representing the number of the small section 110, i is an index representing the feature number (dimension), and the maximum value n of i is n = 12. This feature data X[t, g] is input to the control unit 60.
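The 12-dimensional feature extraction can be illustrated with the sketch below. It assumes population variance, a nested-list YUV image, and clipping of the nine-section neighbourhood at the image border; the specification does not state these details, and the function names are hypothetical.

```python
from statistics import fmean, pvariance


def channel_stats(pixels):
    """Mean and (population) variance of each of the Y, U, V channels."""
    feats = []
    for ch in range(3):
        values = [p[ch] for p in pixels]
        feats += [fmean(values), pvariance(values)]
    return feats  # [meanY, varY, meanU, varU, meanV, varV]


def pixels_of(image, r0, r1, c0, c1, a):
    """All pixels in sections r0..r1 x c0..c1 (inclusive, section units)."""
    return [
        image[y][x]
        for y in range(r0 * a, (r1 + 1) * a)
        for x in range(c0 * a, (c1 + 1) * a)
    ]


def extract_features(image, r, c, a):
    """12-dimensional feature vector X for the small section at (r, c)."""
    n_rows, n_cols = len(image) // a, len(image[0]) // a
    own = pixels_of(image, r, r, c, c, a)
    # 3 x 3 neighbourhood of sections, clipped at the image border (assumption).
    hood = pixels_of(
        image,
        max(r - 1, 0), min(r + 1, n_rows - 1),
        max(c - 1, 0), min(c + 1, n_cols - 1),
        a,
    )
    return channel_stats(own) + channel_stats(hood)


# Toy 12 x 12 YUV image: uniform grey except one brighter 4 x 4 section.
A = 4
image = [[(100, 128, 128)] * 12 for _ in range(12)]
for y in range(4, 8):
    for x in range(4, 8):
        image[y][x] = (190, 128, 128)

x_vec = extract_features(image, 1, 1, A)
print(len(x_vec), x_vec[:2], x_vec[6:8])
```

The first six elements describe the section itself and the last six describe the section together with its neighbours, so a section that differs from its surroundings shows up as a gap between the two halves of the vector.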

The control unit 60, together with the map 62, realizes a block-unit learning type SOM. Specifically, the control unit 60 applies the feature data X[t, g] of each small section 110 input from the feature extraction unit 58 to the map 62, and thereby identifies whether the small section 110 forms the moving object region 100 or the background region 102. At the same time, the control unit 60 trains the map 62 using the feature data after the identification as learning data, that is, it updates the reference vectors w^j, described in detail later, and performs classification. Note that, for the very first frame, in which the moving object region 100 is detected by the initial detection unit 54 described above, the map 62 is in an unlearned state, so whether each small section 110 forms the moving object region 100 or the background region 102 is identified based on the position data obtained from the initial detection unit 54.

More specifically, as shown in FIG. 6, the map 62 has m × m neurons 64, 64, ... arranged two-dimensionally. In this embodiment, m = 6, that is, 36 (= 6 × 6) neurons 64, 64, ... are provided. Each of the neurons 64, 64, ... is given an individual reference vector w^j (j being an index representing the number of each neuron 64 within a block 66 described later) as second vector data.

On the other hand, the control unit 60 forms various square blocks 66, each composed of 2 × 2 or more neurons 64, 64, .... Of these blocks 66, 66, ..., the control unit 60 searches for the one whose block reference vector B = {b_1, b_2, ..., b_i, ..., b_n} corresponds most closely to the above-described feature data X[t, g], more specifically, the one for which the Euclidean distance D = |X[t, g] - B| between the two is shortest, and makes this the winner block. The block reference vector B mentioned here is a statistic of the reference vectors w^j of the neurons 64, 64, ... constituting each block 66, for example, an average value. Specifically, an arbitrary (i-th) element b_i of the block reference vector B is expressed by the following Equation 1.

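[0033] [Equation 1]

$$b_i = \frac{1}{\alpha} \sum_{j=1}^{\alpha} w_i^j$$

Here, w_i^j is the i-th element of the reference vector w^j, and α is the total number of neurons 64 constituting the block 66, in other words, the maximum value of the number j of the neuron 64 in the block 66.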
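As a minimal sketch of the averaging and distance computation described above, the following uses the average as the block statistic. The function names and the toy values are assumptions for illustration.

```python
import math


def block_reference_vector(ref_vectors):
    """Element-wise mean of the reference vectors w^j of the neurons in a block."""
    alpha = len(ref_vectors)  # number of neurons in the block
    n = len(ref_vectors[0])
    return [sum(w[i] for w in ref_vectors) / alpha for i in range(n)]


def euclidean_distance(x, b):
    """D = |X - B| between feature data X and block reference vector B."""
    return math.sqrt(sum((xi - bi) ** 2 for xi, bi in zip(x, b)))


# A 2 x 2 block (alpha = 4) of 3-dimensional reference vectors, toy values.
block = [[0.0, 2.0, 4.0], [2.0, 2.0, 4.0], [0.0, 4.0, 4.0], [2.0, 4.0, 4.0]]
b = block_reference_vector(block)
d = euclidean_distance([1.0, 3.0, 1.0], b)
print(b, d)
```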

However, the total number T of blocks 66 that can be formed on the map 62 is very large, as expressed by the following Equation 2, and grows rapidly (as the cube of m) as the size m × m of the map 62 increases. Therefore, obtaining the Euclidean distance D for all of these blocks 66, 66, ..., and thereby finding the winner block, places a considerable burden on the control unit 60.

[0036] [Equation 2]

$$T = \frac{m(m-1)(2m-1)}{6}$$
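The closed form of Equation 2 can be checked by brute-force enumeration of every square block of 2 × 2 or more neurons on an m × m map. The function names below are hypothetical.

```python
def enumerate_blocks(m):
    """All square blocks of 2 x 2 or more neurons on an m x m map,
    as (top, left, size) triples."""
    return [
        (top, left, size)
        for size in range(2, m + 1)
        for top in range(m - size + 1)
        for left in range(m - size + 1)
    ]


def total_blocks(m):
    """Closed form of Equation 2."""
    return m * (m - 1) * (2 * m - 1) // 6


for m in (6, 12, 24):
    print(m, len(enumerate_blocks(m)), total_blocks(m))
```

For each map size the enumerated count and the closed form agree, because a block of size s can be placed in (m - s + 1)² positions and the sum of these squares over s = 2, ..., m gives the formula.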

Therefore, the control unit 60 in the present embodiment searches for the winner block based on a decision tree method, as shown in FIG. 7. First, from the entire m × m region of the map 62 shown in FIG. 7(a), all (four) blocks 66, 66, ... of size [m-1] × [m-1], one size smaller, are selected as shown in FIG. 7(b). Among the four selected blocks 66, 66, ..., the one whose block reference vector B = {b_1, b_2, ..., b_i, ..., b_n} has the shortest Euclidean distance D = |X[t, g] - B| to the feature data X[t, g] is found and made the winner candidate block. FIG. 7(b) shows a state in which the second block 66 from the right, indicated by the hatched pattern 68, is the winner candidate block.

[0038] When the winner candidate block 68 of [m-1] × [m-1] size has been determined in this way, the control unit 60 selects, within that winner candidate block 68, all the blocks 66, 66, ... of [m-2] × [m-2] size, one size smaller still, as shown in FIG. 7(c). Then, in the same manner as described above, a winner candidate block 68 of [m-2] × [m-2] size is found from among these blocks 66, 66, .... Similarly, as shown in FIG. 7(d), the control unit 60 searches within the [m-2] × [m-2] size winner candidate block 68 for a winner candidate block 68 of [m-3] × [m-3] size, one size smaller again. This search for winner candidate blocks 68 continues until a winner candidate block 68 of 2 × 2 size (in this embodiment, [m-4] × [m-4]) has been found, as shown in FIG. 7(e).

[0039] When the winner candidate blocks 68, 68, ... of each size from [m-1] × [m-1] to 2 × 2 have been determined, the control unit 60 selects from among them the one with the smallest Euclidean distance D = |X[t, g] - B|. The selected winner candidate block 68 is determined to be the true winner block.

[0040] By searching for the winner block based on such a decision tree method, the total number T of blocks 66 for which the Euclidean distance D is obtained is drastically reduced from the value expressed by Equation 2 above to the value expressed by the following Equation 3.
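The decision tree search can be sketched as below. The sketch assumes the map is stored as an m × m nested list of reference vectors with m of at least 3, uses the average as the block statistic, and fills the map with random values purely for demonstration; the function names are hypothetical.

```python
import math
import random


def block_mean(grid, top, left, size):
    """Block reference vector B: mean of the neurons' reference vectors."""
    cells = [grid[r][c] for r in range(top, top + size)
             for c in range(left, left + size)]
    n = len(cells[0])
    return [sum(w[i] for w in cells) / len(cells) for i in range(n)]


def distance(x, b):
    """Euclidean distance between feature data x and block vector b."""
    return math.sqrt(sum((xi - bi) ** 2 for xi, bi in zip(x, b)))


def search_winner_block(grid, x):
    """Decision-tree search: shrink the candidate by one neuron per level,
    then return the best candidate over all levels, its distance, and the
    number of blocks whose distance was evaluated."""
    m = len(grid)
    top, left, size = 0, 0, m  # start from the whole map
    candidates, evaluated = [], 0
    while size > 2:
        size -= 1
        # The four blocks of the next smaller size inside the current candidate.
        options = [(top + dt, left + dl, size) for dt in (0, 1) for dl in (0, 1)]
        scored = [(distance(x, block_mean(grid, *blk)), blk) for blk in options]
        evaluated += len(scored)
        best = min(scored)
        candidates.append(best)
        top, left, _ = best[1]
    d, winner = min(candidates)
    return winner, d, evaluated


random.seed(0)
M, N = 6, 12  # 6 x 6 map, 12-dimensional vectors, as in the embodiment
grid = [[[random.random() for _ in range(N)] for _ in range(M)] for _ in range(M)]
x = [random.random() for _ in range(N)]
winner, d, evaluated = search_winner_block(grid, x)
print(winner, round(d, 3), evaluated)
```

Each level evaluates only the four sub-blocks of the current candidate, so a 6 × 6 map needs four levels of four evaluations instead of a distance computation for every possible block. The trade-off is that the search is greedy: a block outside the chain of candidates is never examined.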

[0041] [Equation 3]

$$T = 4(m-2)$$

This greatly reduces the burden on the control unit 60 when searching for the winner block, and improves the processing speed of the moving image processing apparatus 30 including the control unit 60.

The control unit 60 searches for a winner block for each of the small sections 110. Each time the winner block is determined for a small section 110, the control unit 60 calculates, based on the following Equation 4, the deviation accumulation amount wd^j[t, g], which accumulates the deviation between the reference vector w^j of each neuron 64 constituting the winner block and the feature data X[t, g].

[0044] [Equation 4]

$$wd_i^j[t, g] = wd_i^j[t, g-1] + \frac{x_i[t, g] - w_i^j}{\alpha}$$

At the same time, the control unit 60 calculates the deviation accumulation rate wr^j[t, g] based on the following Equation 5.

[0046] [Equation 5]

$$wr^j[t, g] = wr^j[t, g-1] + \frac{1}{\alpha}$$

[0047] When the winner block has been determined for all of the small sections 110, 110, ... and the deviation accumulation amount wd^j[t, g] and the deviation accumulation rate wr^j[t, g] have been calculated, the control unit 60 updates the reference vector w^j of each neuron 64 based on the following Equation 6.

[0048] [Equation 6]

$$w_i^j(\mathrm{new}) = w_i^j + \frac{wd_i^j[t, G]}{wr^j[t, G]}$$

Here, G is the maximum value of the index g.

[0049] In the present embodiment, the series of processing by the control unit 60 described above, in which the winner block is determined for all the small sections 110, 110, ... and the deviation accumulation amount wd^j[t, g] and the deviation accumulation rate wr^j[t, g] are calculated, is called an epoch. In other words, the reference vector w^j of each neuron 64 is updated collectively, that is, as a batch, every time one epoch is completed. The control unit 60 repeats this epoch multiple times for one frame, for example, 30 times. After the 30 epochs have been executed, the epoch is repeated 30 times in the same manner for the next frame.
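One batch epoch might look like the sketch below. It rests on two assumptions that should be read as such rather than as a statement of the patented formulas: the deviation is taken as x − w divided by the number of neurons in the winner block, and the update adds the accumulated deviation divided by the accumulation rate. The function name `run_epoch` and the dictionary layout are hypothetical, and the winner blocks are supplied directly instead of being searched for.

```python
def run_epoch(ref_vectors, assignments):
    """One batch epoch.

    ref_vectors: dict neuron_id -> reference vector w^j (list of floats).
    assignments: list of (x, winner_neuron_ids), one entry per small section,
                 where winner_neuron_ids are the neurons of its winner block.
    Returns a new dict of updated reference vectors.
    """
    n = len(next(iter(ref_vectors.values())))
    wd = {j: [0.0] * n for j in ref_vectors}  # deviation accumulation amount
    wr = {j: 0.0 for j in ref_vectors}        # deviation accumulation rate

    for x, winner_ids in assignments:
        alpha = len(winner_ids)  # neurons in this winner block
        for j in winner_ids:
            w = ref_vectors[j]
            for i in range(n):
                wd[j][i] += (x[i] - w[i]) / alpha  # assumed sign: x - w
            wr[j] += 1.0 / alpha

    updated = {}
    for j, w in ref_vectors.items():
        if wr[j] == 0.0:
            updated[j] = list(w)  # neuron was in no winner block: unchanged
        else:
            updated[j] = [w[i] + wd[j][i] / wr[j] for i in range(n)]
    return updated


# Four neurons with 2-dimensional reference vectors (toy values).
ref = {0: [0.0, 0.0], 1: [1.0, 0.0], 2: [0.0, 1.0], 3: [5.0, 5.0]}
assignments = [
    ([2.0, 2.0], [0, 1]),  # section 1: winner block = neurons 0 and 1
    ([4.0, 0.0], [0, 2]),  # section 2: winner block = neurons 0 and 2
]
updated = run_epoch(ref, assignments)
print(updated)
```

Because the reference vectors are only read during accumulation and written once at the end, the result does not depend on the order in which the small sections are processed. Under the stated assumptions, each neuron moves to a weighted mean of the feature vectors of the sections whose winner block contained it. In a full implementation the winner blocks would be searched for again at the start of every epoch.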

[0050] As described above, for the first frame the control unit 60 identifies whether each small section 110 forms the moving object region 100 or the background region 102 based on the position data provided from the initial detection unit 54, whereas for the second and subsequent frames it uses the map 62 for the identification. For this purpose, after the identification for the first frame, the control unit 60 classifies the neurons 64, 64, ... on the map 62 based on the identification result.

[0051] Although this classification is disclosed in Patent Documents 1 and 2 described above, it is briefly described again here. First, the control unit 60 assigns to each of the neurons 64, 64, ... of the winner block corresponding to each small section 110 a predetermined index value representing the identification result of that small section 110 in the first frame, that is, whether the small section 110 forms the moving object region 100 or the background region 102. After index values have been assigned to the neurons 64, 64, ... based on the identification results of all the small sections 110, 110, ..., a statistic, for example, the average value, of the index values assigned to each neuron 64 is obtained. It is then determined whether this average value is closer to the index value of the moving object region 100 or to that of the background region 102, and based on the determination result, it is decided whether each neuron 64 belongs to the class of the moving object region 100 or of the background region 102. As a result, the neurons 64, 64, ... on the map 62 are classified into those belonging to the moving object region 100 (lattice pattern) and those belonging to the background region 102 (hatched pattern), as shown in FIG. 8.
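The classification of neurons by averaged index values can be sketched as follows. The index values 1 and 0, the handling of neurons that appear in no winner block, and the tie rule are assumptions; the specification only says that the index values are predetermined.

```python
MOVING, BACKGROUND = 1, 0  # hypothetical index values


def classify_neurons(neuron_ids, identified):
    """identified: list of (winner_neuron_ids, label) per small section,
    label being MOVING or BACKGROUND. Returns dict neuron_id -> class
    (None for neurons that appeared in no winner block)."""
    votes = {j: [] for j in neuron_ids}
    for winner_ids, label in identified:
        for j in winner_ids:
            votes[j].append(label)
    classes = {}
    for j, labels in votes.items():
        if not labels:
            classes[j] = None  # assumption: leave unclassified
            continue
        mean = sum(labels) / len(labels)
        # Is the average closer to MOVING's index value or to BACKGROUND's?
        # An exact tie goes to MOVING (assumption).
        if abs(mean - MOVING) <= abs(mean - BACKGROUND):
            classes[j] = MOVING
        else:
            classes[j] = BACKGROUND
    return classes


identified = [
    ([0, 1], MOVING),
    ([1, 2], BACKGROUND),
    ([1], BACKGROUND),
    ([0], MOVING),
]
classes = classify_neurons(range(4), identified)
print(classes)
```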

[0052] Using the map 62 classified based on the identification result of the first frame, the control unit 60 identifies, for the subsequent second frame, whether each of the small sections 110, 110, ... in the rectangular frame 120 described above forms the moving object region 100 or the background region 102. Specifically, a small section 110 for which more of the neurons 64, 64, ... constituting its winner block belong to the moving object region 100 is identified as forming the moving object region 100. Conversely, a small section 110 for which more of the neurons 64, 64, ... constituting its winner block belong to the background region 102 is identified as forming the background region 102. A small section 110 for which the number of neurons belonging to the moving object region 100 and the number belonging to the background region 102 are equal is identified as forming a predetermined one of the regions, for example, the moving object region 100.
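The majority rule with its tie-break can be written compactly. The string labels and the function name `identify_section` are hypothetical; the default tie label follows the example given in the text.

```python
def identify_section(winner_classes, tie_label="moving"):
    """Identify a small section from the classes of the neurons in its
    winner block. winner_classes is a list whose items are either
    "moving" or "background"."""
    moving = winner_classes.count("moving")
    background = winner_classes.count("background")
    if moving > background:
        return "moving"
    if background > moving:
        return "background"
    return tie_label  # equal counts: fall back to the predetermined region


print(identify_section(["moving", "moving", "moving", "background"]))
print(identify_section(["moving", "background", "moving", "background"]))
print(identify_section(["moving", "background", "background", "background"]))
```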

[0053] The identification result by the control unit 60 is given to the output conversion unit 70. Color image data is also sequentially input to the output conversion unit 70, frame by frame, from the input conversion unit 50 described above. Each time color image data for one frame is input, the output conversion unit 70 generates the above-described processed video signal, processed so that, of the input image constituted by the color image data, only the pixels constituting the small sections 110 identified by the control unit 60 as forming the moving object region 100 are displayed. When this processed video signal is input to the monitor 40, an image of only the moving object region 100, as shown in FIG. 9, is displayed on the display screen of the monitor 40.
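As a rough illustration of this output processing, the sketch below keeps only the pixels of the small sections identified as moving object region. It assumes square small sections of `a` × `a` pixels, a frame whose height and width are multiples of `a`, and that pixels that are not displayed are rendered as zeros; none of these details is prescribed by the description of the output conversion unit 70.

```python
import numpy as np

def mask_output(frame, section_is_moving, a):
    """Keep only pixels of small sections identified as moving object region.

    frame: (H, W, 3) colour image, with H and W multiples of a.
    section_is_moving: (H // a, W // a) boolean identification result.
    a: side length of a small section in pixels.
    """
    # Expand the per-section result to a per-pixel mask.
    pixel_mask = np.repeat(np.repeat(section_is_moving, a, axis=0), a, axis=1)
    out = np.zeros_like(frame)           # assumption: hidden pixels become zero
    out[pixel_mask] = frame[pixel_mask]  # copy only the displayed pixels
    return out

frame = np.full((4, 4, 3), 7, dtype=np.uint8)
sections = np.array([[True, False],
                     [False, False]])
out = mask_output(frame, sections, 2)
```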

[0054] Further, the control unit 60 classifies the neurons 64, 64, ... on the map 62 again based on the identification result of the second frame. That is, not only the reference vectors w^j of the neurons 64, 64, ... but also the classes of the neurons 64, 64, ... are learned. Then, based on the learned map 62, the next, third frame is identified. Thereafter, identification and learning are repeated each time the frame changes. For the third and subsequent frames, the above-described rectangular frame 120 is set based on the identification result of the previous frame. For example, a rectangular frame 120 having the same size as in the previous frame is set so as to surround all the small sections 110, 110, ... identified as forming the moving object region 100 in the previous frame.

[0055] When there is no small section 110 forming the moving object region 100 in a certain frame, the control unit 60 stops identification and learning and resets the initial detection unit 54. Thereby, the moving image processing apparatus 30 returns to the initial state before the moving object region 100 appears.

[0056] FIG. 10 shows an example of an actual input image and output image of the moving image processing apparatus 30 of the present embodiment. In the figure, the images on the left are the input images and the images on the right are the output images. Parts (a), (b), and (c) show the first, 20th, and 40th frames, respectively. FIG. 10 shows that only a person crossing the field of view of the camera 20 is detected as a moving object. In other words, it confirms that the moving image processing apparatus 30 of the present embodiment can properly detect a moving object.

[0057] To realize moving object detection using such a map 62, the control unit 60 executes the object detection task shown in the flowcharts of FIG. 11 and FIG. 12.

[0058] That is, when the moving object region 100 is detected by the initial detection unit 54, specifically, when the above-described position data is input from the initial detection unit 54 as initial identification data, the control unit 60 proceeds to step S1 in FIG. 11 and stores the position data. The process then proceeds to step S3, where “1” is set to the flag F indicating that the moving object region 100 has been detected, and then proceeds to step S5.

[0059] In step S5, the control unit 60 initializes the map 62; specifically, it sets random numbers to the reference vectors w^j of the neurons 64, 64, ... on the map 62. The process then proceeds to step S7, where the feature data X[t, g] is acquired from the feature extraction unit 58. The control unit 60 also stores this feature data X[t, g]. Furthermore, in step S9 the control unit 60 sets the initial value “1” to the index e representing the number of executions of the above-described epoch, then in step S11 sets the initial value “1” to the index g representing the number of the small section 110 in the above-described rectangular frame 120, and executes the winner block search process in step S13.

[0060] In the winner block search process in step S13, the control unit 60 searches for a winner block based on the above-described decision tree method. When the winner block has been determined for the small section 110 that is the current processing target, the control unit 60 proceeds to step S15 and determines whether or not the above-described flag F is “0”.

[0061] If the flag F is not “0” in step S15, that is, immediately after the moving object region 100 has been detected, the control unit 60 proceeds to step S17 and controls the output conversion unit 70, based on the initial identification data stored in step S1 described above, so that only the moving object region 100 is displayed. The control unit 60 then sets “0” to the flag F in step S19 and proceeds to the update preparation process in step S21.

[0062] On the other hand, if the flag F is “0” in step S15, that is, if step S17 has already been executed after the moving object region 100 was detected, the control unit 60 proceeds to step S23. In step S23, it is determined whether or not the current epoch execution count e is “1”, and if it is “1”, the process proceeds to the identification process in step S25.

[0063] In step S25, the control unit 60 applies the feature data X[t, g] of the small section 110 that is the current processing target to the map 62 and identifies whether that small section 110 forms the moving object region 100 or the background region 102. The process then proceeds to step S27, where the output conversion unit 70 is controlled based on the identification result of step S25. That is, the output conversion unit 70 is controlled so that the small section 110 that is the current processing target is displayed when it forms the moving object region 100 and is not displayed when it does not. After step S27 is executed, the process proceeds to the update preparation process in step S21.

[0064] In step S21, the control unit 60 calculates the accumulated deviation amount wd^j[t, g] for each neuron 64 constituting the winner block based on the above-described Equation 4, and calculates the deviation accumulation rate wr^j[t, g] based on Equation 5. After these calculations, the control unit 60 proceeds to step S29.

[0065] In step S29, the control unit 60 determines whether or not the number g of the small section 110 that is the current processing target has reached the maximum value G, that is, whether steps S13 to S27 have been executed for all the small sections 110, 110, .... If there is a small section 110 for which steps S13 to S27 have not yet been executed, the process proceeds to step S31, the value of the number g of the small section 110 is incremented by “1”, and the process returns to step S13. On the other hand, when steps S13 to S27 have been executed once for all the small sections 110, 110, ..., the process proceeds to step S33.

[0066] In step S33, the control unit 60 updates the reference vector w^j of each neuron 64 based on the above-described Equation 6. The process then proceeds to step S35 in FIG. 12, where it is determined whether or not the epoch execution count e has reached its maximum value E. As described above, the maximum number of epoch executions E in the present embodiment is 30.

[0067] If the epoch execution count e has not reached the maximum value E (= 30) in step S35, the control unit 60 proceeds to step S37 in order to repeat the epoch. After the value of the epoch execution count e is incremented by “1” in step S37, the process returns to step S11 in FIG. 11. On the other hand, when the epoch execution count e has reached the maximum value E, the process proceeds from step S35 to step S39.

[0068] In step S39, the control unit 60 determines whether or not the moving object region 100 still exists. When the moving object region 100 exists, the process proceeds to the classification process in step S41.

[0069] In the classification process of step S41, the control unit 60 classifies the neurons 64, 64, ... on the map 62 in the manner described above. After this classification is completed, the process proceeds to step S43, where the feature data X[t + 1, g] of a new frame is acquired. After the frame number t is incremented by “1”, the process returns to step S7.

[0070] If the presence of the moving object region 100 is not confirmed in step S39, the control unit 60 proceeds to step S45. In step S45, the initial detection unit 54 is reset, and the series of object detection tasks is completed.

[0071] Here, the winner block search process of step S13 in this object detection task will be described in more detail with reference to FIG. 13.

[0072] In the winner block search process, the control unit 60 first proceeds to step S101, where the entire map 62 is set as a provisional winner candidate block 68. In step S103, the size p of the winner candidate block 68 to be searched for is set; more specifically, the size is set to p = m − 1.

[0073] The control unit 60 then proceeds to step S105 and, among all the blocks 66, 66, ... of size p × p within the winner candidate block 68, searches for the block 66 whose block reference vector B has the shortest Euclidean distance D to the feature data X[t, g]. In the next step S107, the control unit 60 stores the block 66 found in step S105 as the new winner candidate block 68 and also records the Euclidean distance D of that winner candidate block 68.

[0074] Further, the control unit 60 proceeds to step S109 and determines whether or not the current block size p has reached the minimum value “2”. If not, the process proceeds to step S111, the block size p is reduced by “1”, and the process returns to step S105. On the other hand, if the block size p has reached the minimum value “2”, the process proceeds to step S113.

[0075] In step S113, the control unit 60 searches, among the plurality of winner candidate blocks 68, 68, ... found by repeating the above-described steps S105 to S107, for the one with the shortest Euclidean distance D. The winner candidate block 68 thus found is determined to be the true winner block, and the winner block search process shown in the flowchart of FIG. 13 ends.
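Steps S101 to S113 can be sketched in Python as follows. This is only an illustration under stated assumptions: the block reference vector B is taken to be the mean of the reference vectors of the neurons in the block (the text only calls it a statistic), the map is an m × m array with m of at least 3, and the function names and the (distance, row, column, size) return tuple are hypothetical.

```python
import numpy as np

def block_reference(weights, r, c, p):
    # Assumption: B is the mean reference vector of the p x p neurons.
    return weights[r:r + p, c:c + p].reshape(-1, weights.shape[-1]).mean(axis=0)

def search_winner_block(weights, x, p_min=2):
    """Decision-tree style search over nested blocks (m >= 3 assumed).

    weights: (m, m, n) reference vectors; x: n-dimensional feature data.
    Returns (distance, row, col, p) of the winner block.
    """
    m = weights.shape[0]
    r0, c0, size = 0, 0, m                      # S101: whole map as candidate
    candidates = []
    for p in range(m - 1, p_min - 1, -1):       # S103, S109, S111: p = m-1 ... 2
        best = None
        for r in range(r0, r0 + size - p + 1):  # S105: blocks inside candidate
            for c in range(c0, c0 + size - p + 1):
                d = float(np.linalg.norm(block_reference(weights, r, c, p) - x))
                if best is None or d < best[0]:
                    best = (d, r, c, p)
        candidates.append(best)                 # S107: store candidate and D
        _, r0, c0, size = best                  # next search stays inside it
    return min(candidates, key=lambda cand: cand[0])  # S113: shortest D wins

weights = np.zeros((4, 4, 2))
weights[2:4, 2:4] = 1.0
winner = search_winner_block(weights, np.array([1.0, 1.0]))
```

With a 4 × 4 map this search evaluates four blocks at each of the two sizes, eight blocks in total.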

[0076] Furthermore, with reference to FIG. 14, the update preparation process in step S21 in the object detection task described above will be described in detail.

[0077] In this update preparation process, the control unit 60 first proceeds to step S201 and sets the initial value “1” to the index j representing the number of the neuron 64 in the current winner block. The process then proceeds to step S203, where the initial value “1” is set to the index i representing the feature (dimension) number, and then proceeds to step S205.

[0078] In step S205, the control unit 60 calculates the accumulated deviation amount wd^j[t, g] based on the above-described Equation 4. The calculation result wd^j[t, g] is then stored in the next step S207.

[0079] Furthermore, the control unit 60 proceeds to step S209 and calculates the deviation accumulation rate wr^j[t, g] based on the above-described Equation 5. The calculation result wr^j[t, g] is stored in the next step S211, and the process then proceeds to step S213.

[0080] In step S213, the control unit 60 determines whether or not the value of the index i representing the feature number has reached the maximum value n (= 12), that is, whether steps S205 to S211 have been executed for all the features. If there is a feature for which steps S205 to S211 have not yet been executed, the process proceeds to step S215, the index i is incremented by “1”, and the process returns to step S205. On the other hand, if steps S205 to S211 have been executed for all the features, the process proceeds to step S217.

[0081] In step S217, the control unit 60 determines whether or not the value of the index j representing the number of the neuron 64 has reached the maximum value α, that is, whether steps S203 to S215 have been executed for all the neurons 64 in the current winner block. If there is a neuron 64 for which steps S203 to S215 have not yet been executed, the process proceeds to step S219, the value of the index j is incremented by “1”, and the process returns to step S203. On the other hand, when steps S203 to S215 have been executed for all the neurons 64, the update preparation process shown in the flowchart of FIG. 14 ends.

[0082] As described above, according to the present embodiment, the moving image processing apparatus 30 for detecting the moving object region 100 can be realized by using a block-unit learning type SOM in which the neurons 64, 64, ... constituting the map 62 are handled in units of blocks. In addition, regardless of the appearance of the moving object region 100 (for example, whether it is blackish or reddish), the features corresponding to that appearance can be accurately captured by learning. It is therefore possible to respond flexibly and appropriately to various appearances (situations) of the moving object region 100.

[0083] In the present embodiment, the image dividing unit 52 shown in FIG. 3 divides the input image into small sections of a × a pixels as shown in FIG. 4, but this is not a limitation. For example, the image may be divided into sections of a × b pixels (b: an integer different from a), or, in the extreme, may not be divided at all; that is, the image dividing unit 52 may be excluded from the configuration of FIG. 3. However, providing such an image dividing unit 52 reduces the burden on the subsequent stages, particularly the control unit 60. This is extremely effective for improving the processing speed of the entire moving image processing apparatus 30 including the control unit 60.

[0084] Further, the frame setting unit 56 shown in FIG. 3 sets the rectangular frame 120 as shown in FIG. 5 so that the position data and YUV data of only the small sections 110, 110, ... surrounded by the rectangular frame 120 are input to the feature extraction unit 58, but this is not a limitation. That is, the frame setting unit 56 may be excluded from the configuration of FIG. 3, and the position data and YUV data of all the small sections 110, 110, ... (or pixels) may be input to the feature extraction unit 58. However, providing such a frame setting unit 56 reduces the burden on the subsequent stages, in particular the feature extraction unit 58 and the control unit 60. This is also extremely effective in improving the processing speed of the entire moving image processing apparatus 30. In particular, when a small section 110 having the same features as the moving object region 100 exists in the background region 102 outside the rectangular frame 120, it might be erroneously identified as forming the moving object region 100; providing the frame setting unit 56 reduces the influence of such a small section 110, which acts as a kind of noise.

[0085] Further, the feature extraction unit 58 shown in FIG. 3 calculates the average value and the variance of each of the Y data, U data, and V data for each small section 110 and for its surrounding small sections 110, 110, ..., thereby extracting a total of 12 types (dimensions) of features, but this is not a limitation. For example, the features of the surrounding small sections 110, 110, ... may be omitted so that only six types of features are extracted, namely the average value and the variance of each of the Y data, U data, and V data of the small section of interest 110 alone, or only one of the average value and the variance may be extracted. In addition, RGB-format color space data may be extracted, or only luminance data may be extracted. Further, the position (coordinate) data of each pixel on the image may be extracted together. In other words, appropriate features should be extracted according to the situation.
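The 12-dimensional feature can be illustrated with the following sketch. It assumes that the surrounding small sections are the up to eight immediately adjacent sections, that their pixels are pooled before the average and variance are taken, that sections are `a` × `a` pixels, and that the image contains at least two sections; the ordering of the 12 values is also an arbitrary choice made here.

```python
import numpy as np

def section_features(yuv, row, col, a):
    """Return 12 features for the small section at (row, col).

    yuv: (H, W, 3) float array of Y, U, V data; a: section side in pixels.
    Layout (assumed): own mean (3), own variance (3),
    surrounding mean (3), surrounding variance (3).
    """
    h, w = yuv.shape[0] // a, yuv.shape[1] // a

    def pixels(r, c):
        # All pixels of one small section as an (a*a, 3) array.
        return yuv[r * a:(r + 1) * a, c * a:(c + 1) * a].reshape(-1, 3)

    own = pixels(row, col)
    neighbours = [pixels(r, c)
                  for r in range(max(row - 1, 0), min(row + 2, h))
                  for c in range(max(col - 1, 0), min(col + 2, w))
                  if (r, c) != (row, col)]
    around = np.concatenate(neighbours)  # pooled surrounding pixels (assumption)
    return np.concatenate([own.mean(axis=0), own.var(axis=0),
                           around.mean(axis=0), around.var(axis=0)])

yuv = np.zeros((4, 4, 3))
yuv[:2, :2] = 2.0
f_corner = section_features(yuv, 0, 0, 2)
f_other = section_features(yuv, 1, 1, 2)
```

Dropping the last six values gives the six-feature variant mentioned in the text.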

[0086] The control unit 60 shown in FIG. 3 searches for the winner block based on the decision tree method as shown in FIG. 7, but this is not a limitation. That is, the Euclidean distance D may be obtained for all the blocks 66, 66, ... conceivable on the map 62, and the winner block may be searched for based on the results. In that case, however, a considerable burden is imposed on the control unit 60 as described above, so it is preferable to search for the winner block based on the decision tree method as in the present embodiment.
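For comparison, an exhaustive search over every block can be sketched as below. It assumes that the candidate block sizes run from 2 to m − 1 and that the block reference vector is the mean of the reference vectors in the block; the function name and the returned evaluation count are illustrative additions.

```python
import numpy as np

def exhaustive_winner_block(weights, x, p_min=2):
    """Evaluate every p x p block (p_min <= p <= m - 1) on an m x m map.

    Returns ((distance, row, col, p), number_of_blocks_evaluated).
    """
    m = weights.shape[0]
    best, evaluated = None, 0
    for p in range(p_min, m):
        for r in range(m - p + 1):
            for c in range(m - p + 1):
                # Assumption: block reference vector = mean reference vector.
                ref = weights[r:r + p, c:c + p].reshape(-1, weights.shape[-1]).mean(axis=0)
                d = float(np.linalg.norm(ref - x))
                evaluated += 1
                if best is None or d < best[0]:
                    best = (d, r, c, p)
    return best, evaluated

weights = np.zeros((4, 4, 2))
weights[2:4, 2:4] = 1.0
best, evaluated = exhaustive_winner_block(weights, np.array([1.0, 1.0]))
```

On a 4 × 4 map this evaluates 13 blocks, and the count grows quickly with m because every position at every size is visited, which is consistent with the burden noted above.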

[0087] Further, the control unit 60 updates the reference vector w^j of each neuron 64 batchwise each time one epoch is finished, but this is not a limitation. For example, the reference vector w^j may be updated each time a winner block is determined for a small section 110. In this case, the update formula for the reference vector w^j is expressed by the following Equation 7.

[0088] [Equation 7]

$$w_i^j(\mathrm{new}) = w_i^j(\mathrm{old}) + \frac{X_i[t, g] - w_i^j(\mathrm{old})}{a}$$

[0089] However, when the reference vector w^j is updated based on Equation 7 every time a winner block is determined, the burden on the control unit 60 is naturally greater than when it is updated batchwise. It is therefore desirable to perform the update batchwise as in the present embodiment.
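A minimal sketch of this per-winner-block update follows. It reads Equation 7 as w(new) = w(old) + (X − w(old)) / a applied to each neuron of the winner block; the divisor is simply passed in as the parameter `a` because its definition is not restated in this passage, and the function name and block representation are hypothetical.

```python
import numpy as np

def update_winner_block_online(weights, winner_block, x, a):
    """Move the reference vectors of the winner block toward x.

    weights: (m, m, n) reference vectors, modified in place.
    winner_block: hypothetical (row, col, p) description of the winner block.
    x: n-dimensional feature data of the current small section.
    a: divisor of Equation 7 (taken as given; an assumption of this sketch).
    """
    r, c, p = winner_block
    block = weights[r:r + p, c:c + p]  # view into the map, not a copy
    block += (x - block) / a           # w(new) = w(old) + (x - w(old)) / a
    return weights

w = np.zeros((3, 3, 2))
x = np.array([2.0, 4.0])
update_winner_block_online(w, (0, 0, 2), x, 2.0)
first = w[0, 0].copy()
update_winner_block_online(w, (0, 0, 2), x, 2.0)
second = w[0, 0].copy()
```

Because the map is modified once per small section rather than once per epoch, this variant performs many more write operations than the batch update.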

[0090] Further, although the control unit 60 repeats the epoch 30 times per frame, the epoch may be repeated any other number of times. Also, instead of simply repeating the epoch a fixed number of times, the accumulated deviation amount wd^j[t, g] of the previous epoch may, for example, be compared with that of the current epoch, and processing may move on to the next frame when the difference between the two, that is, the quantization error, is at or below a predetermined threshold value. In addition, in the present embodiment the size of each block 66 is p × p, in other words the block 66 is a square, but it may be a rectangle. However, in order to simplify the processing by the control unit 60, particularly when determining the winner block, it is desirable to make the block 66 a square. Similarly, the map 62 need not be an m × m square and may be a rectangle, but a square is more convenient.
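The threshold-based alternative to a fixed epoch count can be sketched as follows. The callable `run_epoch` is hypothetical: it stands for one full epoch and is assumed to return a single number summarising the accumulated deviation of that epoch. The threshold value and the "at or below" comparison are likewise assumptions.

```python
def run_epochs(run_epoch, max_epochs=30, threshold=1e-3):
    """Repeat epochs until the change in accumulated deviation is small.

    run_epoch(e): hypothetical callable that performs epoch e (1-based) and
    returns a scalar summary of the accumulated deviation for that epoch.
    Returns the number of epochs actually executed.
    """
    previous = None
    for e in range(1, max_epochs + 1):
        current = run_epoch(e)
        # Stop early when the difference between consecutive epochs
        # (treated here as the quantization error) is at or below the threshold.
        if previous is not None and abs(current - previous) <= threshold:
            return e
        previous = current
    return max_epochs

deviations = [10.0, 5.0, 4.9995, 1.0]
stopped_at = run_epochs(lambda e: deviations[e - 1], max_epochs=4, threshold=1e-3)
never_converged = run_epochs(lambda e: float(e), max_epochs=5, threshold=0.5)
```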

[0091] In the present embodiment, a fixed camera is used as the camera 20 shown in FIG. 1, but a movable camera having a pan head may be used. In particular, if the amount of displacement needed to move the moving object region 100 to the center position on the image is determined based on the position (coordinate) data on the image of the moving object region 100 detected by the moving image processing apparatus 30 (control unit 60), and the pan head is controlled (panned and tilted) based on this amount of displacement, an automatic tracking function that always captures the moving object at the center of the camera's field of view can be realized. Even when the automatic tracking function is realized, the block-unit identification and learning procedure for the map 62 is the same as for the fixed camera 20 described in the present embodiment. Needless to say, the position (coordinate) data of each pixel on the image is indispensable for this identification and learning; it is therefore available for obtaining the displacement amount mentioned here without any particular difficulty, and the present invention is thus extremely useful for realizing the automatic tracking function.
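The displacement amount used for such tracking can be illustrated with a short sketch. It assumes that the position of the moving object region is summarised by the centroid of the small sections identified as moving, that sections are `a` × `a` pixels, and that the displacement is expressed in pixels as (dx, dy) from the centroid to the image center; converting this into pan and tilt commands depends on the camera and is not shown.

```python
import numpy as np

def displacement_to_center(section_is_moving, a):
    """Pixel offset (dx, dy) that would bring the moving region to the center.

    section_is_moving: (rows, cols) boolean identification result.
    a: side length of a small section in pixels.
    Returns None when no small section forms the moving object region.
    """
    rows, cols = np.nonzero(section_is_moving)
    if rows.size == 0:
        return None
    # Centroid of the moving sections, in pixel coordinates (section centers).
    cy = (rows.mean() + 0.5) * a
    cx = (cols.mean() + 0.5) * a
    h, w = section_is_moving.shape
    return (float(w * a / 2 - cx), float(h * a / 2 - cy))

mask = np.zeros((4, 4), dtype=bool)
mask[0, 0] = True
offset = displacement_to_center(mask, 2)
empty = displacement_to_center(np.zeros((4, 4), dtype=bool), 2)
```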

[0092] Note that the moving image processing apparatus 30 of the present embodiment can be realized by a general-purpose computer such as a personal computer. A program for causing a general-purpose computer to function as the moving image processing apparatus 30 can also be provided by itself.

Claims

[1] A moving image processing apparatus comprising:

extracting means to which image data for one frame constituting a moving image, including pixels forming a moving object region and pixels forming a non-moving object region, is input, and which extracts n (n: plural) features of the image data for each pixel to generate n-dimensional first vector data;

a map including a plurality of two-dimensionally arranged neurons, each having n-dimensional second vector data and belonging to one of the classes of the moving object region and the non-moving object region;

search means for searching, for each pixel, among a plurality of blocks each composed of some mutually adjacent neurons, for a winner block whose third vector data, which is a statistic of the second vector data of the neurons in the block, corresponds to the first vector data;

identification means for identifying whether each of the pixels forms the moving object region or the non-moving object region based on the class to which the neurons constituting the winner block belong; and

updating means for updating the second vector data and the class of the neurons constituting the winner block corresponding to each pixel, based on the identification result by the identification means and the first vector data of each pixel;

wherein, after the updating based on all the pixels is performed by the updating means, new image data for one frame is input to the extracting means.

[2] The moving image processing apparatus according to claim 1, further comprising display means for displaying only the pixels identified by the identification means as forming the moving object region.

[3] The moving image processing apparatus according to claim 1, wherein the search means includes: winner candidate search means for searching, for each pixel, among the plurality of blocks having the same size, for a winner candidate block whose third vector data corresponds to the first vector data; repeated execution means for causing the winner candidate search means to repeat the search for each pixel so that another winner candidate block of a smaller size is sequentially searched for within the winner candidate block found by the winner candidate search means; and determining means for determining, for each pixel, as the winner block, the winner candidate block whose third vector data corresponds most closely to the first vector data among the plurality of winner candidate blocks found by the repeated searches of the winner candidate search means.

[4] The moving image processing apparatus according to claim 1, wherein the updating means performs the updating based on all the pixels collectively after the identification means has identified all the pixels.

[5] The moving image processing apparatus according to claim 1, wherein the image data includes color information, and the features include the color information.

[6] The moving image processing apparatus according to claim 1, wherein the features for each of the pixels include features of neighboring pixels in the vicinity of the pixel.

[7] The moving image processing apparatus according to claim 1, wherein the extracting means handles a plurality of mutually adjacent pixels as one pixel.

[8] The moving image processing apparatus according to claim 1, further comprising frame setting means for setting a frame including the moving object region and a part of the non-moving object region, wherein the extracting means handles only the pixels within the frame.

[9] A moving image processing method comprising:

an extraction process of extracting, for each pixel, n (n: plural) features of image data for one frame constituting a moving image including pixels forming a moving object region and pixels forming a non-moving object region, to generate n-dimensional first vector data;

a map forming process of forming a map in which a plurality of neurons, each having n-dimensional second vector data and belonging to one of the classes of the moving object region and the non-moving object region, are two-dimensionally arranged;

a search process of searching, for each pixel, among a plurality of blocks each composed of some mutually adjacent neurons, for a winner block whose third vector data, which is a statistic of the second vector data of the neurons in the block, corresponds to the first vector data;

an identification process of identifying whether each of the pixels forms the moving object region or the non-moving object region based on the class to which the neurons constituting the winner block belong; and

an updating process of updating the second vector data and the class of the neurons constituting the winner block corresponding to each pixel, based on the identification result in the identification process and the first vector data of each pixel;

wherein, after the updating process based on all the pixels is performed, new image data for one frame is subjected to processing in the extraction process.

[10] A moving image processing program for causing a computer to execute:

an extraction procedure of extracting, for each pixel, n (n: plural) features of image data for one frame constituting a moving image including pixels forming a moving object region and pixels forming a non-moving object region, to generate n-dimensional first vector data;

a map forming procedure of forming a map having a plurality of neurons, each having n-dimensional second vector data and belonging to one of the classes of the moving object region and the non-moving object region;

a search procedure of searching, for each pixel, among a plurality of blocks each composed of some mutually adjacent neurons, for a winner block whose third vector data, which is a statistic of the second vector data of the neurons in the block, corresponds to the first vector data;

an identification procedure of identifying whether each of the pixels forms the moving object region or the non-moving object region based on the class to which the neurons constituting the winner block belong; and

an updating procedure of updating the second vector data and the class of the neurons constituting the winner block corresponding to each pixel, based on the identification result by the identification procedure and the first vector data of each pixel;

wherein, after the updating based on all the pixels is performed by the updating procedure, new image data for one frame is processed by the extraction procedure.