WO2002091303A1 - System and method for human recognition - Google Patents

System and method for human recognition

Info

Publication number
WO2002091303A1
Authority
WO
WIPO (PCT)
Prior art keywords
moving
block
human recognition
array
noise
Prior art date
Application number
PCT/JP2001/006645
Other languages
English (en)
Japanese (ja)
Other versions
WO2002091303A8 (fr)
Inventor
Takahiro Narumi
Original Assignee
Systemk Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Systemk Corporation filed Critical Systemk Corporation
Publication of WO2002091303A1 publication Critical patent/WO2002091303A1/fr
Publication of WO2002091303A8 publication Critical patent/WO2002091303A8/fr


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/223Analysis of motion using block-matching
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person

Definitions

  • The present invention relates to a human recognition system and method for recognizing and extracting a human from MPEG (Moving Picture Experts Group) data. Background Art
  • Conventionally, a moving image captured by a video camera or the like has been digitized, and the pixels that change in the moving image extracted, in order to detect a human (see, for example, No. 6,893,32).
  • In view of this, the present invention provides a high-speed, high-accuracy human recognition system and method that exploit characteristics of MPEG data, a moving image data format that has come into wide use in recent years.
  • MPEG data is composed of still image data and difference data from the still image data.
  • A still image is called an I picture; it encodes only the image data within the picture itself.
  • In addition to I pictures, MPEG data contains P pictures, which encode difference data recording how the picture changes (that is, the difference from a past image), and B pictures, which encode both the difference from a past image and a prediction of the motion of a future image; MPEG data is thus composed of a total of three types of pictures.
  • One group is formed from an I picture followed by a repeating pattern of B pictures and P pictures.
  • A group is the unit used for fast-forwarding and rewinding MPEG data.
  • A typical example of a group is "IBBPBBPBBPBBPBB", and this is repeated to form the unit of fast-forward and reverse playback.
  • MPEG data has the following six-layer data structure.
  • Sequence layer: a series of MPEG data streams.
  • GOP layer: a group of pictures that is the unit of random access.
  • Picture layer: a single picture (frame).
  • Slice layer: a series of one or more macroblocks within a picture.
  • Macroblock layer: a block of 16 × 16 pixels. It consists of a luminance block and two chrominance blocks.
  • Block layer: a block of 8 × 8 pixels. It consists of a luminance signal or a color-difference signal.
  • A first invention is a human recognition system for recognizing a human from a moving image in MPEG data format, the system comprising: MPEG data processing means for reading MPEG data; and human recognition means for extracting moving blocks from P pictures of the read MPEG data, erasing moving noise blocks (moving blocks that constitute noise) from the extracted moving blocks, extracting a moving area by superimposing, picture by picture, at least one moving block from which the moving noise blocks have been erased, comparing the moving area with the next moving area, and recognizing a human when the two overlap.
  • A second invention is the human recognition system wherein the human recognition means extracts, as a moving block, a block whose coordinates have changed and which has a correlation in the P picture, and stores in an array whether or not each block is a moving block.
  • A third invention is the human recognition system wherein an extracted moving block that is not present continuously over a predetermined number of P pictures in the stored array is determined to be a moving noise block (noise) and is deleted from the array.
  • A fourth invention is the human recognition system wherein, when superimposing the moving blocks picture by picture, the human recognition means extracts the moving area by taking the logical sum, block by block, of the arrays of the pictures being superimposed.
  • A fifth invention is the human recognition system wherein the human recognition means takes the arithmetic sum, block by block, of the array of the extracted moving area and the array of the next moving area, and, when the result of the arithmetic sum contains an array element that can be regarded as an overlap, recognizes that array element of the moving area as a human.
  • A sixth invention is the human recognition system further comprising display means for statistically processing and displaying the number of persons recognized by the human recognition means.
  • A seventh invention is a human recognition method for recognizing a human from a moving image in MPEG data format, the method comprising: reading MPEG data; extracting moving blocks from P pictures of the read MPEG data; erasing moving noise blocks (moving blocks that constitute noise) from the extracted moving blocks; extracting a moving area by superimposing, picture by picture, at least one moving block from which the moving noise blocks have been erased; and comparing the moving area with the next moving area and recognizing a human when the two overlap.
  • An eighth invention is the human recognition method wherein a block whose coordinates have changed and which has a correlation in the P picture is extracted as a moving block, and whether or not each block is a moving block is stored in an array.
  • A ninth invention is the human recognition method wherein an extracted moving block that is not present continuously over a predetermined number of P pictures in the stored array is determined to be a moving noise block, and that moving block is deleted from the array.
  • A tenth invention is the human recognition method wherein, when superimposing the moving blocks picture by picture, the moving area is extracted by taking the logical sum, block by block, of the arrays of the superimposed pictures.
  • An eleventh invention is the human recognition method wherein the arithmetic sum is taken, block by block, of the array of the extracted moving area and the array of the next moving area, and, when the result of the arithmetic sum contains an array element that can be regarded as an overlap, that array element of the moving area is recognized as a human.
  • A twelfth invention is the human recognition method further comprising statistically processing and displaying the number of recognized persons.
  • According to the first to twelfth inventions, instead of reproducing a moving image and extracting persons from the images by pattern matching as in the past, humans can be recognized and extracted by simple data processing (arithmetic operations, logical operations, and comparisons) on the bit string of the MPEG data. As a result, the number of people can be grasped almost instantaneously simply by reading the MPEG data of interest into the human recognition system. That is, unlike the conventional case, the recognition time is not tied to the recording time, and processing can be completed in a short time.
  • FIG. 1 is a system configuration diagram showing an example of the system configuration of the present invention.
  • FIG. 2 and FIG. 3 are flowcharts showing an example of the flow of the process of the present invention.
  • FIG. 4 is a conceptual diagram of moving block extraction.
  • FIG. 5 is a conceptual diagram showing the correlation between blocks.
  • Figure 6 is a conceptual diagram when extracting blocks.
  • Fig. 7 is a conceptual diagram when static noise is eliminated.
  • FIG. 8 is a conceptual diagram when the presence or absence of a moving block is stored as an array.
  • FIG. 9 is a diagram showing an example of an image when 15 P pictures are overlapped.
  • Fig. 10 shows a conceptual diagram when moving blocks are extracted from P-pictures.
  • Fig. 11 shows an example of the screen when statistical processing is added.
  • FIG. 12 is a diagram showing one picture.
  • the human recognition system has MPEG data processing means, human recognition means, and display means.
  • In addition to the above means, the human recognition system naturally has storage means (not shown), such as a memory, for storing data; even where not explicitly stated, data that is stored is stored in this storage means, and data that is used is read from it.
  • the MPEG data processing means is means for reading MPEG data.
  • the human recognition means is a means for extracting a P picture from MPEG data, and extracting and storing a moving block from the P picture.
  • identification is performed based on a header that specifies the type of each picture in the picture layer of MPEG data.
  • a picture is usually processed for each block of 8 pixels ⁇ 8 pixels, and this block is also identified based on a header indicating a block.
  • Likewise, when a block is processed while processing a picture, it is naturally identified by its header even where this is not stated. The human recognition means is also a means for erasing the moving noise blocks from the extracted moving blocks, extracting humans from the moving blocks from which the moving noise blocks have been deleted, and counting the number of persons.
  • The display means is a means for notifying the user of the number of persons present in the MPEG data by displaying it. It is also a means for statistically processing and displaying the number of people per hour and per date, or for displaying the trajectory of a human.
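  • The statistical processing mentioned here amounts to simple aggregation. The following is a minimal Python sketch, assuming the human recognition means emits (timestamp, count) pairs; the function name and the use of UNIX timestamps are illustrative assumptions, not part of the patent.

    from collections import Counter
    from datetime import datetime

    def tally_counts(recognitions):
        """Aggregate recognized-person events into per-hour and per-date totals."""
        per_hour = Counter()
        per_date = Counter()
        for ts, count in recognitions:            # ts: UNIX timestamp (assumed)
            t = datetime.fromtimestamp(ts)
            per_hour[t.strftime("%Y-%m-%d %H:00")] += count
            per_date[t.strftime("%Y-%m-%d")] += count
        return per_hour, per_date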
  • An example of the process flow of the present invention will be described in detail with reference to the system configuration diagram of FIG. 1 and the flowcharts of FIGS. 2 and 3.
  • First, a user who wants to know the number of persons appearing in the MPEG data reads the MPEG data into the MPEG data processing means of the human recognition system (S100).
  • MPEG data is composed of groups, each formed from a plurality of pictures (I pictures, P pictures, and B pictures); a group corresponds to the GOP layer.
  • This GOP layer is composed of a plurality of pictures, for example an I picture followed by repeating B and P pictures such as "IBBPBBPBB...".
  • the MPEG data processing means transmits the read MPEG data to the human recognition means.
  • The human recognition means that receives the MPEG data extracts only the P pictures from this series of pictures (S110), and, from among the blocks of the block layers of two extracted P pictures, extracts and stores as moving blocks those blocks whose coordinates have changed and which have a correlation (S120).
  • The presence or absence of movement for each block (value 1 if it is a moving block, value 0 if it is not) is stored as an array, for example an array block[i][j].
  • FIG. 10 shows a conceptual diagram of P pictures from which moving blocks are extracted. When there are 64 blocks in a picture and the block number of each block is assigned as shown in FIG. 8, the presence or absence of movement is recorded in the array for each block number.
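  • As a concrete illustration, the following minimal Python sketch builds such an array. It assumes 64 blocks per picture as in FIG. 8, the index convention block[i][j] (i: block number, j: order of the P picture, mirroring the motion[m][n] convention used later), and a hypothetical predicate is_moving_block() implementing the "coordinates changed and correlated" test; none of these names appear in the patent itself.

    import numpy as np

    NUM_BLOCKS = 64   # 64 blocks per picture, as in FIG. 8 (assumption)

    def build_block_array(p_pictures, is_moving_block):
        """Sketch of S110-S120: record, per block and per P picture, whether it moved."""
        num_pics = len(p_pictures)
        block = np.zeros((NUM_BLOCKS, num_pics), dtype=np.uint8)
        for j in range(1, num_pics):                  # compare consecutive P pictures
            prev, curr = p_pictures[j - 1], p_pictures[j]
            for i in range(NUM_BLOCKS):
                # 1 if block i changed coordinates and is correlated, else 0
                block[i, j] = 1 if is_moving_block(prev, curr, i) else 0
        return block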
  • Moving blocks are extracted in this way until the MPEG data ends, and after the MPEG data ends, the number of persons in the MPEG data is determined (S140).
  • Here the number of persons is determined after the MPEG data has ended, but the number of persons could equally be recognized in real time.
  • Next, moving noise blocks are erased, and the state after the noise has been erased is written back to the array (S160); that is, the array elements of moving noise blocks recognized as noise are set to 0.
  • Next, an image (a superimposed image) is created by superimposing 15 P pictures at a time (S170).
  • Here the number of superimposed pictures is 15, but it may be changed arbitrarily.
  • The array graphic[k][l] holds, for each block, the logical sum (OR) over the 15 pictures of the corresponding elements of block[i][j].
  • An example of the superimposed image is shown at 10 in FIG. 9.
  • A moving area refers to a group of array elements of graphic[k][l] whose value is 1, which are adjacent to one another in the vertical or horizontal direction, and whose number is larger than a predetermined number.
  • The moving area is indicated at 20 in FIG. 9.
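  • The superposition and the extraction of moving areas can be sketched as follows, assuming the 64 blocks are arranged as an 8 × 8 grid and using an illustrative threshold for the "predetermined number"; SciPy connected-component labelling stands in for the adjacency test described above.

    import numpy as np
    from scipy.ndimage import label

    GRID = (8, 8)        # 64 blocks arranged 8 x 8 (assumption)
    PICS_PER_IMAGE = 15  # P pictures per superimposed image (S170)
    MIN_AREA = 3         # the "predetermined number"; illustrative value

    def superimpose(block, start):
        """graphic[k][l]: logical sum (OR) of 15 consecutive P pictures, block by block."""
        window = block[:, start:start + PICS_PER_IMAGE]
        return (window.max(axis=1) > 0).astype(np.uint8).reshape(GRID)

    def moving_areas(graphic):
        """Moving areas: vertically/horizontally adjacent 1-blocks, more than MIN_AREA of them."""
        labels, n = label(graphic)   # default structure = 4-connectivity
        return [np.argwhere(labels == k) for k in range(1, n + 1)
                if (labels == k).sum() > MIN_AREA]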
  • A moving area judgment array (motion[m][n], m: block number, n: order of the moving area judgment images) is an array created by further superimposing the superimposed image (array) of the 1st to 15th pictures and the superimposed image (array) of the 16th to 30th pictures. A plurality of moving area judgment arrays are created in this way.
  • A diagram of the moving area judgment array is shown at 30 in FIG. 9.
  • Each value of the array motion[m][n] representing the moving area judgment image is the arithmetic sum of the corresponding block elements of the two 15-picture superimposed images; that is, each value of motion[m][n] is 0, 1, or 2.
  • The value 0 indicates a block that is not a moving block in either of the two superimposed images.
  • The value 1 indicates a block that is a moving block in only one of the two superimposed images.
  • The value 2 indicates a block that is a moving block in both superimposed images.
  • A movement area that contains elements with the value 2, that is, a movement area that overlaps the next one, is determined to be a human, and the number of humans is counted (S190).
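  • A minimal sketch of this overlap test (S180–S190) follows, assuming the two superimposed images are given as 8 × 8 arrays as in the sketch above; the region-size threshold is again an illustrative value.

    import numpy as np
    from scipy.ndimage import label

    def count_humans(graphic_a, graphic_b, min_area=3):
        """Add two consecutive superimposed images block by block (values 0, 1, 2)
        and count moving areas that contain a value-2 element, i.e. areas that
        overlap between the two images and are therefore recognized as humans."""
        motion = graphic_a.astype(np.int8) + graphic_b.astype(np.int8)
        labels, n = label(motion > 0)
        humans = 0
        for k in range(1, n + 1):
            region = labels == k
            if region.sum() > min_area and (motion[region] == 2).any():
                humans += 1
        return humans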
  • By tracing the movement of the center element of the movement area (40 in FIG. 9), it is possible to determine that it is the same person and to trace that person's movement.
  • By keeping this movement status as data, it is possible, for example, to grasp the movement trajectory of the third person counted.
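  • The trajectory idea can be sketched as follows: the centre element of each moving area is matched to the nearest centre in the next moving area judgment array. The matching rule and the max_jump threshold are illustrative assumptions; the patent only states that the centre element is traced.

    import numpy as np

    def track_centers(area_lists, max_jump=2.0):
        """area_lists: one list of moving areas (arrays of block coordinates)
        per moving area judgment array, in temporal order."""
        tracks = []
        for areas in area_lists:
            for center in (a.mean(axis=0) for a in areas):   # centre element (40 in FIG. 9)
                for tr in tracks:
                    if np.linalg.norm(tr[-1] - center) <= max_jump:
                        tr.append(center)     # judged to be the same person
                        break
                else:
                    tracks.append([center])   # a newly counted person
        return tracks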
  • Fig. 11 shows an example in which statistical data processing is added.
  • the statistical data of the number of people, the movement trajectory data, and the like extracted by the human recognition means are transmitted to the display means, and the display means can provide the information to the user by displaying the data.
  • Note that "superimposing images" here means performing the operations (logical and arithmetic operations) on the corresponding blocks of each picture; the blocks are not actually superimposed and displayed as an image. In other words, the operations are simply performed by referring to the corresponding block number and picture number of each array.
  • Similarly, the superimposed image and the moving area judgment image are described as "images" to make the explanation of this embodiment concrete, but each array is, of course, merely a data string that stores elements.
  • In addition, each array is shown as a separate array, but this distinction is logical rather than physical; the number of arrays may therefore differ from this embodiment.
  • A correlation means that, when there are two blocks a and b in two pictures A and B respectively, there is a correlation when the change in information between a and b is small, and there is no correlation when the change in information between a and b is large.
  • Fig. 4 shows a conceptual diagram.
  • When VAROR is the mean square deviation of the picture, Equation 1 holds, and when VAR is the mean square prediction error of the picture, Equation 2 holds.
  • Using Equations 1 and 2, it is assumed that there is a correlation when VAR ≤ 64, or when VAR > 64 and VAROR ≥ VAR.
  • Fig. 5 is a conceptual diagram showing the above relationship.
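  • Equations 1 and 2 appear in the published document only as figures. As a hedged reconstruction from the definitions above (assuming an 8 × 8 block of 64 pixels, with O(x, y) the pixel values of the block in the current picture and P(x, y) those of the corresponding block in the previous picture):

    \mathrm{VAROR} = \frac{1}{64}\sum_{x,y} O(x,y)^2 - \Bigl(\frac{1}{64}\sum_{x,y} O(x,y)\Bigr)^2 \qquad \text{(Equation 1, assumed form)}

    \mathrm{VAR} = \frac{1}{64}\sum_{x,y} \bigl(O(x,y) - P(x,y)\bigr)^2 \qquad \text{(Equation 2, assumed form)}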
  • The movement amount can be calculated by applying Equation 3 to the block.
  • Min represents the minimum value over the search range,
  • h and v represent the number of search pixels in the horizontal and vertical directions,
  • P(x+h, y+v) represents a luminance signal pixel value of a block of the previous picture,
  • and C(x, y) represents a luminance signal pixel value of a block of the next picture.
  • The C(x, y) and P(x+h, y+v) for which the minimum is attained are the best-matching blocks.
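  • Equation 3 likewise appears only as a figure. Reconstructed from the definitions above, the movement amount is the displacement (h, v) that minimizes the absolute difference between the block of the next picture and the candidate block of the previous picture (an 8 × 8 block is assumed):

    \min_{h,\,v} \sum_{x=0}^{7}\sum_{y=0}^{7} \bigl|\, C(x, y) - P(x + h,\, y + v) \,\bigr| \qquad \text{(Equation 3, assumed form)}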
  • FIG. 6 shows a conceptual diagram of extracting blocks that satisfy the above relationship, that is, blocks that are correlated and whose coordinates change.
  • Static noise includes fluctuations of the water surface, irregular reflection of light, movement of branches and leaves, natural phenomena such as rain and snow, and changes in background brightness due to lighting.
  • Dynamic noise refers to noise due to moving artifacts, for example objects moving at a certain speed such as cars and bicycles, moving objects of a certain size such as boats, small moving objects such as small animals, and objects such as poles.
  • Specifically, static moving noise blocks are erased by using the fact that they are not continuously present as moving blocks within a predetermined period (the noise elimination unit time).
  • FIG. 7 shows a conceptual diagram when a static moving noise block is erased.
  • In FIG. 7, moving blocks 1, 3, and 4 are not extracted as moving blocks in every period within the noise elimination unit time.
  • Moving block 1, for example, is extracted only in the periods from 0 to t2 and from t3 to t4.
  • Moving block 2 is extracted in every period. A block that is not extracted in every period is very likely to be noise; by eliminating such moving blocks, the static moving noise blocks are eliminated.
  • That is, moving blocks other than those extracted in all P pictures within the noise elimination unit time are deleted.
  • The unit of this period is a number of pictures; how many consecutive pictures a block must be recognized in as a moving block can be set arbitrarily,
  • but it is preferable to treat a block that is continuously present in 3 to 7 pictures as a moving block and the other blocks as noise. That is, it suffices to judge, in the array block[i][j] that stores whether each block is a moving block, whether "1" is stored as an element for that block over the required number of consecutive pictures.
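  • A minimal sketch of this static noise elimination (S150–S160), using the block[i][j] array from the sketch above; the window length follows the 3-to-7-picture range suggested in the text, and the choice of non-overlapping windows is an assumption.

    import numpy as np

    def erase_static_noise(block, window=5):
        """Within each noise elimination unit time (`window` consecutive P pictures),
        keep only blocks flagged as moving in every picture of the window; all other
        moving blocks are treated as static moving noise and set to 0."""
        cleaned = np.zeros_like(block)
        num_blocks, num_pics = block.shape
        for start in range(0, num_pics - window + 1, window):
            seg = block[:, start:start + window]
            keep = seg.sum(axis=1) == window      # present in every picture of the window
            cleaned[keep, start:start + window] = seg[keep]
        return cleaned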
  • For dynamic noise, limits on the moving speed, moving direction, moving range, and number of pixels are determined in advance, and dynamic moving noise blocks are eliminated by deleting, from among the moving blocks extracted in all pictures within the noise elimination unit time, those that exceed these limits.
  • Since the amount of movement of a moving block can be obtained in S120 and the time between P pictures is known, the moving speed can be calculated; when the moving speed is too high or very low, the moving block is determined to be a dynamic moving noise block and deleted.
  • Likewise, when the moving block has a regular moving speed and moving direction, for example when it is simply rotating, it is judged to be dynamic noise and deleted.
  • The number of pixels is a measure of the number of pixels contained in the extracted object; for example, objects with a small number of pixels, such as a pole, are determined to be dynamic noise and eliminated.
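  • The dynamic noise checks can be summarized in a short sketch. The threshold names and default values are illustrative assumptions; the patent states only that limits on speed, direction, range, and pixel count are determined in advance.

    def is_dynamic_noise(speed, direction_is_regular, travel_range, pixel_count,
                         max_speed=5.0, min_speed=0.1, min_range=1.0, min_pixels=16):
        """A moving block whose speed, regularity of motion, range of movement,
        or pixel count falls outside the predetermined limits is treated as a
        dynamic moving noise block and deleted."""
        too_fast_or_slow = speed > max_speed or speed < min_speed
        too_small_range = travel_range < min_range       # e.g. simple rotation in place
        too_few_pixels = pixel_count < min_pixels        # e.g. a pole
        return too_fast_or_slow or direction_is_regular or too_small_range or too_few_pixels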
  • The functions of the present embodiment can also be realized by supplying the system with a storage medium storing a software program that implements them, and having a computer of the system read and execute the program stored in the storage medium.
  • In that case, the program itself read from the storage medium realizes the functions of the above-described embodiment, and the storage medium storing the program naturally also constitutes the present invention.
  • As the storage medium for supplying the program code, for example, a floppy disk, hard disk, optical disk, magneto-optical disk, magnetic tape, or nonvolatile memory card can be used.
  • According to the present invention, instead of extracting humans from displayed images and counting the number of people as in the conventional case, moving object determination and the like can be performed by data processing on the image data itself (the bit string itself), making a fast and highly accurate human recognition system possible. It is also possible to process the extracted number of humans as statistical data at regular intervals, for example per hour or per date.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Closed-Circuit Television Systems (AREA)

Abstract

The invention relates to a system and method for recognizing a human at high speed and with high accuracy using the characteristics of MPEG data, i.e. moving image data. This human recognition system for recognizing a human from a moving image in MPEG format comprises an MPEG data processing unit for reading MPEG data, and a human recognition unit for extracting moving blocks from a P picture of the MPEG data, erasing moving noise blocks, i.e. noise among the extracted moving blocks, extracting a moving area by superimposing, for each picture, at least one moving block from which the noise has been erased, and recognizing a human when the moving area and the next moving area overlap upon comparison.
PCT/JP2001/006645 2001-05-07 2001-08-02 Systeme et procede de reconnaissance d'humain WO2002091303A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2001-136390 2001-05-07
JP2001136390A JP3896259B2 (ja) 2001-05-07 2001-05-07 人間認識システム

Publications (2)

Publication Number Publication Date
WO2002091303A1 true WO2002091303A1 (fr) 2002-11-14
WO2002091303A8 WO2002091303A8 (fr) 2003-02-20

Family

ID=18983674

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2001/006645 WO2002091303A1 (fr) 2001-05-07 2001-08-02 Systeme et procede de reconnaissance d'humain

Country Status (2)

Country Link
JP (1) JP3896259B2 (fr)
WO (1) WO2002091303A1 (fr)


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH06292203A (ja) * 1993-04-02 1994-10-18 Mitsubishi Electric Corp 動画像解析装置
JPH10207897A (ja) * 1997-01-17 1998-08-07 Fujitsu Ltd 動画像検索システム
JP2001175874A (ja) * 1999-12-15 2001-06-29 Matsushita Electric Ind Co Ltd 動画像内の移動物体検出方法および装置

Also Published As

Publication number Publication date
WO2002091303A8 (fr) 2003-02-20
JP3896259B2 (ja) 2007-03-22
JP2002329208A (ja) 2002-11-15

Similar Documents

Publication Publication Date Title
EP1955205B1 (fr) Procédé et système pour la production de synopsis vidéo
US8818038B2 (en) Method and system for video indexing and video synopsis
Rav-Acha et al. Making a long video short: Dynamic video synopsis
US5923365A (en) Sports event video manipulating system for highlighting movement
JP4981128B2 (ja) 映像からのキーフレーム抽出
JP5247356B2 (ja) 情報処理装置およびその制御方法
JP2009516257A5 (fr)
US8897603B2 (en) Image processing apparatus that selects a plurality of video frames and creates an image based on a plurality of images extracted and selected from the frames
JP4536940B2 (ja) 画像処理装置、画像処理方法、記憶媒体、及びコンピュータプログラム
Pallavi et al. Graph-based multiplayer detection and tracking in broadcast soccer videos
JP2002063577A (ja) 画像解析システム,画像解析方法および画像解析プログラム記録媒体
JP2019101892A (ja) オブジェクト追跡装置及びそのプログラム
WO2002091303A1 (fr) Systeme et procede de reconnaissance d'humain
Toller et al. Video segmentation using combined cues
IL199678A (en) Method and system for video indexing and video synopsis
Takagi et al. Statistical analyzing method of camera motion parameters for categorizing sports video
JP2010041247A (ja) 画像処理装置、方法及びプログラム
Aner-Wolf et al. Beyond key-frames: The physical setting as a video mining primitive
Han et al. Content-based model template adaptation and real-time system for behavior interpretation in sports video

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): CA KR RU SG US

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR

121 Ep: the epo has been informed by wipo that ep was designated in this application
AK Designated states

Kind code of ref document: C1

Designated state(s): CA KR RU SG US

AL Designated countries for regional patents

Kind code of ref document: C1

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR

CFP Corrected version of a pamphlet front page

Free format text: UNDER (71) THE NAME IN JAPANESE AND IN ENGLISH CORRECTED AND UNDER (72, 75) THE ADDRESS IN JAPANESE AND IN ENGLISH CORRECTED

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
122 Ep: pct application non-entry in european phase