WO2018167851A1 - Image processing device, image processing method, and image processing program - Google Patents

Image processing device, image processing method, and image processing program

Info

Publication number
WO2018167851A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
subject
subjects
foreground
photographed
Prior art date
Application number
PCT/JP2017/010247
Other languages
French (fr)
Japanese (ja)
Inventor
亮史 服部
奥村 誠司
守屋 芳美
崇 西辻
Original Assignee
三菱電機株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 三菱電機株式会社
Priority to PCT/JP2017/010247 (WO2018167851A1)
Priority to CN201780088044.6A (CN110383295B)
Priority to MYPI2019004399A (MY184063A)
Priority to SG11201906822YA (SG11201906822YA)
Priority to JP2019505568A (JP6559378B2)
Priority to TW106123456A (TW201833822A)
Publication of WO2018167851A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06M COUNTING MECHANISMS; COUNTING OF OBJECTS NOT OTHERWISE PROVIDED FOR
    • G06M11/00 Counting of objects distributed at random, e.g. on a surface

Definitions

  • The present invention relates to an image processing apparatus, an image processing method, and an image processing program.
  • Techniques for estimating the number of people from a camera image include a method of counting people based on person detection and a method of estimating the number of people from the foreground area.
  • The former has an advantage in accuracy when the crowd density is low, but when the crowd density is high, the detection accuracy drops under the influence of occlusion between persons.
  • The latter is inferior to the former in analysis accuracy at low density, but can be processed with a small amount of computation even at high density.
  • Patent Document 1 discloses the latter technique. Specifically, in Patent Document 1, the foreground extracted by background subtraction from an image of a crowd is designated as a person region, and the number of people in the image is estimated from the area of the person region. In Patent Document 1, CG (Computer Graphics) models imitating a crowd are generated in advance for a plurality of congestion levels. A relational expression between the foreground area and the number of people that takes occlusion within the crowd into account is then derived, so that the number of people can be estimated while suppressing the influence of occlusion.
  • Patent Document 1 has the problem that the CG model generated in advance does not match the foreground extracted from the actual camera image, so that an error occurs in the people-count estimation result.
  • The main object of the present invention is to solve the above problem. Specifically, a main object of the present invention is to improve the accuracy of estimating the number of subjects shown in a captured image.
  • An image processing apparatus according to the present invention includes:
  • a reference image generation unit that extracts a subject image, which is an image of a subject, from a first photographed image, which is a photographed image of a photographing space in which the subject is present, and that generates, based on the extracted subject image, a plurality of reference images in which a plurality of subjects are present in the photographing space, changing the number of subjects present in the photographing space for each reference image; and
  • a subject number estimation unit that compares a second photographed image, which is a photographed image of the photographing space taken at a photographing time different from that of the first photographed image, with the plurality of reference images, and estimates the number of subjects shown in the second photographed image.
  • FIG. 1 is a diagram illustrating a functional configuration example of the crowd monitoring apparatus according to the first embodiment.
  • FIG. 2 is a diagram illustrating an internal configuration example of the parameter acquisition unit according to the first embodiment.
  • FIG. 3 is a diagram illustrating an internal configuration example of the number-of-people analysis unit according to the first embodiment.
  • FIG. 4 is a flowchart illustrating an operation example of the crowd monitoring apparatus according to the first embodiment.
  • FIG. 5 is a flowchart illustrating an operation example of the parameter acquisition unit according to the first embodiment.
  • FIG. 6 is a flowchart illustrating an operation example of the number-of-people analysis unit according to the first embodiment.
  • FIG. 7 is a diagram showing an example of a person detection result according to the first embodiment.
  • FIG. 8 is a diagram showing an example of a foreground extraction result according to the first embodiment.
  • FIG. 9 is a diagram showing an example of the information stored for each detected person according to the first embodiment.
  • FIG. 10 is a diagram for explaining a method of estimating the depression angle of the camera according to the first embodiment.
  • FIG. 11 is a diagram for explaining a method of deriving the horizontal line height according to the first embodiment.
  • FIG. 12 is a diagram for explaining a method of deriving the vertical point coordinates according to the first embodiment.
  • FIG. 13 is a diagram showing an example of a crowd foreground image at congestion level 1 according to the first embodiment.
  • FIG. 14 is a diagram showing an example of a crowd foreground image at congestion level 2 according to the first embodiment.
  • FIG. 15 is a diagram showing an example of a crowd foreground image at congestion level 3 according to the first embodiment.
  • FIG. 16 is a diagram showing an example of foreground area amounts of the crowd foreground image at congestion level 1 according to the first embodiment.
  • FIG. 17 is a diagram showing an example of foreground area amounts of the crowd foreground image at congestion level 2 according to the first embodiment.
  • FIG. 18 is a diagram showing an example of foreground area amounts of the crowd foreground image at congestion level 3 according to the first embodiment.
  • FIG. 19 is a diagram for explaining the people-count estimation process according to the first embodiment.
  • FIG. 20 is a diagram for explaining the people-count estimation process according to the first embodiment.
  • FIG. 21 is a diagram for explaining the people-count estimation process according to the first embodiment.
  • FIG. 22 is a diagram illustrating a hardware configuration example of the crowd monitoring apparatus according to the first embodiment.
  • FIG. 23 is a diagram illustrating the relationship between the functional configuration and the hardware configuration of the crowd monitoring apparatus according to the first embodiment.
  • *** Explanation of configuration *** FIG. 1 shows a functional configuration example of the crowd monitoring apparatus 20 according to the first embodiment.
  • FIG. 22 shows a hardware configuration example of the crowd monitoring apparatus 20 according to the first embodiment.
  • the crowd monitoring device 20 corresponds to an image processing device.
  • the operations performed by the crowd monitoring apparatus 20 correspond to an image processing method and an image processing program.
  • the crowd monitoring device 20 is connected to the camera 10.
  • the camera 10 is installed at a position overlooking the shooting space where the subject exists.
  • the shooting space is a space to be monitored by the camera 10.
  • the subject is a person.
  • a plurality of persons existing in the shooting space are also referred to as a crowd.
  • the crowd monitoring device 20 is a computer.
  • the crowd monitoring device 20 includes a processor 1101, a memory 1102, a network interface 1103, and a storage device 1104 as hardware.
  • The storage device 1104 stores programs that realize the functions of the image reception decoding unit 201, the changeover switch 202, the mode switching control unit 203, the parameter acquisition unit 204, the number-of-people analysis unit 206, and the analysis result output unit 207 shown in FIG. 1.
  • the program is loaded from the storage device 1104 into the memory 1102.
  • the processor 1101 reads the program from the memory 1102.
  • the processor 1101 executes the program and performs operations of an image reception decoding unit 201, a changeover switch 202, a mode switching control unit 203, a parameter acquisition unit 204, a number analysis unit 206, and an analysis result output unit 207, which will be described later. .
  • the parameter storage 205 shown in FIG. 1 is realized by the storage device 1104.
  • FIG. 23 shows the relationship between the functional configuration shown in FIG. 1 and the hardware configuration shown in FIG. 22. FIG. 23 shows that the image reception decoding unit 201, the changeover switch 202, the mode switching control unit 203, the parameter acquisition unit 204, the number-of-people analysis unit 206, and the analysis result output unit 207 are realized by the processor 1101.
  • FIG. 23 shows that the parameter storage 205 is realized by the storage device 1104.
  • the network interface 1103 receives the compressed image stream from the camera 10.
  • the image reception decoding unit 201 decodes the compressed image stream distributed from the camera 10 and converts the compressed image stream into an image frame.
  • the image frame is a photographed image in the photographing space.
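As an illustration of how such a reception and decoding front end might be realized, the following sketch pulls and decodes a compressed camera stream with OpenCV. It is only a hedged example, not part of the patent disclosure; the stream URL and the frame-handling callback are hypothetical placeholders.

```python
# Illustrative sketch only: decode a compressed image stream (e.g. H.264 over RTSP)
# into image frames, roughly the role of the image reception decoding unit 201.
# The URL and the handle_frame callback are hypothetical.
import cv2

def receive_and_decode(stream_url: str, handle_frame) -> None:
    capture = cv2.VideoCapture(stream_url)      # e.g. "rtsp://camera10.example/stream"
    if not capture.isOpened():
        raise RuntimeError(f"could not open stream: {stream_url}")
    try:
        while True:
            ok, frame = capture.read()          # one decoded image frame (BGR ndarray)
            if not ok:                          # stream ended or decode error
                break
            handle_frame(frame)                 # e.g. pass the frame on to the changeover switch
    finally:
        capture.release()
```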
  • the mode switching control unit 203 controls the operation mode of the crowd monitoring device 20.
  • the operation mode of the crowd monitoring device 20 includes a parameter acquisition mode and a people count mode.
  • the mode change control unit 203 outputs a mode control signal to the changeover switch 202.
  • the changeover switch 202 switches the output destination of the image frame in accordance with the mode control signal from the mode switching control unit 203. More specifically, the changeover switch 202 outputs an image frame to the parameter acquisition unit 204 when the operation mode of the crowd monitoring apparatus 20 is the parameter acquisition mode. On the other hand, if the operation mode of the crowd monitoring device 20 is the people count mode, the changeover switch 202 outputs an image frame to the people analysis unit 206.
  • the image frame output from the changeover switch 202 to the parameter acquisition unit 204 corresponds to the first captured image. Further, the image frame output from the changeover switch 202 to the number-of-people analysis unit 206 corresponds to the second captured image.
  • the image frame output from the changeover switch 202 to the number-of-people analysis unit 206 is a captured image in a shooting space having a shooting time different from that of the image frame output from the changeover switch 202 to the parameter acquisition unit 204.
  • When the operation mode of the crowd monitoring device 20 is the parameter acquisition mode, the parameter acquisition unit 204 acquires analysis parameters for the people-count analysis using image frames. More specifically, the parameter acquisition unit 204 estimates the depression angle of the camera 10 as an external parameter of the camera 10. In addition, the parameter acquisition unit 204 extracts a subject image, which is an image of a subject, from the image frame that is the first captured image, and generates a plurality of foreground maps based on the extracted subject image and the estimated depression angle of the camera 10. The parameter acquisition unit 204 generates the plurality of foreground maps by changing the number of subjects present in the shooting space for each foreground map. Each foreground map is divided into a plurality of partial areas.
  • In the foreground map, for each partial area, the number of subjects shown in the partial area (the number of area subjects) and the foreground area amount in the partial area are indicated.
  • the foreground map is an image that is compared with the image frame when the number analysis unit 206 performs the number analysis.
  • the foreground map corresponds to a reference image.
  • the parameter acquisition unit 204 stores the generated foreground maps in the parameter storage 205 as analysis parameters.
  • the parameter acquisition unit 204 corresponds to a reference image generation unit.
  • the process performed by the parameter acquisition unit 204 corresponds to a reference image generation process.
  • the parameter storage 205 stores the analysis parameters generated by the parameter acquisition unit 204.
  • The number-of-people analysis unit 206 performs the people-count analysis when the operation mode of the crowd monitoring device 20 is the people-count mode. More specifically, the number-of-people analysis unit 206 compares the image frame that is the second captured image with the plurality of foreground maps generated by the parameter acquisition unit 204, and estimates the number of subjects shown in the image frame.
  • the number analysis unit 206 corresponds to a subject number estimation unit.
  • the processing performed by the number-of-people analysis unit 206 corresponds to subject number estimation processing.
  • the analysis result output unit 207 outputs the result of the number analysis by the number analysis unit 206 to the outside.
  • FIG. 2 shows an internal configuration example of the parameter acquisition unit 204.
  • the parameter acquisition unit 204 includes a person detection unit 2041, a foreground extraction unit 2042, a depression angle estimation unit 2043, a foreground map generation unit 2044, and a person information storage 2045.
  • the person detection unit 2041 detects an image of a person as a subject from an image frame that is a first photographed image.
  • the foreground extraction unit 2042 extracts the foreground image of the person specified by the person detection unit 2041.
  • the depression angle estimation unit 2043 estimates the depression angle of the camera 10 from the person detection result information and the foreground image of many persons.
  • the foreground map generation unit 2044 generates a plurality of foreground maps from the depression angle of the camera 10, the person detection result information of a large number of persons, and the foreground image. Then, the foreground map generation unit 2044 stores the foreground map in the parameter storage 205 as an analysis parameter.
  • Person information storage 2045 stores person detection result information and foreground images.
  • FIG. 3 shows an internal configuration example of the number of persons analysis unit 206.
  • the number analysis unit 206 includes a foreground extraction unit 2061 and a number estimation unit 2062.
  • the foreground extraction unit 2061 extracts the foreground image from the image frame that is the second captured image.
  • The number-of-people estimation unit 2062 estimates the number of persons from the foreground image using the analysis parameters stored in the parameter storage 205.
  • FIG. 4 is a flowchart showing an operation example of the crowd monitoring device 20.
  • the mode switching control unit 203 sets the operation mode to the parameter acquisition mode (step ST01). That is, the mode switching control unit 203 outputs a mode control signal for notifying the changeover switch 202 of the parameter acquisition mode.
  • The mode switching control unit 203 refers to the parameter storage 205 and confirms whether or not the analysis parameters have already been acquired (step ST02). That is, the mode switching control unit 203 confirms whether or not the analysis parameters are stored in the parameter storage 205.
  • If the analysis parameters have not yet been acquired (NO in step ST02), the analysis parameters are acquired and stored in the parameter storage 205 through steps ST04, ST05, and ST06. In step ST02 after the analysis parameters have been stored in the parameter storage 205, it is determined that the analysis parameters have been acquired, and the process proceeds to step ST03.
  • If the analysis parameters have already been acquired (YES in step ST02), the mode switching control unit 203 changes the operation mode to the people-count mode (step ST03). If the analysis parameters were already saved in the parameter storage 205 before the crowd monitoring device 20 was activated, the determination in step ST02 is YES. Further, as described above, step ST02 after the analysis parameters have been stored in the parameter storage 205 through steps ST04, ST05, and ST06 is also determined as YES.
  • the image reception decoding unit 201 receives a compressed image stream from the camera 10 and decodes at least one image frame of the received compressed image stream (step ST04).
  • The image reception decoding unit 201 receives, as the compressed image stream, image-encoded data compressed by an image compression encoding method such as H.262/MPEG-2 Video, H.264/AVC, H.265/HEVC, or JPEG.
  • The image reception decoding unit 201 also receives, as the compressed image stream, data distributed in a streaming format such as MPEG-2 TS (Moving Picture Experts Group 2 Transport Stream) or RTP/RTSP (Real-time Transport Protocol/Real Time Streaming Protocol).
  • the image receiving / decoding unit 201 may receive, as a compressed image stream, image data encoded by an encoding method other than the above or image data distributed in a distribution format other than the above.
  • the image receiving / decoding unit 201 may receive image data distributed according to an uncompressed transmission standard such as SDI (Serial Digital Interface) or HD (High Definition) -SDI as a compressed image stream.
  • the changeover switch 202 outputs the image frame output from the image reception decoding unit 201 to the parameter acquisition unit 204 or the number of people analysis unit 206 according to the operation mode set by the mode switching control unit 203 (step ST05). That is, the changeover switch 202 outputs an image frame to the parameter acquisition unit 204 if a mode control signal for notifying the parameter acquisition mode is output from the mode change control unit 203. On the other hand, if a mode control signal notifying the number of people count mode is output from the mode switching control unit 203, the changeover switch 202 outputs an image frame to the number of people analysis unit 206.
  • the parameter acquisition unit 204 and the number of people analysis unit 206 are operated exclusively. However, a configuration may be adopted in which the parameter acquisition unit 204 is simultaneously operated while the number of people analysis unit 206 is operated, and the analysis parameters are updated as needed.
  • The parameter acquisition unit 204 performs the parameter acquisition process (step ST06). That is, the parameter acquisition unit 204 sequentially processes the image frames from the changeover switch 202, generates analysis parameters, and stores the generated analysis parameters in the parameter storage 205. The parameter acquisition unit 204 generates the analysis parameters by processing image frames for a certain period. When the generation of the analysis parameters is completed, the parameter acquisition unit 204 stores the analysis parameters in the parameter storage 205. When the analysis parameters have been stored in the parameter storage 205, the process returns to step ST02. Details of the operation of the parameter acquisition unit 204 will be described later.
  • the people analysis unit 206 performs a people count process (step ST07). That is, the number analysis unit 206 analyzes the image frame from the changeover switch 202 using the analysis parameter stored in the parameter storage 205, thereby analyzing the number of persons shown in the image frame. Details of the operation of the number of persons analysis unit 206 will be described later.
  • the analysis result output unit 207 outputs the number of people analysis result indicating the analysis result of the number of people analysis unit 206 to the outside (step ST08).
  • The analysis result output unit 207 outputs the people-count analysis result by, for example, displaying it on a monitor, writing it to a log file, outputting it to an externally connected device, or transmitting it over a network. The analysis result output unit 207 may also output the people-count result in another format. In addition, the result may be output to the outside every time a people-count analysis result is produced by the number-of-people analysis unit 206, or may be output after the analysis results of a specific period or a specific number of analysis results have been aggregated or statistically processed. After step ST08, the process returns to step ST04 to process the next image frame.
  • FIG. 5 is a flowchart showing the operation of the parameter acquisition unit 204.
  • the person detection unit 2041 performs person detection processing on the input image frame, and stores person detection result information in the person information storage 2045 (step ST11).
  • The person detection unit 2041 detects persons who satisfy the following conditions: the person is in a standing posture, the contact position between the feet and the ground and the top of the head are visible, and no occlusion with another person has occurred or the proportion of occlusion with other persons is small.
  • The person detection unit 2041 outputs person detection result information to the person information storage 2045 for each detected person.
  • The person detection result information includes, for example, information indicating a rectangular, elliptical, or other-shaped frame that surrounds the detected person without excess or deficiency, coordinate information indicating the contact position between the detected person's feet and the ground, coordinate information indicating the top of the person's head, and the moving speed of the ground contact point.
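As a minimal sketch of how this person detection result information could be held in memory, the record below groups the fields listed above; the field names are assumptions chosen for illustration, not terms from the patent.

```python
# Minimal sketch of a person detection result record (field names are assumed).
from dataclasses import dataclass
from typing import Optional, Tuple
import numpy as np

@dataclass
class PersonDetection:
    frame_xywh: Tuple[int, int, int, int]   # rectangle (x, y, width, height) tightly surrounding the person
    ground_point: Tuple[float, float]       # image coordinates of the feet/ground contact point
    head_point: Tuple[float, float]         # image coordinates of the top of the head
    ground_velocity: Tuple[float, float]    # moving speed of the contact point (pixels per frame)
    foreground_mask: Optional[np.ndarray] = None   # per-person foreground image from steps ST12/ST13
```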
  • FIG. 7 shows an image of a person detection result by the person detection unit 2041.
  • As the person detection technique, a technique using still-image-based feature quantities such as HOG (Histogram of Oriented Gradients), ICF (Integral Channel Features), or ACF (Aggregate Channel Features) may be used.
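One possible realization of such a still-image-based detector is the HOG pedestrian detector bundled with OpenCV, sketched below. This is only an example of the class of techniques mentioned above; the patent does not prescribe a specific detector, and the confidence threshold is an arbitrary example value.

```python
# Example only: HOG-based person detection with OpenCV's default people detector.
import cv2

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

def detect_people(frame):
    """Return (x, y, w, h) rectangles around detected persons in one image frame."""
    rects, weights = hog.detectMultiScale(frame, winStride=(8, 8), padding=(8, 8), scale=1.05)
    return [tuple(r) for r, w in zip(rects, weights) if float(w) > 0.5]  # 0.5 is an example threshold
```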
  • The person detection unit 2041 initially uses a person detection method that does not assume the size of the persons shown in the image frame. After performing person detection a plurality of times, when a tendency of the person scale for each image region has been obtained, the person detection unit 2041 may switch to a person detection method premised on that scale, thereby speeding up the processing.
  • The person detection unit 2041 acquires the information on the moving speed of the ground contact point by, for example, tracking feature points across a plurality of image frames or tracking the person.
  • Alternatively, the person detection unit 2041 may acquire the moving speed information of the ground contact point by using motion vector information as it is.
  • The parameter acquisition unit 204 may be configured to save all detection results in the person information storage 2045 as person detection result information.
  • Alternatively, the parameter acquisition unit 204 may be configured to keep the amount of data accumulated in the person information storage 2045 below a fixed level.
  • For example, when the person detection unit 2041 detects another person again in an image area in which a person has already been detected, the person detection unit 2041 discards one of the detection results, thereby keeping the amount of data stored in the person information storage 2045 below a certain level.
  • Alternatively, the person detection unit 2041 may take the average of the detection results and integrate them, thereby keeping the amount of data stored in the person information storage 2045 below a certain level.
  • The foreground extraction unit 2042 performs foreground extraction within the area of the frame surrounding each detected person, and stores the foreground extraction result image in the person information storage 2045 in association with the person detection result information (steps ST12 and ST13).
  • the foreground image extracted by the foreground extraction unit 2042 corresponds to a subject image.
  • FIG. 8 shows an image of the foreground extraction result.
  • FIG. 9 shows a pair of person detection result information and a person foreground image corresponding to each person detection result information stored in the person information storage 2045.
  • the person detection result information and the foreground image are collectively referred to as person information.
  • The foreground extraction unit 2042 extracts a foreground image using, for example, a background subtraction method, an adaptive background subtraction method, or a dense optical flow derivation algorithm.
  • The background subtraction method is a method of registering a background image in advance and calculating the difference between the input image and the background image.
  • The adaptive background subtraction method is a method of automatically updating the background image from continuously input image frames using a model such as MOG (Mixture of Gaussians).
  • the dense optical flow derivation algorithm is a method for acquiring motion information in an image in units of pixels.
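As a hedged sketch of the adaptive background subtraction variant, the snippet below uses OpenCV's mixture-of-Gaussians background model (MOG2); the history length and thresholds are example values only, not values from the patent.

```python
# Sketch: adaptive background subtraction with a mixture-of-Gaussians model (MOG2).
import cv2

subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16, detectShadows=True)

def extract_foreground(frame):
    """Return a binary foreground mask (255 = foreground pixel) for one image frame."""
    mask = subtractor.apply(frame)                               # 0 background, 127 shadow, 255 foreground
    _, mask = cv2.threshold(mask, 200, 255, cv2.THRESH_BINARY)   # discard shadow pixels
    return mask
```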
  • The foreground extraction unit 2042 determines whether sufficient person information has been acquired by the processes in steps ST11 to ST13 (step ST14). If sufficient person information has been acquired, the process proceeds to step ST15. On the other hand, if sufficient person information has not been acquired, the process ends. Whether or not sufficient person information has been obtained is determined based on measures such as the number of pieces of person information acquired so far and the elapsed time since the start of the parameter acquisition process. Alternatively, the parameter estimation processing from step ST15 onward may be performed, and whether sufficient person information has been obtained may be determined from the reliability of the parameters estimated by that processing.
  • the depression angle estimation unit 2043 estimates the depression angle of the camera 10 using the person information stored in the person information storage 2045 (step ST15).
  • 10, 11 and 12 show an outline of the depression angle estimation process.
  • It is assumed that the camera is installed facing downward from the horizontal.
  • It is assumed that the lens distortion of the camera has been corrected, so that the camera can be regarded as a pinhole camera.
  • It is assumed that the internal parameters of the camera are known, so that the angle between the optical axis and the viewing direction of each coordinate in the image is known. In this case, if the image coordinates corresponding to the horizontal line height are known, the angle θ [rad] formed by the optical axis direction and the horizontal direction can be uniquely obtained.
  • Similarly, if the image coordinates of the vertical point are known, the angle [rad] formed by the optical axis direction and the vertical direction can be uniquely obtained.
  • FIG. 11 shows an image of deriving the horizontal line height.
  • The depression angle estimation unit 2043 extends the movement direction of the ground contact point of each moving person based on the person information, and regards the point where the extension lines of the movement directions of the plurality of persons best intersect as a point on the horizontal line. This is based on the premise that a plurality of pedestrians are walking in parallel and straight ahead. The lines do not necessarily intersect at one point, owing to errors in detecting the movement direction and to the influence of pedestrians who are not walking in parallel. For this reason, the depression angle estimation unit 2043 uses a large amount of person information to estimate the most likely value of the horizontal line height.
  • FIG. 12 shows an image of deriving the vertical point coordinates.
  • Based on the person information, the depression angle estimation unit 2043 extends the line connecting the coordinates of the top of the head and the ground contact point, that is, the body normal, and regards the point where the normals of the plurality of persons best intersect as the vertical point. This is based on the premise that the pedestrians are standing upright with respect to the ground. The lines do not necessarily intersect at a single point, owing to errors in detecting the coordinates of the top of the head and the ground contact point and to the influence of pedestrians who are not standing upright. For this reason, the depression angle estimation unit 2043 uses a large amount of person information to estimate the most likely value of the vertical point coordinates.
  • The method using the horizontal line height is suitable when the depression angle of the camera is shallow, that is, when the optical axis is close to the horizontal direction.
  • The method using the vertical point is suitable when the depression angle of the camera is deep, that is, when the optical axis is close to the vertical direction.
  • The depression angle estimation unit 2043 may select, from the depression angles obtained by the two methods, the one obtained by the more suitable method, or may adopt the average of the two. Alternatively, the depression angle estimation unit 2043 may use another depression angle estimation method.
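A minimal sketch of the horizon-based variant is given below: each tracked pedestrian contributes a line (its ground contact point extended along its movement direction), the least-squares best intersection of those lines is taken as a point on the horizon, and under the pinhole assumption the depression angle follows from the focal length f and the principal point ordinate cy. The symbols f and cy and the least-squares formulation are assumptions used for illustration, not the patent's prescribed equations.

```python
# Sketch of the horizontal-line-based depression angle estimate (assumed formulation).
import numpy as np

def best_intersection(points, directions):
    """Least-squares 'best intersection' of 2D lines given by (point p_i, direction d_i)."""
    A = np.zeros((2, 2))
    b = np.zeros(2)
    for p, d in zip(points, directions):
        d = np.asarray(d, dtype=float)
        d /= np.linalg.norm(d)
        P = np.eye(2) - np.outer(d, d)        # projector onto the normal of this line
        A += P
        b += P @ np.asarray(p, dtype=float)
    return np.linalg.solve(A, b)              # image coordinates of the estimated horizon point

def depression_angle_from_horizon(y_horizon, cy, f):
    """Depression angle [rad] of the optical axis below the horizontal (pinhole model)."""
    return np.arctan2(cy - y_horizon, f)       # image y grows downward, so the horizon lies above cy
```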
  • The foreground map generation unit 2044 generates the foreground maps to be used by the number-of-people analysis unit 206, using the depression angle information obtained by the depression angle estimation unit 2043 and the person information stored in the person information storage 2045 (step ST16).
  • the foreground map generation unit 2044 first calculates a two-dimensional coordinate system on the road surface (hereinafter referred to as a road surface coordinate system) from the depression angle information and the person contact point movement information. For example, the foreground map generation unit 2044 defines the main movement direction of the person as the X direction, and defines the direction orthogonal to the X direction as the Y direction.
  • the foreground map generation unit 2044 obtains an absolute scale of road surface coordinates that are coordinates in the road surface coordinate system.
  • For example, the foreground map generation unit 2044 regards the movement width per unit time in road surface coordinates as corresponding to an average pedestrian movement speed given in advance, and thereby obtains the absolute scale of the road surface coordinates.
  • The foreground map generation unit 2044 combines the per-person foreground images acquired in step ST13 to generate a foreground image of the crowd that would be observed at a specific congestion level (hereinafter referred to as a crowd foreground image).
  • 13, 14 and 15 show examples of the crowd foreground image.
  • FIG. 13 shows an example of the crowd foreground image at congestion level 1.
  • FIG. 14 shows an example of the crowd foreground image at congestion level 2.
  • FIG. 15 shows an example of the crowd foreground image at congestion level 3.
  • As shown in FIGS. 13 to 15, the foreground map generation unit 2044 generates a plurality of crowd foreground images in which a plurality of persons exist in the shooting space, changing the number of persons in the shooting space for each crowd foreground image. As shown in FIGS. 13 to 15, occlusion occurs between persons in the crowd foreground images depending on the density and arrangement of the persons.
  • The foreground map generation unit 2044 converts the image coordinates of the ground contact points in the person information acquired so far into road surface coordinates, and then generates a crowd foreground image by pasting foreground images at a plurality of positions such that the density of persons becomes a predetermined value in the road surface coordinate system.
  • For example, assume that the road surface coordinate grid shown in FIG. 13 covers 4 square meters. If the crowd density at congestion level 1 is 1 [person/square meter], the foreground map generation unit 2044 randomly arranges 4 persons in the grid. If the crowd density at congestion level 2 is 2 [persons/square meter], the foreground map generation unit 2044 randomly arranges 8 persons in the grid. If the crowd density at congestion level 3 is 4 [persons/square meter], the foreground map generation unit 2044 randomly arranges 16 persons in the grid. In each case, the foreground map generation unit 2044 randomly arranges the persons while keeping them separated by a distance equal to or greater than the average shoulder width of a person and the average personal space radius set for each density, so that the persons are not placed too close to each other.
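The following sketch shows one way to realize this random arrangement step: persons are placed in the road-surface grid until the count implied by the congestion level's density is reached, while any two persons are kept at least a minimum distance apart. The grid size, the minimum distance, and the rejection-sampling loop are example choices, not values from the patent.

```python
# Sketch: randomly arrange persons at a given density with a minimum separation.
import random
import math

def place_crowd(density_per_m2, grid_w_m=2.0, grid_h_m=2.0, min_dist_m=0.45, max_tries=10000):
    """Return road-surface (x, y) positions for one crowd foreground image."""
    target = round(density_per_m2 * grid_w_m * grid_h_m)   # e.g. 2 persons/m^2 on 4 m^2 -> 8 persons
    placed, tries = [], 0
    while len(placed) < target and tries < max_tries:
        tries += 1
        x, y = random.uniform(0, grid_w_m), random.uniform(0, grid_h_m)
        if all(math.hypot(x - px, y - py) >= min_dist_m for px, py in placed):
            placed.append((x, y))
    return placed

positions = place_crowd(density_per_m2=2.0)    # congestion level 2 in the example above
```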
  • As shown in FIGS. 16, 17, and 18, the foreground map generation unit 2044 divides the crowd foreground image into a plurality of partial areas.
  • The foreground area amount per person (example: 150 pixels/person) is calculated.
  • In FIGS. 16 to 18, foreground area amounts are not displayed for some persons for reasons of drawing.
  • The foreground map generation unit 2044 calculates, for each partial area, the foreground area amount included in that partial area. For example, in the partial area denoted by reference numeral 161 in FIG. 16, 90 pixels/partial area is obtained.
  • The foreground map generation unit 2044 can also obtain the true value of the number of persons appearing in each partial area.
  • The foreground map generation unit 2044 counts the number of persons in each partial area. When the entire foreground image of one person is included in one partial area, the number of persons counted for that partial area is one (1 person/partial area). When only a part of a person's foreground image is included in a partial area, the foreground map generation unit 2044 divides the area of the part of the person included in the partial area by the entire area of that person's foreground image, thereby obtaining the number of persons included in the partial area in decimal units.
  • The foreground area amount of the persons included in a partial area (for example, 90 pixels/partial area) is referred to as the area person area amount.
  • The number of persons included in a partial area (for example, 1 person/partial area or 0.6 person/partial area) is referred to as the number of area persons.
  • A crowd foreground image in which the area person area amount and the number of area persons are attached to each partial area is referred to as a foreground map.
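A minimal sketch of how these two values could be computed is given below: for every partial area, the foreground pixels of all placed persons are summed (area person area amount), and each person contributes a fractional head count equal to the share of that person's foreground mask falling in the area (number of area persons). The cell size and the use of full-frame binary masks per person are assumptions for illustration.

```python
# Sketch: per-partial-area statistics of a generated crowd foreground image.
import numpy as np

def build_foreground_map(person_masks, cell=64):
    """person_masks: full-frame binary (0/1) masks, one per placed person.
    Returns (area person area amount, number of area persons) as 2-D arrays."""
    h, w = person_masks[0].shape
    rows, cols = h // cell, w // cell
    area = np.zeros((rows, cols))
    count = np.zeros((rows, cols))
    for mask in person_masks:
        total = mask.sum()                                      # whole-person foreground area in pixels
        for r in range(rows):
            for c in range(cols):
                part = mask[r*cell:(r+1)*cell, c*cell:(c+1)*cell].sum()
                area[r, c] += part
                if total > 0:
                    count[r, c] += part / total                 # e.g. 0.6 person if 60% of the mask is here
    return area, count
```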
  • the foreground map corresponds to the reference image.
  • The foreground map generation unit 2044 may generate a plurality of crowd foreground images for each congestion level. That is, the foreground map generation unit 2044 may generate a plurality of crowd foreground images for one congestion level (for example, congestion level 3) by randomly changing the arrangement of persons. When a plurality of crowd foreground images are generated for one congestion level in this way, the foreground map generation unit 2044 may store all of the plurality of foreground maps obtained from the plurality of crowd foreground images in the parameter storage 205.
  • Alternatively, the foreground map generation unit 2044 may integrate the plurality of foreground maps obtained from the plurality of crowd foreground images into one foreground map by taking their average values, and store only that one foreground map in the parameter storage 205. In this way, an effect of being able to cope with various arrangement patterns of persons can be expected.
  • The foreground map generation unit 2044 stores the generated foreground maps in the parameter storage 205 (step ST17).
  • the parameter acquisition process is completed by storing the foreground map.
  • FIG. 6 is a flowchart showing the operation of the number analysis unit 206.
  • the foreground extraction unit 2061 performs foreground extraction on the entire input image frame or the attention area (step ST21). This foreground extraction process is the same as the foreground extraction process of the foreground extraction unit 2042 of the parameter acquisition unit 204.
  • The number-of-people estimation unit 2062 estimates the number of persons from the foreground image extracted by the foreground extraction unit 2061, using the foreground maps for the respective congestion levels stored in the parameter storage 205 (step ST22). FIGS. 19, 20, and 21 illustrate the people-count estimation process.
  • As shown in FIG. 19, the number-of-people estimation unit 2062 extracts a foreground image from the image frame. Then, the number-of-people estimation unit 2062 divides the foreground image into the same partial areas as the foreground maps.
  • the partial area in the foreground image shown in FIG. 19 is hereinafter referred to as an estimation target partial area.
  • the number-of-people estimation unit 2062 determines the foreground area amount for each estimation target partial region.
  • For each estimation target partial area, the number-of-people estimation unit 2062 extracts the partial area at the same position from each of the foreground maps for the respective congestion levels, and compares the foreground area amount in each extracted partial area with the foreground area amount in the estimation target partial area. The number-of-people estimation unit 2062 then selects the foreground map partial area whose foreground area amount is most similar to the foreground area amount in the estimation target partial area. For example, for the estimation target partial area in the first row and first column (upper left corner) of the foreground image of FIG. 20, the number-of-people estimation unit 2062 extracts the partial area in the first row and first column from each of the foreground maps of congestion levels 1 to 3.
  • The number-of-people estimation unit 2062 compares the foreground area amounts in the three partial areas extracted from the foreground maps of congestion levels 1 to 3 with the foreground area amount in the estimation target partial area of FIG. 20. The number-of-people estimation unit 2062 then selects, from the three extracted partial areas, the partial area to which the foreground area amount most similar to the foreground area amount in the estimation target partial area of FIG. 20 is attached.
  • The number-of-people estimation unit 2062 performs the above processing for all estimation target partial areas in FIG. 20.
  • The partial area selected for an estimation target partial area is referred to as a selected partial area. In this way, the number-of-people estimation unit 2062 selects a selected partial area from the foreground maps of congestion levels 1 to 3 for each estimation target partial area.
  • The congestion level of the selected partial area may differ from one estimation target partial area to another. For example, the partial area of the congestion level 3 foreground map may be selected for the estimation target partial area in the nth row and mth column of the foreground image of FIG. 20, the partial area of the congestion level 2 foreground map for the estimation target partial area in the nth row and (m+1)th column, and the partial area of the congestion level 1 foreground map for the estimation target partial area in the nth row and (m+2)th column. Next, the number-of-people estimation unit 2062 obtains the number of persons included in each estimation target partial area.
  • The number-of-people estimation unit 2062 divides the foreground area amount of the estimation target partial area by the area person area amount of the selected partial area, and multiplies the quotient by the number of area persons of the selected partial area. That is, the number-of-people estimation unit 2062 calculates (foreground area amount of the estimation target partial area) / (area person area amount of the selected partial area) × (number of area persons of the selected partial area) to obtain the number of persons included in the estimation target partial area. Then, the number-of-people estimation unit 2062 sums the numbers of persons obtained for the estimation target partial areas to obtain the number of persons included in the entire image frame or the attention area.
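A minimal sketch of this estimation rule is shown below: for every estimation target partial area, the partial area with the most similar stored foreground area amount is selected across the congestion-level foreground maps, and the contribution (observed foreground area) / (area person area amount) x (number of area persons) is accumulated. The array layout and names are assumptions, matching the earlier sketch for building the foreground maps.

```python
# Sketch of the people-count estimation over the partial areas (assumed data layout).
import numpy as np

def estimate_people(observed_area, foreground_maps):
    """observed_area: 2-D array of foreground area amounts of the estimation target partial areas.
    foreground_maps: list of (area person area amount, number of area persons) array pairs,
    one pair per congestion level, as built by build_foreground_map above."""
    total = 0.0
    rows, cols = observed_area.shape
    for r in range(rows):
        for c in range(cols):
            # Select the congestion level whose stored foreground area amount is most similar.
            ref_area, ref_count = min(
                ((m[0][r, c], m[1][r, c]) for m in foreground_maps),
                key=lambda pair: abs(pair[0] - observed_area[r, c]),
            )
            if ref_area > 0:
                # (observed foreground area) / (area person area amount) x (number of area persons)
                total += observed_area[r, c] / ref_area * ref_count
    return total
```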
  • The parameter acquisition unit 204 uses actually observed foreground images to determine the area person area amount and the number of area persons with occlusion taken into account. Since the number-of-people analysis unit 206 estimates the number of persons by applying the area person area amount and the number of area persons to the foreground image to be estimated, the number of persons can be estimated with high accuracy without obtaining the average size of persons in advance. In addition, in the present embodiment, the parameter acquisition unit 204 estimates the depression angle of the camera, which is an external parameter of the camera, so that it is not necessary to measure the external parameters of the camera in advance.
  • the camera external parameters are, for example, parameters such as the depression angle of the camera and the distance from each of the scattered people to the camera.
  • a person has been described as an example of a subject, but the subject is not limited to a person.
  • the subject may be a living body such as a wild animal or an insect, or a moving body other than a person such as a vehicle.
  • The processor 1101 illustrated in FIG. 22 is an IC (Integrated Circuit) that performs processing.
  • the processor 1101 is a CPU (Central Processing Unit), a DSP (Digital Signal Processor), or the like.
  • a memory 1102 illustrated in FIG. 22 is a RAM (Random Access Memory).
  • the storage device 1104 illustrated in FIG. 22 is a ROM (Read Only Memory), a flash memory, an HDD (Hard Disk Drive), or the like.
  • the network interface 1103 illustrated in FIG. 22 includes a receiver that receives data and a transmitter that transmits data.
  • the network interface 1103 is, for example, a communication chip or a NIC (Network Interface Card).
  • the storage device 1104 also stores an OS (Operating System). Then, at least a part of the OS is executed by the processor 1101.
  • The processor 1101 executes at least part of the OS and, while doing so, executes the programs that implement the functions of the image reception decoding unit 201, the changeover switch 202, the mode switching control unit 203, the parameter acquisition unit 204, the number-of-people analysis unit 206, and the analysis result output unit 207. When the processor 1101 executes the OS, task management, memory management, file management, communication control, and the like are performed.
  • The programs for realizing the functions of the image reception decoding unit 201, the changeover switch 202, the mode switching control unit 203, the parameter acquisition unit 204, the number-of-people analysis unit 206, and the analysis result output unit 207 may be stored in a portable storage medium such as a magnetic disk, a flexible disk, an optical disk, or a compact disc.
  • the crowd monitoring device 20 may be realized by an electronic circuit such as a logic IC (Integrated Circuit), a GA (Gate Array), an ASIC (Application Specific Integrated Circuit), or an FPGA (Field-Programmable Gate Array).
  • the image reception decoding unit 201, the changeover switch 202, the mode switching control unit 203, the parameter acquisition unit 204, the number of people analysis unit 206, and the analysis result output unit 207 are each realized as part of an electronic circuit.
  • the processor and the electronic circuit are also collectively referred to as a processing circuit.

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)
  • Closed-Circuit Television Systems (AREA)
  • Image Processing (AREA)
  • Studio Devices (AREA)
  • Studio Circuits (AREA)

Abstract

A parameter acquisition unit (204): extracts a subject image, which is an image of a subject, from a first photographic image, which is a photographic image in which a photographic space in which the subject is present is photographed and the subject is projected; and, on the basis of the extracted subject images, generates a plurality of reference images in which the plurality of subjects are present in the photographic space by varying the number of subjects present in the photographic space for each reference image. A headcount analysis unit (206) compares a second photographic image, which is a photographic image of a photographic space in which the photographing time differs from the first photographic image, with the plurality of reference images, and estimates the number of subjects projected in the second photographic image.

Description

Image processing apparatus, image processing method, and image processing program
The present invention relates to an image processing apparatus, an image processing method, and an image processing program.
Conventionally, techniques for estimating the number of people, their density, or their flow rate from a camera image have been known. Techniques for estimating the number of people from a camera image include a method of counting people based on person detection and a method of estimating the number of people from the foreground area. The former has an advantage in accuracy when the crowd density is low, but when the crowd density is high, the detection accuracy drops under the influence of occlusion between persons. The latter is inferior to the former in analysis accuracy at low density, but can be processed with a small amount of computation even at high density.
For example, Patent Document 1 discloses the latter technique. Specifically, in Patent Document 1, the foreground extracted by background subtraction from an image of a crowd is designated as a person region, and the number of people in the image is estimated from the area of the person region. In Patent Document 1, CG (Computer Graphics) models imitating a crowd are generated in advance for a plurality of congestion levels. A relational expression between the foreground area and the number of people that takes occlusion within the crowd into account is then derived, so that the number of people can be estimated while suppressing the influence of occlusion.
JP 2005-25328 A
The foreground area per person acquired by background subtraction varies depending on the dimensions of the actually observed persons and on the accuracy of foreground extraction under, for example, the lighting conditions. For this reason, the technique of Patent Document 1 has the problem that the CG model generated in advance does not match the foreground extracted from the actual camera image, so that an error occurs in the people-count estimation result.
The main object of the present invention is to solve the above problem. Specifically, a main object of the present invention is to improve the accuracy of estimating the number of subjects shown in a captured image.
An image processing apparatus according to the present invention includes:
a reference image generation unit that extracts a subject image, which is an image of a subject, from a first photographed image, which is a photographed image of a photographing space in which the subject is present, and that generates, based on the extracted subject image, a plurality of reference images in which a plurality of subjects are present in the photographing space, changing the number of subjects present in the photographing space for each reference image; and
a subject number estimation unit that compares a second photographed image, which is a photographed image of the photographing space taken at a photographing time different from that of the first photographed image, with the plurality of reference images, and estimates the number of subjects shown in the second photographed image.
According to the present invention, it is possible to improve the accuracy of estimating the number of subjects shown in a captured image.
FIG. 1 is a diagram illustrating a functional configuration example of the crowd monitoring apparatus according to the first embodiment.
FIG. 2 is a diagram illustrating an internal configuration example of the parameter acquisition unit according to the first embodiment.
FIG. 3 is a diagram illustrating an internal configuration example of the number-of-people analysis unit according to the first embodiment.
FIG. 4 is a flowchart illustrating an operation example of the crowd monitoring apparatus according to the first embodiment.
FIG. 5 is a flowchart illustrating an operation example of the parameter acquisition unit according to the first embodiment.
FIG. 6 is a flowchart illustrating an operation example of the number-of-people analysis unit according to the first embodiment.
FIG. 7 is a diagram showing an example of a person detection result according to the first embodiment.
FIG. 8 is a diagram showing an example of a foreground extraction result according to the first embodiment.
FIG. 9 is a diagram showing an example of the information stored for each detected person according to the first embodiment.
FIG. 10 is a diagram for explaining a method of estimating the depression angle of the camera according to the first embodiment.
FIG. 11 is a diagram for explaining a method of deriving the horizontal line height according to the first embodiment.
FIG. 12 is a diagram for explaining a method of deriving the vertical point coordinates according to the first embodiment.
FIG. 13 is a diagram showing an example of a crowd foreground image at congestion level 1 according to the first embodiment.
FIG. 14 is a diagram showing an example of a crowd foreground image at congestion level 2 according to the first embodiment.
FIG. 15 is a diagram showing an example of a crowd foreground image at congestion level 3 according to the first embodiment.
FIG. 16 is a diagram showing an example of foreground area amounts of the crowd foreground image at congestion level 1 according to the first embodiment.
FIG. 17 is a diagram showing an example of foreground area amounts of the crowd foreground image at congestion level 2 according to the first embodiment.
FIG. 18 is a diagram showing an example of foreground area amounts of the crowd foreground image at congestion level 3 according to the first embodiment.
FIG. 19 is a diagram for explaining the people-count estimation process according to the first embodiment.
FIG. 20 is a diagram for explaining the people-count estimation process according to the first embodiment.
FIG. 21 is a diagram for explaining the people-count estimation process according to the first embodiment.
FIG. 22 is a diagram illustrating a hardware configuration example of the crowd monitoring apparatus according to the first embodiment.
FIG. 23 is a diagram illustrating the relationship between the functional configuration and the hardware configuration of the crowd monitoring apparatus according to the first embodiment.
Hereinafter, embodiments of the present invention will be described with reference to the drawings. In the following description of the embodiments and in the drawings, the same reference numerals denote the same or corresponding parts.
Embodiment 1.
*** Explanation of configuration ***
FIG. 1 shows a functional configuration example of the crowd monitoring apparatus 20 according to the first embodiment.
FIG. 22 shows a hardware configuration example of the crowd monitoring apparatus 20 according to the first embodiment.
The crowd monitoring device 20 corresponds to an image processing device. The operations performed by the crowd monitoring apparatus 20 correspond to an image processing method and an image processing program.
As shown in FIG. 1, the crowd monitoring device 20 is connected to the camera 10.
The camera 10 is installed at a position overlooking the shooting space where the subject exists. The shooting space is the space to be monitored by the camera 10. In the present embodiment, the subject is a person. In the present embodiment, it is assumed that a plurality of persons exist in the shooting space. Hereinafter, the plurality of persons existing in the shooting space are also referred to as a crowd.
Before describing the functional configuration example of the crowd monitoring device 20 in FIG. 1, a hardware configuration example of the crowd monitoring device 20 will be described with reference to FIG. 22.
The crowd monitoring device 20 is a computer.
The crowd monitoring device 20 includes a processor 1101, a memory 1102, a network interface 1103, and a storage device 1104 as hardware.
The storage device 1104 stores programs that realize the functions of the image reception decoding unit 201, the changeover switch 202, the mode switching control unit 203, the parameter acquisition unit 204, the number-of-people analysis unit 206, and the analysis result output unit 207 shown in FIG. 1. The programs are loaded from the storage device 1104 into the memory 1102. The processor 1101 reads the programs from the memory 1102. The processor 1101 then executes the programs and performs the operations of the image reception decoding unit 201, the changeover switch 202, the mode switching control unit 203, the parameter acquisition unit 204, the number-of-people analysis unit 206, and the analysis result output unit 207, which will be described later.
The parameter storage 205 shown in FIG. 1 is realized by the storage device 1104.
FIG. 23 shows the relationship between the functional configuration shown in FIG. 1 and the hardware configuration shown in FIG. 22. FIG. 23 shows that the image reception decoding unit 201, the changeover switch 202, the mode switching control unit 203, the parameter acquisition unit 204, the number-of-people analysis unit 206, and the analysis result output unit 207 are realized by the processor 1101. FIG. 23 also shows that the parameter storage 205 is realized by the storage device 1104.
The network interface 1103 receives the compressed image stream from the camera 10.
Next, a functional configuration example of the crowd monitoring apparatus 20 shown in FIG. 1 will be described.
The image reception decoding unit 201 decodes the compressed image stream distributed from the camera 10 and converts it into image frames. An image frame is a photographed image of the shooting space.
The mode switching control unit 203 controls the operation mode of the crowd monitoring apparatus 20. The operation modes of the crowd monitoring apparatus 20 are a parameter acquisition mode and a people count mode. The mode switching control unit 203 outputs a mode control signal to the changeover switch 202.
The changeover switch 202 switches the output destination of the image frame according to the mode control signal from the mode switching control unit 203. More specifically, if the operation mode of the crowd monitoring apparatus 20 is the parameter acquisition mode, the changeover switch 202 outputs the image frame to the parameter acquisition unit 204. If the operation mode of the crowd monitoring apparatus 20 is the people count mode, the changeover switch 202 outputs the image frame to the number-of-people analysis unit 206. The image frame output from the changeover switch 202 to the parameter acquisition unit 204 corresponds to a first captured image, and the image frame output from the changeover switch 202 to the number-of-people analysis unit 206 corresponds to a second captured image. The image frame output to the number-of-people analysis unit 206 is a photographed image of the shooting space whose shooting time differs from that of the image frame output to the parameter acquisition unit 204.
When the operation mode of the crowd monitoring apparatus 20 is the parameter acquisition mode, the parameter acquisition unit 204 acquires analysis parameters for the number-of-people analysis using image frames.
More specifically, the parameter acquisition unit 204 estimates the depression angle of the camera 10 as an external parameter of the camera 10. The parameter acquisition unit 204 also extracts subject images, which are images of the subjects, from the image frame serving as the first captured image, and generates a plurality of foreground maps based on the extracted subject images and the estimated depression angle of the camera 10. The parameter acquisition unit 204 generates the plurality of foreground maps while varying, for each foreground map, the number of subjects existing in the shooting space. Each foreground map is divided into a plurality of partial areas. For each partial area, the foreground map indicates the number of subjects shown in the partial area (the number of area subjects) and the foreground area amount in the partial area. A foreground map is an image that is compared with an image frame when the number-of-people analysis unit 206 performs the number-of-people analysis. The foreground map corresponds to a reference image.
The parameter acquisition unit 204 stores the generated foreground maps in the parameter storage 205 as analysis parameters.
The parameter acquisition unit 204 corresponds to a reference image generation unit, and the processing performed by the parameter acquisition unit 204 corresponds to reference image generation processing.
The parameter storage 205 stores the analysis parameters generated by the parameter acquisition unit 204.
The number-of-people analysis unit 206 performs the number-of-people analysis when the operation mode of the crowd monitoring apparatus 20 is the people count mode.
More specifically, the number-of-people analysis unit 206 compares the image frame serving as the second captured image with the plurality of foreground maps generated by the parameter acquisition unit 204, and estimates the number of subjects shown in the image frame.
The number-of-people analysis unit 206 corresponds to a subject number estimation unit, and the processing performed by the number-of-people analysis unit 206 corresponds to subject number estimation processing.
The analysis result output unit 207 outputs the result of the number-of-people analysis by the number-of-people analysis unit 206 to the outside.
FIG. 2 shows an internal configuration example of the parameter acquisition unit 204.
The parameter acquisition unit 204 includes a person detection unit 2041, a foreground extraction unit 2042, a depression angle estimation unit 2043, a foreground map generation unit 2044, and a person information storage 2045.
The person detection unit 2041 detects images of persons, which are the subjects, from the image frame serving as the first captured image.
The foreground extraction unit 2042 extracts the foreground image of each person detected by the person detection unit 2041.
The depression angle estimation unit 2043 estimates the depression angle of the camera 10 from the person detection result information and foreground images of many persons.
The foreground map generation unit 2044 generates a plurality of foreground maps from the depression angle of the camera 10, the person detection result information of many persons, and the foreground images.
The foreground map generation unit 2044 then stores the foreground maps in the parameter storage 205 as analysis parameters.
The person information storage 2045 stores the person detection result information and the foreground images.
FIG. 3 shows an internal configuration example of the number-of-people analysis unit 206.
The number-of-people analysis unit 206 includes a foreground extraction unit 2061 and a number-of-people estimation unit 2062.
The foreground extraction unit 2061 extracts a foreground image from the image frame serving as the second captured image.
The number-of-people estimation unit 2062 estimates the number of people from the foreground image using the analysis parameters stored in the parameter storage 205.
*** Explanation of operation ***
Next, an operation example of the crowd monitoring apparatus 20 according to the first embodiment will be described.
FIG. 4 is a flowchart showing an operation example of the crowd monitoring apparatus 20.
First, immediately after the crowd monitoring apparatus 20 is activated, the mode switching control unit 203 sets the operation mode to the parameter acquisition mode (step ST01). That is, the mode switching control unit 203 outputs to the changeover switch 202 a mode control signal notifying the parameter acquisition mode.
Next, the mode switching control unit 203 refers to the parameter storage 205 and checks whether the analysis parameters have already been acquired (step ST02). That is, the mode switching control unit 203 checks whether analysis parameters are stored in the parameter storage 205.
If the analysis parameters have not yet been acquired (NO in step ST02), the analysis parameters are stored in the parameter storage 205 through steps ST04, ST05, and ST06.
In step ST02 after the analysis parameters have been stored in the parameter storage 205, it is determined that the analysis parameters have been acquired, and the process proceeds to step ST03.
On the other hand, if the analysis parameters have already been acquired (YES in step ST02), the mode switching control unit 203 changes the operation mode to the people count mode (step ST03).
Step ST02 is determined as YES when analysis parameters were already stored in the parameter storage 205 before the crowd monitoring apparatus 20 was activated. As described above, step ST02 is also determined as YES after the analysis parameters have been stored in the parameter storage 205 through steps ST04, ST05, and ST06.
Next, the image reception decoding unit 201 receives a compressed image stream from the camera 10 and decodes at least one image frame of the received compressed image stream (step ST04). As the compressed image stream, the image reception decoding unit 201 receives, for example, encoded image data compressed by an image compression encoding scheme such as H.262/MPEG-2 Video, H.264/AVC, H.265/HEVC, or JPEG. As the compressed image stream, the image reception decoding unit 201 may also receive image data delivered over IP by an image delivery protocol such as MPEG-2 TS (Moving Picture Experts Group 2 Transport Stream), RTP/RTSP (Real-time Transport Protocol/Real Time Streaming Protocol), MMT (MPEG Media Transport), or DASH (Dynamic Adaptive Streaming over HTTP).
The image reception decoding unit 201 may also receive, as the compressed image stream, image data encoded by an encoding scheme other than those listed above, or image data delivered in a delivery format other than those listed above. The image reception decoding unit 201 may also receive image data delivered according to an uncompressed transmission standard such as SDI (Serial Digital Interface) or HD (High Definition)-SDI.
Next, the changeover switch 202 outputs the image frame output from the image reception decoding unit 201 to the parameter acquisition unit 204 or the number-of-people analysis unit 206 according to the operation mode set by the mode switching control unit 203 (step ST05).
That is, if the mode switching control unit 203 has output a mode control signal notifying the parameter acquisition mode, the changeover switch 202 outputs the image frame to the parameter acquisition unit 204. If the mode switching control unit 203 has output a mode control signal notifying the people count mode, the changeover switch 202 outputs the image frame to the number-of-people analysis unit 206.
The following description assumes a configuration in which the parameter acquisition unit 204 and the number-of-people analysis unit 206 operate exclusively. However, a configuration may also be adopted in which the parameter acquisition unit 204 operates simultaneously while the number-of-people analysis unit 206 operates, so that the analysis parameters are updated as needed.
If the operation mode is the parameter acquisition mode, the parameter acquisition unit 204 performs parameter acquisition processing (step ST06). That is, the parameter acquisition unit 204 sequentially processes the image frames from the changeover switch 202, generates analysis parameters, and stores the generated analysis parameters in the parameter storage 205. The parameter acquisition unit 204 generates the analysis parameters by processing image frames over a certain period. When generation of the analysis parameters is completed, the parameter acquisition unit 204 stores the analysis parameters in the parameter storage 205. When the analysis parameters have been stored in the parameter storage 205, the process returns to step ST02. Details of the operation of the parameter acquisition unit 204 will be described later.
If the operation mode is the people count mode, the number-of-people analysis unit 206 performs people count processing (step ST07). That is, the number-of-people analysis unit 206 analyzes the image frame from the changeover switch 202 using the analysis parameters stored in the parameter storage 205, thereby analyzing the number of persons shown in the image frame. Details of the operation of the number-of-people analysis unit 206 will be described later.
After the number of people has been analyzed by the number-of-people analysis unit 206, the analysis result output unit 207 outputs a number-of-people analysis result indicating the analysis result of the number-of-people analysis unit 206 to the outside (step ST08).
The analysis result output unit 207 outputs the number-of-people analysis result by, for example, display on a monitor, output to a log file, output to an externally connected device, or transmission to a network. The analysis result output unit 207 may also output the number-of-people analysis result in another format. The output may be produced each time the number-of-people analysis unit 206 outputs a result, or may be intermittent, for example produced after aggregating or statistically processing the results over a specific period or a specific number of analyses. After step ST08, the process returns to step ST04 to process the next image frame.
Hereinafter, the detailed operation of the parameter acquisition unit 204 will be described.
FIG. 5 is a flowchart showing the operation of the parameter acquisition unit 204.
First, the person detection unit 2041 performs person detection processing on the input image frame and stores person detection result information in the person information storage 2045 (step ST11).
The person detection unit 2041 detects persons that satisfy the following conditions: the person is standing, the contact position between the feet and the ground and the top of the head are visible, and either no occlusion with other persons is occurring or the proportion of occlusion with other persons is small.
The person detection unit 2041 outputs person detection result information to the person information storage 2045 for each detected person. The person detection result information includes, for example, information indicating a rectangular, elliptical, or other frame that tightly encloses the detected person, coordinate information indicating the contact position between the detected person's feet and the ground, coordinate information indicating the top of the person's head, and the moving speed of the ground contact point.
FIG. 7 shows an image of a person detection result obtained by the person detection unit 2041.
As the person detection technique, a technique using still-image-based features such as HOG (Histogram of Oriented Gradients), ICF (Integral Channel Features), or ACF (Aggregate Channel Features) may be used. A technique using video-based features computed over a plurality of temporally adjacent image frames may also be used.
In the present embodiment, the person detection unit 2041 uses a person detection technique that assumes the size of a person in the image frame is unknown. Alternatively, the person detection unit 2041 may perform person detection a plurality of times and, once a tendency of the person scale has been obtained for each image region, switch to a person detection technique that assumes that scale, thereby speeding up the processing.
The person detection unit 2041 acquires the information on the moving speed of the ground contact point by, for example, tracking feature points or tracking persons across a plurality of image frames. If the compressed image stream input from the camera 10 contains motion vectors for compression encoding, the person detection unit 2041 may use that motion vector information as it is to acquire the information on the moving speed of the ground contact point.
The parameter acquisition unit 204 may be configured to store all detection results in the person information storage 2045 as person detection result information, or, to save the capacity of the person information storage 2045, may be configured to keep the amount of data accumulated in the person information storage 2045 below a certain level. For example, when the person detection unit 2041 detects another person in an image region where a person has already been detected, the person detection unit 2041 may discard one of the detection results so that the amount of accumulated data stays below a certain level. Alternatively, the person detection unit 2041 may take the average of the detection results and integrate them so that the amount of data accumulated in the person information storage 2045 stays below a certain level.
Next, based on the person detection result information, the foreground extraction unit 2042 performs foreground extraction within the frame region enclosing each detected person, and stores the image of the foreground extraction result in the person information storage 2045 in association with the person detection result information (steps ST12 and ST13).
The foreground image extracted by the foreground extraction unit 2042 corresponds to a subject image.
FIG. 8 shows an image of a foreground extraction result.
FIG. 9 shows an image of the pairs, stored in the person information storage 2045, of person detection result information and the person foreground image corresponding to each piece of person detection result information. The person detection result information and the foreground image are collectively referred to as person information.
The foreground extraction unit 2042 extracts the foreground image using, for example, a background subtraction method, an adaptive background subtraction method, or a dense optical flow derivation algorithm.
The background subtraction method registers a background image in advance and computes the difference between the input image and the background image. The adaptive background subtraction method automatically updates the background image from continuously input image frames using a model such as an MOG (Mixture of Gaussians) distribution. A dense optical flow derivation algorithm acquires motion information in the image on a per-pixel basis.
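As an illustration of the foreground extraction step (not part of the claimed configuration), the following sketch shows one possible adaptive background subtraction using the MOG2 background subtractor of the OpenCV library; the choice of OpenCV, the parameter values, and the function name are assumptions for illustration only.

import cv2

# Adaptive background subtraction based on a Mixture-of-Gaussians model.
# The history length and variance threshold are illustrative values,
# not values prescribed by this embodiment.
subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16, detectShadows=False)

def extract_foreground(frame_bgr):
    # Per-pixel foreground label from the adaptive background model.
    mask = subtractor.apply(frame_bgr)
    # Binarize and remove isolated noise pixels with a small opening.
    _, mask = cv2.threshold(mask, 127, 255, cv2.THRESH_BINARY)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))
    return cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)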
Next, the foreground extraction unit 2042 determines whether sufficient person information has been acquired through the processing of steps ST11 to ST13 (step ST14).
If sufficient person information has been acquired, the process proceeds to step ST15. If sufficient person information has not been acquired, the process ends.
Whether sufficient person information has been obtained is determined, for example, by measures such as the number of pieces of person information acquired so far or the elapsed time since the parameter acquisition processing started. Alternatively, the parameter estimation processing of step ST15 onward may be performed, and the determination of whether sufficient person information has been obtained may be made based on the reliability of the parameters estimated by that processing.
When sufficient person information has been acquired, the depression angle estimation unit 2043 estimates the depression angle of the camera 10 using the person information stored in the person information storage 2045 (step ST15).
FIGS. 10, 11, and 12 show an outline of the depression angle estimation processing.
As shown in FIG. 10, the present embodiment assumes that the camera is installed pointing below the horizontal. It is also assumed that the lens distortion of the camera has been corrected so that the camera can be regarded as a pinhole camera, and that the internal parameters of the camera are known, so that the angle that the viewing direction of each image coordinate makes with the optical axis is known.
In this case, if the image coordinate corresponding to the horizon height in the image is known, the angle α [rad] between the optical axis direction and the horizontal direction is uniquely determined; the angle between the optical axis direction and the ground, that is, the depression angle θ [rad], is obtained as θ = α. Alternatively, if the image coordinate corresponding to the vertical point in the image is known, the angle β [rad] between the optical axis direction and the vertical direction is uniquely determined, and the depression angle θ [rad] is obtained as θ = π/2 − β.
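A minimal sketch of the two relations θ = α and θ = π/2 − β is given below for a pinhole camera with known principal point row cy and vertical focal length fy in pixels, assuming negligible camera roll and image rows increasing downward; the function names and the roll assumption are illustrative and not stated in the original description.

import math

def depression_from_horizon(horizon_row, cy, fy):
    # alpha: angle between the optical axis and the horizontal direction,
    # with tan(alpha) = (cy - horizon_row) / fy (the horizon appears above
    # the principal point when the camera looks down).  theta = alpha.
    return math.atan2(cy - horizon_row, fy)

def depression_from_vertical_point(vpoint_row, cy, fy):
    # beta: angle between the optical axis and the vertical direction,
    # with tan(beta) = (vpoint_row - cy) / fy (the vertical point lies
    # below the principal point when the camera looks down).
    # theta = pi/2 - beta.
    return math.pi / 2.0 - math.atan2(vpoint_row - cy, fy)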
FIG. 11 shows an image of the derivation of the horizon height.
Based on the person information, the depression angle estimation unit 2043 extends the movement direction of each moving person's ground contact point and regards the point at which the extension lines of the movement directions of multiple persons best intersect as a point on the horizon. This is based on the premise that multiple pedestrians walk in parallel and straight ahead. The lines do not necessarily intersect at a single point because of accuracy errors in movement direction detection and the influence of pedestrians who are not walking in parallel. For this reason, the depression angle estimation unit 2043 uses a large amount of person information and estimates the most plausible result as the horizon height.
FIG. 12 shows an image of the derivation of the vertical point coordinates.
Based on the person information, the depression angle estimation unit 2043 extends the line connecting the coordinates of the top of the head and the ground contact point, that is, the line in the body-normal direction, and regards the point at which the normal-direction lines of multiple persons best intersect as the vertical point. This is based on the premise that pedestrians stand upright with respect to the ground. The lines do not necessarily intersect at a single point because of detection errors in the coordinates of the head top and the ground contact point and the influence of pedestrians who are not standing upright. For this reason, the depression angle estimation unit 2043 uses a large amount of person information and estimates the most plausible result as the vertical point coordinates.
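Both the horizon point of FIG. 11 and the vertical point of FIG. 12 are points at which many 2D lines "best intersect". One common way to compute such a point, given here only as an illustrative assumption, is a least-squares intersection; a robust variant such as RANSAC could equally be substituted when outlier pedestrians dominate.

import numpy as np

def best_intersection(points, directions):
    # points: (N, 2) array with one point on each line, e.g. a ground contact
    # point; directions: (N, 2) array of line directions, e.g. the motion
    # direction (FIG. 11) or the head-to-foot direction (FIG. 12).
    A = np.zeros((2, 2))
    b = np.zeros(2)
    for p, d in zip(np.asarray(points, float), np.asarray(directions, float)):
        d = d / np.linalg.norm(d)
        M = np.eye(2) - np.outer(d, d)   # projector onto the line normal
        A += M
        b += M @ p
    # Minimizes the sum of squared distances to all lines.
    return np.linalg.solve(A, b)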
The method using the horizon height is suitable when the depression angle of the camera is shallow, that is, when the optical axis is close to horizontal. The method using the vertical point is suitable when the depression angle of the camera is deep, that is, when the optical axis is closer to vertical. The depression angle estimation unit 2043 may select, from the depression angles obtained by the two methods, the one obtained by the more suitable method, or may select the average of the two. The depression angle estimation unit 2043 may also use another depression angle estimation method.
Next, the foreground map generation unit 2044 generates the foreground maps to be used by the number-of-people analysis unit 206, using the depression angle information obtained by the depression angle estimation unit 2043 and the person information stored in the person information storage 2045 (step ST16).
More specifically, the foreground map generation unit 2044 first calculates a two-dimensional coordinate system on the road surface (hereinafter referred to as the road surface coordinate system) from the depression angle information and the movement information of the person ground contact points. For example, the foreground map generation unit 2044 defines the main movement direction of the persons as the X direction and the direction orthogonal to the X direction as the Y direction.
Next, the foreground map generation unit 2044 obtains the absolute scale of the road surface coordinates, which are coordinates in the road surface coordinate system. For example, the foreground map generation unit 2044 regards the movement width per unit time in road surface coordinates as the movement width corresponding to a pedestrian's average walking speed given in advance, and thereby obtains the absolute scale of the road surface coordinates.
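A minimal sketch of fixing the absolute scale from an assumed average walking speed is given below; the 1.3 m/s value and the use of the median are illustrative assumptions not specified in the original description.

import numpy as np

def road_scale(displacements_per_frame, fps, avg_speed_mps=1.3):
    # displacements_per_frame: per-person movement widths measured in road
    # surface coordinate units per frame.  Returns metres per road surface
    # coordinate unit, assuming a typical pedestrian covers avg_speed_mps
    # metres per second.
    units_per_second = np.median(displacements_per_frame) * fps
    return avg_speed_mps / units_per_second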
Next, after obtaining the road surface coordinate system, the foreground map generation unit 2044 combines the per-person foreground images acquired in step ST13 to generate a foreground image of a crowd that would be expected to be observed at a specific congestion level (hereinafter referred to as a crowd foreground image).
FIGS. 13, 14, and 15 show examples of crowd foreground images: FIG. 13 at congestion level 1, FIG. 14 at congestion level 2, and FIG. 15 at congestion level 3.
As shown in FIGS. 13 to 15, the foreground map generation unit 2044 generates a plurality of crowd foreground images, each containing a plurality of persons in the shooting space, while varying the number of persons in the shooting space for each crowd foreground image.
As also shown in FIGS. 13 to 15, occlusion occurs between persons in the crowd foreground images, depending on the density and arrangement of the persons.
The foreground map generation unit 2044 converts the image coordinates of the ground contact points in the person information acquired so far into road surface coordinates, and then pastes foreground images at a plurality of positions such that the person density takes a predetermined value in the road surface coordinate system, thereby generating a crowd foreground image. For example, suppose the road surface coordinate grid shown in FIG. 13 has a 50 cm pitch in the road surface coordinate system, so that the whole grid covers 4 m². If the crowd density at congestion level 1 is 1 [person/m²], the foreground map generation unit 2044 randomly places 4 persons in the grid. If the crowd density at congestion level 2 is 2 [persons/m²], the foreground map generation unit 2044 randomly places 8 persons in the grid. If the crowd density at congestion level 3 is 4 [persons/m²], the foreground map generation unit 2044 randomly places 16 persons in the grid. In every case, the foreground map generation unit 2044 places the persons at random while keeping them from getting too close to each other, for example by maintaining a distance of at least the average shoulder width of a person or the average personal space radius for each density.
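The random placement described above can be sketched as follows; the 2 m × 2 m grid, the person counts for congestion levels 1 to 3, and the 0.5 m minimum separation follow the illustrative numbers of FIGS. 13 to 15, while the rejection-sampling strategy itself is an assumption rather than the only possible implementation.

import random

def place_crowd(num_people, grid_m=2.0, min_dist_m=0.5, max_tries=1000):
    # Randomly place num_people ground positions (road surface coordinates,
    # in metres) inside a grid_m x grid_m area, keeping at least min_dist_m
    # between any two persons (e.g. an average shoulder width).
    placed, tries = [], 0
    while len(placed) < num_people and tries < max_tries:
        tries += 1
        x, y = random.uniform(0.0, grid_m), random.uniform(0.0, grid_m)
        if all((x - px) ** 2 + (y - py) ** 2 >= min_dist_m ** 2 for px, py in placed):
            placed.append((x, y))
    return placed

# Congestion levels 1 to 3 of FIGS. 13 to 15: 1, 2 and 4 persons/m^2 over a
# 4 m^2 grid give 4, 8 and 16 persons respectively.
crowds = {level: place_crowd(n) for level, n in [(1, 4), (2, 8), (3, 16)]}

Each placed road surface position would then be converted back into image coordinates and one of the stored per-person foreground images pasted at that position.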
Next, after generating a crowd foreground image, the foreground map generation unit 2044 divides the crowd foreground image into a plurality of partial areas, as shown in FIGS. 16, 17, and 18, and calculates, for each congestion level, the foreground area amount per person (for example, 150 pixels/person). In FIGS. 17 and 18, for drawing reasons, the foreground area amount is not shown for some persons.
The foreground map generation unit 2044 also calculates, for each partial area, the foreground area amount contained in the partial area. For example, in the partial area indicated by reference numeral 161 in FIG. 16, 90 pixels/partial area is obtained. Because the crowd foreground image is generated by combining the foreground images of multiple persons, the foreground map generation unit 2044 can obtain the true value of the number of persons appearing in each partial area. The foreground map generation unit 2044 counts, for each partial area, the number of persons appearing in the partial area. If one partial area contains the entire foreground image of one person, the number of persons appearing in that partial area is one (1 person/partial area). If one partial area contains only part of a person's foreground image, the foreground map generation unit 2044 divides the area of the person's part contained in that partial area by the total area of that person's foreground image, thereby obtaining the number of persons contained in the partial area in fractional units. For example, the foreground area amount of the person contained in the partial area 161 is 150 pixels. Assuming that 90 pixels of that person are contained in the partial area 161, the foreground map generation unit 2044 defines the number of persons contained in the partial area 161 as 0.6 (90 ÷ 150 = 0.6), that is, 0.6 persons/partial area.
The foreground map generation unit 2044 then attaches to each partial area the foreground area amount of the persons contained in the partial area (for example, 90 pixels/partial area) and the number of persons contained in the partial area (for example, 1 person/partial area or 0.6 persons/partial area).
Hereinafter, the foreground area amount of the persons contained in a partial area (for example, 90 pixels/partial area) is referred to as the area person area amount, and the number of persons contained in a partial area (for example, 1 person/partial area or 0.6 persons/partial area) is referred to as the number of area persons.
A crowd foreground image in which the area person area amount and the number of area persons are attached to each partial area is referred to as a foreground map. As described above, the foreground map corresponds to a reference image.
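A minimal sketch of deriving a foreground map from one crowd foreground image is given below, assuming that both the composed crowd mask and the individual per-person masks used to compose it are available; the cell size and the data layout are illustrative assumptions.

import numpy as np

def build_foreground_map(crowd_mask, person_masks, cell=40):
    # crowd_mask: HxW boolean array, foreground of the composed crowd image
    # (occlusions already merged).  person_masks: one HxW boolean mask per
    # pasted person.  cell: side length of a partial area in pixels.
    h, w = crowd_mask.shape
    rows, cols = h // cell, w // cell
    area = np.zeros((rows, cols))    # area person area amount per partial area
    count = np.zeros((rows, cols))   # number of area persons per partial area
    for r in range(rows):
        for c in range(cols):
            win = (slice(r * cell, (r + 1) * cell), slice(c * cell, (c + 1) * cell))
            area[r, c] = crowd_mask[win].sum()
            for mask in person_masks:
                total = mask.sum()
                if total:
                    # e.g. 90 of a person's 150 pixels in this cell -> 0.6 persons
                    count[r, c] += mask[win].sum() / total
    return area, count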
The foreground map generation unit 2044 may generate a plurality of crowd foreground images for each congestion level. That is, for a single congestion level (for example, congestion level 3), the foreground map generation unit 2044 may generate a plurality of crowd foreground images by randomly varying the arrangement of the persons. When a plurality of crowd foreground images are generated for one congestion level in this way, the foreground map generation unit 2044 may store in the parameter storage 205 all of the foreground maps obtained from the plurality of crowd foreground images. Alternatively, the foreground map generation unit 2044 may integrate the plurality of foreground maps into one foreground map by taking the average of the foreground maps obtained from the plurality of crowd foreground images, and store only that one foreground map in the parameter storage 205. Doing so is expected to be effective in coping with various arrangement patterns of persons.
Finally, the generated foreground maps are stored in the parameter storage 205 (step ST17). The parameter acquisition processing is completed when the foreground maps have been stored.
Hereinafter, the detailed operation of the number-of-people analysis unit 206 will be described. FIG. 6 is a flowchart showing the operation of the number-of-people analysis unit 206.
First, the foreground extraction unit 2061 performs foreground extraction on the entire input image frame or on a region of interest (step ST21). This foreground extraction processing is the same as the foreground extraction processing of the foreground extraction unit 2042 in the parameter acquisition unit 204.
Next, the number-of-people estimation unit 2062 performs number-of-people estimation on the foreground image extracted by the foreground extraction unit 2061, using the per-congestion-level foreground maps stored in the parameter storage 205 (step ST22).
FIGS. 19, 20, and 21 illustrate the number-of-people estimation processing.
As shown in FIG. 19, the number-of-people estimation unit 2062 extracts a foreground image from the image frame. The number-of-people estimation unit 2062 then divides the foreground image into the same partial areas as the foreground maps. A partial area in the foreground image shown in FIG. 19 is hereinafter referred to as an estimation target partial area.
As shown in FIG. 20, the number-of-people estimation unit 2062 determines the foreground area amount for each estimation target partial area.
Next, for each estimation target partial area, the number-of-people estimation unit 2062 extracts from each of the per-congestion-level foreground maps the partial area located at the same position, compares the foreground area amount in each extracted partial area with the foreground area amount in the estimation target partial area, and selects the foreground map partial area whose foreground area amount is most similar to that of the estimation target partial area.
For example, for the estimation target partial area in the first row, first column (upper left corner) of the foreground image in FIG. 20, the number-of-people estimation unit 2062 extracts the partial area in the first row, first column from each of the foreground maps of congestion levels 1 to 3. The number-of-people estimation unit 2062 then compares the foreground area amount in each of the three partial areas extracted from the foreground maps of congestion levels 1 to 3 with the foreground area amount in the estimation target partial area of FIG. 20, and selects, from the three extracted partial areas, the one to which the foreground area amount most similar to that of the estimation target partial area is attached. The number-of-people estimation unit 2062 performs this processing for all estimation target partial areas in FIG. 20. Hereinafter, the partial area selected for an estimation target partial area is referred to as the selected partial area.
In this way, the number-of-people estimation unit 2062 selects a selected partial area from among the foreground maps of congestion levels 1 to 3 for each estimation target partial area. Consequently, the congestion level of the selected partial area may differ between estimation target partial areas. For example, it may happen that a partial area of the congestion level 3 foreground map is selected for the estimation target partial area in row n, column m of the foreground image of FIG. 20, a partial area of the congestion level 2 foreground map is selected for the estimation target partial area in row n, column m+1, and a partial area of the congestion level 1 foreground map is selected for the estimation target partial area in row n, column m+2.
Next, the number-of-people estimation unit 2062 obtains, for each estimation target partial area, the number of persons contained in the estimation target partial area. Specifically, the number-of-people estimation unit 2062 divides the foreground area amount of the estimation target partial area by the area person area amount of the selected partial area and multiplies the quotient by the number of area persons of the selected partial area. That is, the number-of-people estimation unit 2062 calculates (foreground area amount of the estimation target partial area) ÷ (area person area amount of the selected partial area) × (number of area persons of the selected partial area) to obtain the number of persons contained in the estimation target partial area.
The number-of-people estimation unit 2062 then sums the numbers of persons obtained for the individual estimation target partial areas to obtain the number of persons contained in the entire image frame or the region of interest.
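A minimal sketch of step ST22 is given below, assuming the foreground area amounts have been collected into per-cell arrays; the array layout and the absolute-difference similarity measure are illustrative assumptions.

import numpy as np

def estimate_people(target_area, level_maps):
    # target_area: (R, C) array of foreground area amounts of the estimation
    # target partial areas (from the second captured image).
    # level_maps: list of (area, count) pairs, one per congestion level,
    # where area and count are (R, C) arrays of the stored foreground maps.
    total = 0.0
    rows, cols = target_area.shape
    for r in range(rows):
        for c in range(cols):
            # Select the congestion level whose foreground area amount at the
            # same position is most similar to the observed amount.
            candidates = np.array([a[r, c] for a, _ in level_maps])
            k = int(np.argmin(np.abs(candidates - target_area[r, c])))
            sel_area = level_maps[k][0][r, c]
            sel_count = level_maps[k][1][r, c]
            if sel_area > 0:
                # (target foreground area) / (area person area amount)
                #   * (number of area persons)
                total += target_area[r, c] / sel_area * sel_count
    return total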
*** Explanation of the effects of the embodiment ***
As described above, in the present embodiment, the parameter acquisition unit 204 obtains the area person area amount and the number of area persons from actually observed foreground images while taking occlusion into account. The number-of-people analysis unit 206 then estimates the number of people by applying the area person area amount and the number of area persons to the foreground image that is the estimation target, so that highly accurate people-number estimation is possible without obtaining the average dimensions of persons in advance.
Furthermore, in the present embodiment, the parameter acquisition unit 204 estimates the depression angle of the camera, which is an external parameter of the camera, so the external parameters of the camera need not be measured in advance. In the technique disclosed in Patent Document 1, camera external parameters must be measured and acquired in advance in order to generate the CG model. The camera external parameters are, for example, the depression angle of the camera and the distance from each of the scattered persons to the camera. Because the technique disclosed in Patent Document 1 requires the camera external parameters to be measured and acquired in advance, it has the problem that the cost at the time of camera installation is large. As described above, the present embodiment has the effect that the external parameters of the camera need not be measured in advance.
In the above description, persons were used as an example of the subjects, but the subjects are not limited to persons. The subjects may be, for example, living creatures such as wild animals or insects, or moving objects other than persons, such as vehicles.
Within the scope of the invention, the constituent elements and procedures shown in the embodiment may be freely combined, modified, or omitted.
*** Explanation of hardware configuration ***
Finally, a supplementary description of the hardware configuration of the crowd monitoring apparatus 20 is given.
The processor 1101 shown in FIG. 22 is an IC (Integrated Circuit) that performs processing.
The processor 1101 is a CPU (Central Processing Unit), a DSP (Digital Signal Processor), or the like.
The memory 1102 shown in FIG. 22 is a RAM (Random Access Memory).
The storage device 1104 shown in FIG. 22 is a ROM (Read Only Memory), a flash memory, an HDD (Hard Disk Drive), or the like.
The network interface 1103 shown in FIG. 22 includes a receiver that receives data and a transmitter that transmits data.
The network interface 1103 is, for example, a communication chip or an NIC (Network Interface Card).
The storage device 1104 also stores an OS (Operating System), and at least part of the OS is executed by the processor 1101.
While executing at least part of the OS, the processor 1101 executes the programs that realize the functions of the image reception decoding unit 201, the changeover switch 202, the mode switching control unit 203, the parameter acquisition unit 204, the number-of-people analysis unit 206, and the analysis result output unit 207.
By the processor 1101 executing the OS, task management, memory management, file management, communication control, and the like are performed.
At least one of information, data, signal values, and variable values indicating the processing results of the image reception decoding unit 201, the changeover switch 202, the mode switching control unit 203, the parameter acquisition unit 204, the number-of-people analysis unit 206, and the analysis result output unit 207 is stored in at least one of the memory 1102, the storage device 1104, and a register or cache memory in the processor 1101.
The programs that realize the functions of the image reception decoding unit 201, the changeover switch 202, the mode switching control unit 203, the parameter acquisition unit 204, the number-of-people analysis unit 206, and the analysis result output unit 207 may be stored in a portable storage medium such as a magnetic disk, a flexible disk, an optical disc, a compact disc, a Blu-ray (registered trademark) disc, or a DVD.
The image reception decoding unit 201, the changeover switch 202, the mode switching control unit 203, the parameter acquisition unit 204, the number-of-people analysis unit 206, and the analysis result output unit 207 may also be read as a "circuit", "step", "procedure", or "process".
The crowd monitoring apparatus 20 may also be realized by an electronic circuit such as a logic IC (Integrated Circuit), a GA (Gate Array), an ASIC (Application Specific Integrated Circuit), or an FPGA (Field-Programmable Gate Array).
In this case, the image reception decoding unit 201, the changeover switch 202, the mode switching control unit 203, the parameter acquisition unit 204, the number-of-people analysis unit 206, and the analysis result output unit 207 are each realized as part of the electronic circuit.
The processor and the above electronic circuits are also collectively referred to as processing circuitry.
10 camera, 20 crowd monitoring apparatus, 201 image reception decoding unit, 202 changeover switch, 203 mode switching control unit, 204 parameter acquisition unit, 205 parameter storage, 206 number-of-people analysis unit, 207 analysis result output unit, 1101 processor, 1102 memory, 1103 network interface, 1104 storage device, 2041 person detection unit, 2042 foreground extraction unit, 2043 depression angle estimation unit, 2044 foreground map generation unit, 2045 person information storage, 2061 foreground extraction unit, 2062 number-of-people estimation unit.

Claims (8)

1. An image processing device comprising:
a reference image generation unit that extracts a subject image, which is an image of a subject, from a first photographed image that is a photographed image of a shooting space in which subjects exist and in which the subjects are shown, and that generates, based on the extracted subject image, a plurality of reference images in which a plurality of subjects exist in the shooting space, while varying the number of subjects existing in the shooting space for each reference image; and
a subject number estimation unit that compares a second photographed image, which is a photographed image of the shooting space whose photographing time differs from that of the first photographed image, with the plurality of reference images, and estimates the number of subjects shown in the second photographed image.
2. The image processing device according to claim 1, wherein
the reference image generation unit divides each of the plurality of reference images into a plurality of partial areas and calculates, for each partial area of each reference image, the number of subjects shown in the partial area as a number of area subjects, and
the subject number estimation unit divides the second photographed image into the plurality of partial areas, extracts, for each partial area of the second photographed image, the partial area located at the same position as that partial area from each of the plurality of reference images, compares each of the plurality of extracted partial areas with the partial area of the second photographed image, selects one of the plurality of extracted partial areas, and estimates the number of subjects shown in the second photographed image using the number of area subjects of the selected partial area.
  3.  The image processing device according to claim 2, wherein
     the reference image generation unit calculates, for each partial region of each reference image, the regional subject count and a foreground area amount in the partial region, and
     the subject number estimation unit
     calculates, for each partial region of the second photographed image, a foreground area amount in the partial region of the second photographed image,
     compares, for each partial region of the second photographed image, the foreground area amount in each of the extracted partial regions with the foreground area amount in the partial region of the second photographed image, and selects a partial region whose foreground area amount is similar to the foreground area amount in the partial region of the second photographed image,
     estimates, for each partial region of the second photographed image, the number of subjects appearing in the partial region of the second photographed image using the regional subject count of the selected partial region, the foreground area amount in the selected partial region, and the foreground area amount in the partial region of the second photographed image, and
     estimates the number of subjects appearing in the second photographed image based on the number of subjects estimated for each partial region of the second photographed image (an illustrative sketch of this region-wise estimation follows the claims).
  4.  The image processing device according to claim 1, wherein the reference image generation unit generates, for each number of subjects present in the photographing space, a plurality of reference images in which the number of subjects present in the photographing space is the same and the arrangement of the subjects differs.
  5.  The image processing device according to claim 1, wherein the reference image generation unit generates a reference image in which occlusion occurs between subjects.
  6.  The image processing device according to claim 1, wherein the reference image generation unit extracts the subject image from the first photographed image photographed by a camera installed at a position overlooking the photographing space, analyzes the first photographed image to estimate a depression angle of the camera, and generates the plurality of reference images based on the extracted subject image and the estimated depression angle of the camera.
  7.  An image processing method comprising:
     extracting, by a computer, a subject image, which is an image of a subject, from a first photographed image, which is a photographed image of a photographing space in which the subject is present and in which the subject appears, and generating, based on the extracted subject image, a plurality of reference images in each of which a plurality of subjects are present in the photographing space, while changing the number of subjects present in the photographing space for each reference image; and
     comparing, by the computer, a second photographed image, which is a photographed image of the photographing space whose photographing time differs from that of the first photographed image, with the plurality of reference images, and estimating the number of subjects appearing in the second photographed image.
  8.  An image processing program for causing a computer to execute:
     a reference image generation process of extracting a subject image, which is an image of a subject, from a first photographed image, which is a photographed image of a photographing space in which the subject is present and in which the subject appears, and generating, based on the extracted subject image, a plurality of reference images in each of which a plurality of subjects are present in the photographing space, while changing the number of subjects present in the photographing space for each reference image; and
     a subject number estimation process of comparing a second photographed image, which is a photographed image of the photographing space whose photographing time differs from that of the first photographed image, with the plurality of reference images, and estimating the number of subjects appearing in the second photographed image.
PCT/JP2017/010247 2017-03-14 2017-03-14 Image processing device, image processing method, and image processing program WO2018167851A1 (en)

Priority Applications (6)

Application Number Priority Date Filing Date Title
PCT/JP2017/010247 WO2018167851A1 (en) 2017-03-14 2017-03-14 Image processing device, image processing method, and image processing program
CN201780088044.6A CN110383295B (en) 2017-03-14 2017-03-14 Image processing apparatus, image processing method, and computer-readable storage medium
MYPI2019004399A MY184063A (en) 2017-03-14 2017-03-14 Image processing device, image processing method, and image processing program
SG11201906822YA SG11201906822YA (en) 2017-03-14 2017-03-14 Image processing device, image processing method, and image processing program
JP2019505568A JP6559378B2 (en) 2017-03-14 2017-03-14 Image processing apparatus, image processing method, and image processing program
TW106123456A TW201833822A (en) 2017-03-14 2017-07-13 Image processing device, image processing method, and image processing program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2017/010247 WO2018167851A1 (en) 2017-03-14 2017-03-14 Image processing device, image processing method, and image processing program

Publications (1)

Publication Number Publication Date
WO2018167851A1 (en) 2018-09-20

Family

ID=63521936

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2017/010247 WO2018167851A1 (en) 2017-03-14 2017-03-14 Image processing device, image processing method, and image processing program

Country Status (6)

Country Link
JP (1) JP6559378B2 (en)
CN (1) CN110383295B (en)
MY (1) MY184063A (en)
SG (1) SG11201906822YA (en)
TW (1) TW201833822A (en)
WO (1) WO2018167851A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115701120B (en) * 2021-07-26 2024-06-25 荣耀终端有限公司 Electronic equipment and camera module

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005025328A (en) * 2003-06-30 2005-01-27 Ntt Data Corp Congestion monitoring system and congestion monitoring program
JP2005242646A (en) * 2004-02-26 2005-09-08 Ntt Data Corp People flow measuring system, people flow measuring method and people flow measuring program
JP2013089174A (en) * 2011-10-21 2013-05-13 Nippon Telegr & Teleph Corp <Ntt> Apparatus, method and program for measuring number of object
JP2014229068A (en) * 2013-05-22 2014-12-08 株式会社 日立産業制御ソリューションズ People counting device and person flow line analysis apparatus
US20160210756A1 (en) * 2013-08-27 2016-07-21 Nec Corporation Image processing system, image processing method, and recording medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5624809B2 (en) * 2010-06-24 2014-11-12 株式会社 日立産業制御ソリューションズ Image signal processing device
JP2014164525A (en) * 2013-02-25 2014-09-08 Nippon Telegr & Teleph Corp <Ntt> Method, device and program for estimating number of object
US20160127657A1 (en) * 2013-06-11 2016-05-05 Sharp Kabushiki Kaisha Imaging system
JP6638723B2 (en) * 2015-03-04 2020-01-29 ノーリツプレシジョン株式会社 Image analysis device, image analysis method, and image analysis program

Also Published As

Publication number Publication date
JP6559378B2 (en) 2019-08-14
TW201833822A (en) 2018-09-16
CN110383295A (en) 2019-10-25
SG11201906822YA (en) 2019-08-27
JPWO2018167851A1 (en) 2019-06-27
CN110383295B (en) 2022-11-11
MY184063A (en) 2021-03-17

Similar Documents

Publication Publication Date Title
US8331617B2 (en) Robot vision system and detection method
US10893251B2 (en) Three-dimensional model generating device and three-dimensional model generating method
CN110692083B (en) Block-matched optical flow and stereoscopic vision for dynamic vision sensor
GB2507395B (en) Video-based vehicle speed estimation from motion vectors in video streams
US9230333B2 (en) Method and apparatus for image processing
US20200250885A1 (en) Reconstruction method, reconstruction device, and generation device
WO2018052547A1 (en) An automatic scene calibration method for video analytics
KR101781154B1 (en) Camera and method for optimizing the exposure of an image frame in a sequence of image frames capturing a scene based on level of motion in the scene
JP2015528614A5 (en)
AU2012340862A1 (en) Geographic map based control
US9576204B2 (en) System and method for automatic calculation of scene geometry in crowded video scenes
CN112104869B (en) Video big data storage and transcoding optimization system
CN110992393B (en) Target motion tracking method based on vision
JP6622575B2 (en) Control device, control method, and program
CN106530353B (en) The three-dimensional motion point detecting method rebuild for binocular vision system sparse three-dimensional
JP6559378B2 (en) Image processing apparatus, image processing method, and image processing program
WO2019087383A1 (en) Crowd density calculation device, crowd density calculation method and crowd density calculation program
JP4979083B2 (en) Monitoring system, monitoring method, and program
JP2004046464A (en) Apparatus and method for estimating three-dimensional position of mobile object, program, and recording medium thereof
US11272209B2 (en) Methods and apparatus for determining adjustment parameter during encoding of spherical multimedia content
KR101904170B1 (en) Coding Device and Method for Depth Information Compensation by Sphere Surface Modeling
KR101904128B1 (en) Coding Method and Device Depth Video by Spherical Surface Modeling
JP2001145110A (en) Image-encoding device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17901310

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2019505568

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17901310

Country of ref document: EP

Kind code of ref document: A1