WO2017155126A1 - Information transmitting system, information transmitting device, information receiving device, and computer program

Information transmitting system, information transmitting device, information receiving device, and computer program

Info

Publication number
WO2017155126A1
WO2017155126A1 (PCT/JP2017/010290)
Authority
WO
WIPO (PCT)
Prior art keywords
information
image
person
avatar
camera
Prior art date
Application number
PCT/JP2017/010290
Other languages
French (fr)
Japanese (ja)
Inventor
靖和 本玉
寛紀 山内
Original Assignee
一般社団法人 日本画像認識協会
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 一般社団法人 日本画像認識協会
Priority to JP2017564647A (patent JP6357595B2)
Publication of WO2017155126A1

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 - Television systems
    • H04N7/18 - Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 - Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 - Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234 - Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/2343 - Processing of video elementary streams involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 - Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 - Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/431 - Generation of visual interfaces for content selection or interaction; Content or additional data rendering

Definitions

  • the present invention relates to a system for transmitting video information acquired by a camera.
  • the present invention provides a technology that enables effective use of a network communication band when transmitting video from a camera.
  • An information transmission system according to the present invention includes: a feature point extraction unit that extracts feature points from a subject in an image captured by at least one camera and outputs the feature points as feature information; a coordinate information adding unit that acquires coordinate information of the subject within the shooting range of the camera; an information transmission unit that transmits the feature information and the coordinate information to a network; an information receiving unit that receives the feature information and the coordinate information from the network; a dynamic generation unit that generates an avatar image of the subject based on the feature information; and an image composition unit that generates a composite image by compositing the avatar image, based on the coordinate information, onto an image representing the background of the shooting range of the camera.
  • With this configuration, the feature information on the feature points extracted from the subject and the coordinate information of the subject within the shooting range of the camera are transmitted to the network.
  • On the receiving side, an avatar image of the subject is generated based on the feature information, and the avatar image is composited, based on the coordinate information, onto an image representing the background of the shooting range of the camera.
  • In other words, a composite image is generated from the background image and the avatar image on the receiving side without sending the camera's video signal, so the network communication band can be used more effectively than when the video signal itself is sent.
  • Because an avatar image is used instead of the actual video of the subject, there is also the advantage that privacy is not infringed even when an unspecified large number of persons are photographed. A minimal sketch of the kind of per-person data transmitted in place of the video signal follows.
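The following is a minimal sketch, in Python, of the kind of per-person record that could be transmitted in place of the video signal. The field names and the JSON encoding are illustrative assumptions, not the format specified by the patent.

```python
from dataclasses import dataclass, asdict
import json

@dataclass
class PersonRecord:
    """Hypothetical per-person payload sent instead of video frames."""
    person_id: int      # detection ID assigned on the transmitting side
    camera_tag: str     # tag indicating which camera the information came from
    x: float            # ground-contact position in the shared real-space coordinates
    y: float
    height_m: float     # estimated real height of the person
    features: dict      # e.g. {"hair": "long", "upper_color": "red", ...}
    motion: dict        # e.g. {"direction_deg": 45.0, "speed_mps": 1.2}

record = PersonRecord(1, "cam11a", 3.2, 7.8, 1.68,
                      {"hair": "long", "upper_color": "red"},
                      {"direction_deg": 45.0, "speed_mps": 1.2})
payload = json.dumps(asdict(record))  # a few hundred bytes versus megabits of video
print(payload)
```

Even with several persons per frame, such records remain far smaller than a compressed video stream, which is the bandwidth advantage described above.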
  • The coordinate information adding unit can take the position identified as the ground contact point of the person (the subject) appearing on the shooting screen as the shooting ground position, and take the height of the person image area appearing at that position on the shooting screen as the person shooting height.
  • In that case, the coordinate information adding unit can include: position/height conversion relation information acquiring means for acquiring conversion relation information between the camera two-dimensional coordinate system set on the camera shooting screen and a real-space coordinate system whose height-direction reference is the walking plane on which the person moves; shooting ground position/height specifying means for specifying the shooting ground position and the person shooting height of the person image on the shooting screen; and real person coordinate/height information generating means for converting, based on the position/height conversion relation information, the identified shooting ground position coordinates and shooting height into actual ground position coordinate information (the ground position coordinates of the person in the real space) and real person height information (the height of the person in the real space).
  • The dynamic generation unit can include avatar height determining means that determines the height dimension of the avatar image based on the generated real person height information, and the image composition unit can include avatar composition position determining means that determines, based on the actual ground position coordinate information, the position at which the avatar image is composited onto the background image.
  • The spatial existence range of a person in the area to be photographed is almost limited to a horizontal plane such as the floor surface or the ground, i.e. the x-y plane of a real-space orthogonal coordinate system whose height direction is taken as the z axis, so the z coordinate of the ground contact point (foot position) can always be regarded as constant (for example, 0). That is, the coordinates of the ground contact point of a person walking in the area can be substantially described in an x-y two-dimensional system, and can be uniquely associated with the camera two-dimensional coordinate system.
  • The camera two-dimensional coordinate system corresponds to a projective transformation of the real-space three-dimensional coordinate system, and an object farther from the camera is projected with a reduced size.
  • This transformation is mathematically described by a matrix. If a reference body of known height is placed at various known positions on the floor or the ground in the real-space coordinate system and photographed with the camera, then by comparing the position and height of the reference body image on the shooting screen with its position and actual size in the real space, position/height conversion relation information can be obtained, i.e. information for converting the position and height of a person on the camera screen into the position and height in the real space (a minimal sketch of such a conversion appears below).
  • With this information, the dynamic generation unit can easily determine the height of the avatar image to be composited onto the background image, and the image composition unit can reasonably and easily determine the composition position of the avatar image on the background image.
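As an illustration of how such position/height conversion relation information might be derived, the sketch below estimates a ground-plane homography from reference points of known real-space position using OpenCV, and scales person heights against a reference body of known height. The specific API calls, the example coordinates, and the simple linear height scaling are assumptions for illustration, not the procedure stated in the patent.

```python
import numpy as np
import cv2

# Screen coordinates (pixels) of reference ground points and their known
# real-space (x, y) positions on the walking plane (z = 0), in metres.
screen_pts = np.array([[120, 460], [500, 470], [320, 300], [80, 320]], dtype=np.float32)
world_pts  = np.array([[0.0, 0.0], [4.0, 0.0], [2.0, 6.0], [0.0, 6.0]], dtype=np.float32)

# Homography mapping screen ground points to real-space ground coordinates.
H, _ = cv2.findHomography(screen_pts, world_pts)

def screen_to_ground(u, v):
    """Convert a ground-contact point on the screen to real-space (x, y)."""
    p = cv2.perspectiveTransform(np.array([[[u, v]]], dtype=np.float32), H)
    return p[0, 0]  # (x, y) on the walking plane

# Height conversion: compare a reference body of known height with its height in
# pixels at a nearby screen position (a simple local scale assumption).
REF_HEIGHT_M, REF_HEIGHT_PX = 1.80, 220.0

def pixel_height_to_metres(h_px):
    return REF_HEIGHT_M * h_px / REF_HEIGHT_PX

print(screen_to_ground(300, 420), pixel_height_to_metres(190))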
  • The feature point extraction unit can be configured to analyze the motion or orientation of the subject and output it as motion analysis information,
  • the information transmission unit can be configured to send the motion analysis information to the network,
  • and the image composition unit can be configured to adjust the movement or orientation of the avatar image based on the motion analysis information. According to this, since the movement and orientation of the avatar image are adjusted based on the movement and orientation of the subject at the time of shooting, the moving speed and moving direction of the subject, for example, can be reflected in the avatar image.
  • The camera can capture moving images,
  • the coordinate information adding unit can acquire the coordinate information of the person (the subject) for each frame of the captured moving image,
  • and the feature point extraction unit may be configured to output the movement trajectory of that coordinate information between frames as motion analysis information. If the movement trajectory information of the preceding frames is analyzed for the current frame, it becomes particularly easy to grasp the movement of the person image up to the current frame. For example, a person usually walks with the face and torso facing forward unless moving irregularly (such as walking sideways or backwards), so once the movement trajectory of a representative point of the person image (for example, the ground contact point) is known, the orientation of the body accompanying the walking motion can be grasped sequentially.
  • The image composition unit can be configured to adjust the orientation of the avatar image to be composited onto the background image based on the movement trajectory information.
  • the dynamic generation unit can be configured to generate different avatar images according to the movement direction of the person in the real space so that the appearance of the person from the viewpoint of the camera is reflected.
  • The realism of the avatar image expression can be increased by changing the avatar image in accordance with how the person appears to the camera, that is, the angle of the walking direction relative to the camera.
  • the dynamic generation unit includes a direction-specific two-dimensional avatar image data storage unit that stores a plurality of two-dimensional avatar image data having different representation forms according to a plurality of predetermined movement directions of a person in real space.
  • the dynamic generation unit includes a three-dimensional avatar image data storage unit that stores the data of the avatar image as the three-dimensional avatar image data, and generates a three-dimensional avatar object based on the three-dimensional avatar image data.
  • the image composition unit can be configured to generate two-dimensional avatar image data by projectively transforming the three-dimensional avatar object, arranged in the real space with its direction determined, into the two-dimensional coordinate system of the background image,
  • and to composite the avatar image based on that two-dimensional avatar image data with the background image. In this case, although making the avatar image data three-dimensional increases the data volume,
  • the direction in which the avatar image is pasted onto the background image can be made stepless, and a more realistic expression can be realized.
  • The image composition unit can also generate an image representing a person's flow line based on the movement trajectory information. With this configuration, it is easy to visually grasp how a specific subject has moved over the background image. For example, it can be used effectively for crime prevention purposes, and statistical trend analysis of flow line images can clarify, for example, which places attract visitors' interest in exhibition halls and public facilities.
  • the information transmission system of the present invention can be configured such that the feature point extraction unit analyzes the person attribute of the subject and outputs it as person attribute information, and the information transmission unit sends the person attribute information to the network. According to this configuration, various analysis / statistical processes and the like can be performed using the person attribute information on the receiving side.
  • the dynamic generation unit can be configured to generate the avatar image as reflecting the person attribute information.
  • the attribute of a corresponding person can be easily grasped even after being converted into an avatar image.
  • the attributes of the people can be simplified or emphasized by the avatar image, and there is an advantage that the tendency on the image can be easily grasped.
  • The person attribute information can be configured to include gender information reflecting the gender of the person and age information reflecting the age of the person, but is not limited thereto; for example,
  • nationality estimated from the appearance of the face or the like (for example, Japanese or Westerner) may also be included.
  • the feature point extraction unit can analyze the appearance of the subject person and output it as appearance feature information
  • the information transmission unit can be configured to send the appearance feature information to the network.
  • the appearance of the subject is important information that leads to the identification of individual persons following the person attributes, and is useful in analysis and statistical processing.
  • The dynamic generation unit can be configured to generate the avatar image so as to reflect the appearance feature information, so that the features of the corresponding person can be understood even after conversion into the avatar image. Examples of elements that most strongly reflect the characteristics of a person's appearance include hair, clothing, and belongings.
  • The appearance feature information can include hair information reflecting one or both of the form and color of the person's hair, clothing information reflecting one or both of the form and color of the person's clothing, and belongings information reflecting one or both of the form and color of the person's belongings.
  • the body shape of a person is also useful information.
  • the appearance feature information can be configured to include body shape information that reflects the body shape of a person.
  • Gait, that is, the features of a person's way of walking, is also useful information;
  • the appearance feature information can be configured to include gait information reflecting a person's gait.
  • The information specifying the gait is, for example, the stride (or the step frequency linked to the walking speed), the swing angle of the arms, the walking speed, the upper body angle during walking, the vertical bobbing, and so on.
  • The dynamic generation unit can be configured to use avatar animation data composed of frame data obtained by subdividing a person's walking action, so that the avatar image can be represented realistically as an animation of the walking action on the background image.
  • The dynamic generation unit can perform image correction processing that corrects each frame of the frame data based on the gait information,
  • and the image composition unit can be configured to composite the avatar image, with the gait features reflected based on the corrected frame data, onto the background image in the form of an animation. The movement of an avatar image that reflects the gait information of the corresponding person can thus be realized easily by correcting each frame of the avatar animation data (a sketch of such a per-frame correction appears below).
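A minimal sketch of one way such a per-frame correction could look: each animation frame is assumed to store simple joint angles, and the observed stride and arm swing are used to scale the default walking pose. The frame representation and the scaling rule are illustrative assumptions, not the patent's specified correction processing.

```python
from dataclasses import dataclass, replace

@dataclass
class WalkFrame:
    leg_angle_deg: float   # thigh swing relative to vertical in this frame
    arm_angle_deg: float   # arm swing relative to vertical in this frame

# Default walking-cycle frames for a standard avatar (illustrative values).
DEFAULT_CYCLE = [WalkFrame(20, 25), WalkFrame(10, 12), WalkFrame(0, 0), WalkFrame(-15, -20)]

def correct_cycle(frames, stride_ratio, arm_swing_ratio):
    """Scale the default pose so the animation reflects the observed gait."""
    return [replace(f,
                    leg_angle_deg=f.leg_angle_deg * stride_ratio,
                    arm_angle_deg=f.arm_angle_deg * arm_swing_ratio)
            for f in frames]

# Observed person: longer stride than standard, smaller arm swing.
corrected = correct_cycle(DEFAULT_CYCLE, stride_ratio=1.2, arm_swing_ratio=0.8)
print(corrected[0])
```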
  • the imaging range can be covered by a plurality of cameras in the form of sharing real space coordinates.
  • Each camera shoots the common area with its own camera coordinate system, but if the real space of the jointly monitored area is spanned by a single shared coordinate system, then when the shooting information of the cameras is later integrated,
  • the integration can be completed immediately simply by converting the person coordinates obtained by each camera into the common real space (for example, a global coordinate system such as one obtainable by GPS).
  • The image composition unit can be configured to generate the composite image as a bird's-eye view image covering the shooting ranges of the plurality of cameras. In this way, the entire imaging range of the plurality of cameras can be grasped at a glance.
  • In this case, the coordinate information adding unit may be configured as follows. That is, the position identified as the ground contact point of a person appearing on the shooting screens of the plurality of cameras is taken as the shooting ground position, and the height of the person image area appearing at that position on the shooting screen is taken as the person shooting height.
  • The coordinate information adding unit includes position/height conversion relation information acquiring means for acquiring conversion relation information between the plane coordinates in the camera two-dimensional coordinate system set on each camera's shooting screen and a real-space three-dimensional coordinate system whose height-direction reference is the walking plane on which the person moves in the real space;
  • shooting ground position/height specifying means for specifying the shooting ground position and the person shooting height of the person image on the shooting screen;
  • and real person coordinate/height information generating means for converting, based on the position/height conversion relation information, the information on the identified shooting ground position coordinates and shooting height into actual ground position coordinate information, which is the ground position coordinates of the person in the real space, and real person height information, which is the height of the person in the real space.
  • The dynamic generation unit includes avatar height determining means that determines the height dimension of the avatar image based on the generated real person height information, and the image composition unit includes avatar composition position determining means that converts the actual ground position coordinate information of the persons photographed by the plurality of cameras in the real-space coordinate system into the viewpoint of the bird's-eye view image and determines the composition position of each avatar image in the bird's-eye view image.
  • the feature point extraction unit divides the image of the subject into a plurality of parts corresponding to parts of the human body, and extracts feature points from each part. According to this structure, the feature point of each part can be detected effectively.
  • The dynamic generation unit can include an avatar image data storage unit that divides and stores the avatar image data as a plurality of avatar fragments corresponding to those parts, corrects each avatar fragment based on the feature point information extracted for the corresponding part of the person, and then integrates the corrected avatar fragments to generate the avatar image. In this way, fine corrections reflecting the feature points can be made for each avatar fragment (that is, for each part of the person), and it is not necessary to prepare a large number of whole-avatar image data sets for every combination of features, so the data volume can be reduced.
  • The information transmitting apparatus of the present invention comprises: a feature point extraction unit that extracts feature points from a subject in an image captured by at least one camera and outputs the feature points as feature information;
  • a coordinate information adding unit that acquires coordinate information of the subject within the shooting range of the camera;
  • and an information transmission unit that transmits the feature information and the coordinate information to a network,
  • wherein the feature information is associated with the constituent elements of an avatar image of the subject displayed at the transmission destination,
  • and the coordinate information is used to specify the position at which the avatar image is to be composited in an image representing the background of the shooting range of the camera at the transmission destination.
  • The information receiving apparatus of the present invention comprises: an information receiving unit that receives, via a network, feature information representing feature points extracted from a subject in an image captured by at least one camera, and coordinate information of the subject within the shooting range of the camera; a dynamic generation unit that generates an avatar image of the subject based on the feature information; and an image composition unit that generates a composite image by compositing the avatar image, based on the coordinate information, onto an image representing the background of the shooting range of the camera.
  • The computer program applied to the information transmitting side of the present invention causes a computer to execute: feature point extraction processing for extracting feature points from a subject in an image captured by at least one camera and outputting them as feature information; coordinate information addition processing for acquiring coordinate information of the subject within the shooting range of the camera; and information transmission processing for transmitting the feature information and the coordinate information to a network,
  • wherein the feature information is associated with the constituent elements of an avatar image of the subject displayed at the transmission destination,
  • and the coordinate information is used to specify the position at which the avatar image is to be composited in an image representing the background of the shooting range of the camera at the transmission destination.
  • The computer program applied to the information receiving side of the present invention causes a computer to execute: reception processing for receiving, via a network, feature information representing feature points extracted from a subject in an image captured by at least one camera, and coordinate information of the subject within the shooting range of the camera; dynamic generation processing for generating an avatar image of the subject based on the feature information; and image composition processing for generating a composite image by compositing the avatar image, based on the coordinate information, onto an image representing the background of the shooting range of the camera.
  • According to the present invention, it is possible to provide a transmission method that does not hinder the effective use of the communication band of the network when transmitting video from a camera.
  • FIG. 1 is a block diagram showing a schematic configuration of an information transmission system according to the first embodiment of the present invention.
  • FIG. 2 is a flowchart showing the processing procedure of the feature point extraction unit.
  • FIG. 3 is a schematic diagram showing how the feature point extraction unit extracts features by dividing the human body into parts.
  • FIG. 4 is a flowchart showing a flow of processing in which the feature point extraction unit extracts person attribute information.
  • FIG. 5 is a schematic diagram illustrating an example of a coordinate system set in the shooting range of the camera.
  • FIG. 6 is a schematic diagram illustrating a display example in which an avatar image is combined with a background image.
  • FIG. 7 is a schematic diagram showing an application example of the present invention.
  • FIG. 8 is a schematic diagram illustrating an expression example of an avatar when there is no continuity of the transmitting camera.
  • FIG. 9 is a block diagram illustrating a schematic configuration of an information transmission system according to the second embodiment.
  • FIG. 10 is a schematic diagram showing a conventional transmission method.
  • FIG. 11 is a diagram for explaining the concept of extracting a difference in a person image area.
  • FIG. 12 is a conceptual diagram of a background image.
  • FIG. 13 is an explanatory diagram of the coordinate information addition process.
  • FIG. 14 is an explanatory diagram following FIG. 13.
  • FIG. 15 is an explanatory diagram of lens distortion correction.
  • FIG. 16 is a flowchart showing the flow of the coordinate information addition process.
  • FIG. 17 is a diagram showing an example of a person image region extraction state on the screen.
  • FIG. 18 is an explanatory diagram for converting the height h of the person image area into the actual height H using a conversion coefficient.
  • FIG. 19 is a flowchart showing the flow of the person area detection process.
  • FIG. 20 is a diagram showing a concept of extracting gait feature information.
  • FIG. 21 is a diagram illustrating a concept of extracting movement trajectory information.
  • FIG. 22 is a diagram showing the concept of information storage on the receiving side.
  • FIG. 23 is a diagram illustrating the concept of an avatar image database.
  • FIG. 24 is a diagram illustrating the concept of the person moving direction used for determining the direction of the avatar image.
  • FIG. 25 is a diagram illustrating an example of avatar fragment graphic data.
  • FIG. 26 is an explanatory diagram illustrating an example in which an avatar image is obtained by combining avatar fragment graphics.
  • FIG. 27 is a diagram illustrating an example in which avatar image data is configured as avatar animation data.
  • FIG. 28 is a diagram illustrating an example in which avatar fragment image data is configured as two-dimensional vector graphic data.
  • FIG. 29 is a flowchart showing a flow of processing on the reception unit side.
  • FIG. 30 is a flowchart showing a flow of new avatar creation processing.
  • FIG. 31 is a flowchart showing the flow of the avatar background composition process.
  • FIG. 32 is a flowchart showing the flow of the integrated mode display process.
  • FIG. 33 is a diagram illustrating an example of a planar display form in the integrated display mode.
  • FIG. 34 is a diagram showing an example of a bird's eye view display form.
  • FIG. 35 is an image showing an example of displaying a three-dimensional avatar image.
  • FIG. 1 is a block diagram showing a schematic configuration of the information transmission system 1.
  • the information transmission system 1 includes an information transmission system transmission unit 12 (information transmission device) and an information transmission system reception unit 13 (information reception device).
  • the information transmission system transmission unit 12 and the information transmission system reception unit 13 are connected via a network 15.
  • the network 15 is a public network such as the Internet, but may be a private network such as a local network.
  • The information transmission system transmission unit 12 receives video signals from a plurality of cameras 11 (11a, 11b, ...) installed in various places, performs pre-transmission processing (described in detail later), and then sends the result to the network 15. In FIG. 1, only two cameras 11 are shown, but the number of cameras is arbitrary. Communication between the cameras 11 and the information transmission system transmission unit 12 may be wired or wireless.
  • The information transmission system reception unit 13 receives the information transmitted from the information transmission system transmission unit 12 via the network 15, performs post-reception processing (described in detail later), and then displays the result on the monitor 14 or records it to a video recording device (not shown) as necessary.
  • the information transmission system transmission unit 12 includes a coordinate information addition unit 121, a feature point extraction unit 122, a multiple camera linkage unit 123, and an information transmission unit 124.
  • One set of coordinate information adding unit 121 and feature point extracting unit 122 is provided for each camera 11.
  • a coordinate information adding unit 121a and a feature point extracting unit 122a are provided for the camera 11a
  • a coordinate information adding unit 121b and a feature point extracting unit 122b are provided for the camera 11b.
  • the feature point extraction unit 122 detects a person area from the video signal photographed by the camera 11, and further extracts features regarding the appearance (for example, clothing, hairstyle, body shape, belongings, etc.) of each person.
  • the coordinate information adding unit 121 detects the position of a person in an area photographed by the camera 11 as coordinate information.
  • Unlike a conventional information transmission system in which the video signal photographed by the camera is compressed and transmitted as it is, the information transmission system 1 transmits via the network 15 only the feature information obtained by the feature point extraction unit 122 and the coordinate information obtained by the coordinate information addition unit 121.
  • The information transmission system receiving unit 13, which has received the feature information and the coordinate information, holds a background image of the shooting range of each camera 11 recorded in advance, generates an avatar image that accurately represents each person based on the feature information, and composites the avatar image at the appropriate position on the background image according to the coordinate information.
  • each of the cameras 11 includes the coordinate information addition unit 121 and the feature point extraction unit 122.
  • The multi-camera cooperation unit 123 attaches, to the coordinate information obtained by the coordinate information addition unit 121 and the feature information obtained by the feature point extraction unit 122, tag information indicating from which of the plurality of cameras 11 the video signal that yielded the information was obtained,
  • and sends the result to the information transmission unit 124.
  • the information transmission unit 124 encodes information obtained from the multi-camera cooperation unit 123 according to a predetermined standard, and transmits the encoded information to the network 15.
  • the information transmission system reception unit 13 includes an information reception unit 131, a dynamic generation unit 132, and an image composition unit 133.
  • the information receiving unit 131 decodes the information received from the network 15 and sends it to the dynamic generation unit 132.
  • the dynamic generation unit 132 generates an avatar image representing a photographed person based on the feature information included in the received information.
  • the avatar image generated by the dynamic generation unit 132 is sent to the image composition unit 133 together with the coordinate information.
  • Based on the avatar image and the coordinate information, the image composition unit 133 generates a composite image of the background image of the shooting range of each camera 11 and the avatar image, and displays the composite image on the monitor 14. At this time, the tag information indicating from which camera 11's video signal the information was obtained is used to specify the background image.
  • the coordinate information adding unit 121 specifies the coordinates of the position of the person in the coordinate system set for the shooting range of each camera 11. For example, as shown in FIG. 5, an xy coordinate system 51 is set in the shooting range of one camera 11.
  • the coordinate information adding unit 121 detects the coordinates of the person area specified by the feature point extracting unit 122 in the xy coordinate system 51.
  • the coordinates detected here are sent to the information transmission system receiving unit 13 together with the feature information as coordinate information representing the position of the person.
  • The subject targeted by the present invention is a person who moves around in the area photographed by the camera 11. Considering the spatial geometric characteristics of such movement, the position and height of the person in the real space can be specified from the information of the person image area PA on the screen of the single camera 11 shown in FIG. 5.
  • The spatial existence range of the person in the area to be photographed is almost confined to a horizontal plane, namely the floor surface or the ground (in the case of FIG. 5, the road surface RS on which the person walks), on which the position in the height direction (z-axis direction) is constant.
  • This road surface RS is an x-y plane whose z coordinate is always 0 in an orthogonal coordinate system, and the coordinates of the ground contact point of a person walking on the road surface RS can substantially be described in the two dimensions x and y; although the contact point is a point in three-dimensional space, it can therefore be uniquely associated with the camera two-dimensional coordinate system set on the shooting screen.
  • the camera two-dimensional coordinate system corresponds to a projective transformation of the real space three-dimensional coordinate system, and a subject that is separated in the camera optical axis direction is projected with a reduced size.
  • The reference points p1 to p3 are read in the camera two-dimensional coordinate system set on the screen and stored as screen coordinate data of each reference point (S503).
  • Because the image on the shooting screen is affected by distortion of the camera lens, it is not a strict projective-transformation image of the real space, and the image may be distorted depending on the position within the field of view. As shown on the left of FIG. 15, the distortion is larger in regions closer to the edge of the screen, and the coordinate system becomes nonlinear.
  • A lens with a large viewing angle, such as a wide-angle lens, shows outward convex distortion,
  • while a lens with a small viewing angle, such as a telephoto lens, shows concave distortion. Therefore, this distortion is removed and a conversion correction is applied so that each point lies in an orthogonal plane coordinate system (S504).
  • The correction coefficient at this time can be determined by an optimization operation that linearizes the shape of a figure known to be straight in real space, such as the white line WL appearing on the screen in FIG. 15. Note that this correction expands the regions near the edge of the screen as the distortion is removed, so the corrected screen shape SA' protrudes outside the original screen SA.
  • the real space coordinates P (x, y, 0) of the ground contact point of the reference body SC can be obtained.
  • the coordinates may be directly specified by a satellite positioning system (GPS).
  • The real-space coordinate system used here may be an independent coordinate system set within the shooting range of each camera, or it may be linked to a global coordinate system provided by a satellite positioning system (GPS).
  • the height h on the screen of the reference body image SCI is read (S506).
  • FIG. 2 is a flowchart showing the processing procedure of the feature point extraction unit 122.
  • FIG. 3 is a schematic diagram showing how the feature point extraction unit 122 extracts features by dividing the human body into parts.
  • The feature point extraction unit 122 detects a moving object MO appearing in the video signal by taking the difference between frames FM, as shown in FIG. 11 (step S11 of FIG. 2). Specifically, if an image area belongs to a moving object, its position and shape differ between the image area MO' of the preceding frame and the image area MO of the succeeding frame, whereas the background does not change; the image area MO of the moving object can therefore be extracted by taking the image difference between the frames (a minimal sketch of such frame differencing follows below). On the other hand, if an image is captured while no moving object is present, a background image BP is obtained as shown in FIG. 12. The background image BP is captured for each camera and transmitted in advance to the receiving unit 13 in FIG. 1.
  • the feature point extraction unit 122 extracts a person region by performing segmentation, edge detection, pattern matching, and the like on the moving object image detected in step S11, and determines whether or not the moving object is a person. Judgment is made (step S12).
  • Various methods can be used for the moving object detection process and the person extraction process from the video signal, and the method is not limited to a specific method. Also, among the moving objects detected from the video signal, those having a relatively small size are likely to be noise, so they are determined not to be humans, and those having a relatively large size are determined to be humans.
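A minimal sketch of frame-difference-based moving-object detection with OpenCV (assuming OpenCV 4.x). The threshold value and the person/noise size cut-off are illustrative assumptions; as the text notes, background subtraction, segmentation, or pattern matching could be used instead.

```python
import cv2

def detect_person_regions(prev_gray, curr_gray, min_area=2000):
    """Return bounding boxes of moving regions large enough to be a person."""
    diff = cv2.absdiff(prev_gray, curr_gray)            # inter-frame difference
    _, mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
    mask = cv2.dilate(mask, None, iterations=2)         # close small gaps
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    boxes = []
    for c in contours:
        if cv2.contourArea(c) >= min_area:              # small blobs are treated as noise
            boxes.append(cv2.boundingRect(c))           # (x, y, w, h); bottom edge ~ ground point
    return boxes

cap = cv2.VideoCapture("camera11a.mp4")                 # illustrative video source
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in detect_person_regions(prev_gray, gray):
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    prev_gray = gray
```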
  • The detected position of the lower end edge of the person area PA is regarded as the ground contact point p, its coordinates on the screen are read (S1201), and the above-described position/height conversion relation information is referred to. Since the height-direction dimension of the person area changes depending on the posture of the person, the frames are searched for the person area image that appears closest to the upright state, and that image is specified (S1203).
  • the feature point extraction unit 122 further performs a process (step S15) of analyzing the operation of each part. For example, for the head p1, head movement (movement and orientation) is detected.
  • The head is the easiest part to recognize. If the orientation of the head is known by first extracting the head p1, it becomes easy to specify the state of the other parts, the moving direction, and so on. In addition, for example, when the head is pointing to the right, the parting described later can proceed on the assumption that the left hand and the left foot may be hidden and invisible. If the person is walking, the movement is analyzed and acquired as gait information. In this case, as the motion of the torso p2, posture features such as the upper body angle and whether or not the person is stooping are detected, as shown for example in FIG. 20.
  • The movements of the right hand p3 and the left hand p4 are detected as, for example, the swing angle of each arm.
  • As the movements of the right foot p5 and the left foot p6, for example, the walking speed, the stride WL, the knee bending angle, and the like are detected.
  • The gait and other motion features detected here are sent to the information transmission system receiving unit 13 as motion analysis information and are reflected in the movement and orientation of the avatar representing the person. Particularly important as motion analysis information is the moving direction of the person. As shown in FIG. 21, when the coordinate information P1, P2, ..., Pn of the person is specified for each frame of the captured moving image, the set of coordinate information P1, P2, ..., Pn constitutes the movement trajectory information between frames.
  • The difference Vn - Vn-1 between the position vectors Vn and Vn-1 of the coordinates Pn and Pn-1 in adjacent frames can be used as an index representing the moving direction of the person at the position Pn, and is also used effectively in determining the direction of the avatar image described later (see the sketch below).
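A minimal sketch of deriving the moving direction and speed from consecutive ground-contact coordinates, i.e. the difference Vn - Vn-1 described above. The function names and example values are illustrative.

```python
import math

def movement_vector(p_prev, p_curr):
    """Difference Vn - Vn-1 of the position vectors of consecutive frames."""
    return (p_curr[0] - p_prev[0], p_curr[1] - p_prev[1])

def heading_and_speed(p_prev, p_curr, frame_interval_s):
    dx, dy = movement_vector(p_prev, p_curr)
    heading_deg = math.degrees(math.atan2(dy, dx)) % 360.0   # direction of travel
    speed = math.hypot(dx, dy) / frame_interval_s            # metres per second
    return heading_deg, speed

trajectory = [(3.0, 7.5), (3.1, 7.7), (3.3, 7.9)]  # P1, P2, P3 in real-space metres
print(heading_and_speed(trajectory[-2], trajectory[-1], frame_interval_s=0.2))
```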
  • The feature point extraction unit 122 divides the person region P extracted in step S12 (see FIG. 3(a)) into six parts, namely a head p1, a torso p2, a right hand p3, a left hand p4, a right foot p5, and a left foot p6 (see FIG. 3(b)) (step S13). Then, an appearance feature analysis is performed for each of the six parts (step S14).
  • For the head p1, the hairstyle, the hair color, the presence or absence of a hat, and the like are extracted as feature points.
  • For the torso p2, the body shape, the shape of the clothing, the color of the clothing, the presence or absence of specific belongings such as a rucksack, and the like are extracted as feature points.
  • the characteristic points regarding the right hand p3 and the left hand p4 are, for example, the body shape, the shape (or type) of clothing, the color of clothing, and the belongings.
  • the characteristic points regarding the right foot p5 and the left foot p6 are, for example, body shape, clothing shape (or type), clothing color, shoes, and the like.
  • the number of parts at the time of making into parts is not limited to six.
  • the feature points listed here are merely examples, and various elements may be extracted as feature points.
  • In the example above, the hairstyle and hair color, and the clothing shape and clothing color, are extracted as independent feature points, but the "hair color" and "clothing color" may instead be treated as additional data of "hairstyle" and "clothing shape".
  • the extracted feature points for each part are output as feature data and sent to the information transmission system receiver 13.
  • the variation of the extracted feature point (feature data) corresponds to the variation of the component (partial image) of each part in the avatar of the person generated by the information transmission system receiving unit 13 as described later.
  • For example, if the feature data indicates long hair, a partial image of "long hair" is used as the hair of the avatar.
  • Similarly, if the feature data indicates a heavy build, a "thick body" partial image is used for the torso of the avatar.
  • the feature point extraction unit 122 may further extract information (person attribute information) that specifies the person to some extent, such as the age and sex of the person.
  • The feature point extraction unit 122 determines the age and gender of the person based on the feature amounts extracted from the images of the parts obtained in the parting step (step S23). For example, if the head p1 can be captured, it is possible to discriminate age and gender using face recognition technology.
  • The age may be output as age data in increments of one year, or as data representing an age bracket (for example, the twenties).
  • gender and age are exemplified as the person attribute information, but any information other than this can be used as information for specifying a person to some extent. For example, it may be possible to discriminate between “adult” and “child”.
  • Furthermore, instead of characterizing a person only by gender and age, if a person database in which face images and personal information (names and the like) are registered in advance can be used, it is possible to uniquely identify an individual by collating the image of the head p1 with the face images registered in the person database as necessary (step S24).
  • the information receiving unit 131 receives information from the network 15 and decodes it.
  • The decoded information includes the information (feature information and coordinate information) obtained from the video signals of the plurality of cameras 11 (cameras 11a, 11b, ...), and is stored and accumulated in the information accumulation/statistical processing unit 135.
  • FIG. 22 shows an example of accumulated information.
  • A detection ID is assigned to each person determined to be the same person from the degree of coincidence of the feature information described above, and the reception date and time, the position (x and y coordinates), the way of walking (gait), the physique, the height, the hair color, the upper body clothing color, the lower body clothing color, the facial feature information, the gender, the age, and so on are stored sequentially in association with that ID.
  • In FIG. 22, the date and ID portions are abbreviated as #1, #2, and so on; information such as the type (form) of the upper and lower body clothing, the presence or absence of a hat, and belongings is also associated. One possible record layout is sketched below.
  • The gait data includes the stride WL, the arm swing angle, the upper body angle, the knee bending angle, the one-step cycle, and so on.
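The accumulated information of FIG. 22 could be held, for instance, in a table like the following. The column names, the SQLite storage, and the example values are illustrative assumptions, not the patent's specified schema.

```python
import sqlite3

conn = sqlite3.connect("observations.db")
conn.execute("""
CREATE TABLE IF NOT EXISTS observations (
    detection_id    INTEGER,   -- same person => same ID
    received_at     TEXT,      -- reception date and time
    x               REAL,      -- ground position, shared real-space coordinates
    y               REAL,
    stride          REAL,      -- gait: stride WL
    arm_swing_deg   REAL,      -- gait: arm swing angle
    upper_angle_deg REAL,      -- gait: upper body angle
    height_m        REAL,
    hair_color      TEXT,
    upper_color     TEXT,
    lower_color     TEXT,
    gender          TEXT,
    age_bracket     TEXT
)""")
conn.execute("INSERT INTO observations VALUES (1, '2017-03-10T09:00:00', 3.2, 7.8, "
             "0.65, 30.0, 5.0, 1.68, 'black', 'red', 'blue', 'F', '20s')")
conn.commit()
```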
  • the dynamic generation unit 132 generates an avatar image of each person based on the received feature information. That is, as described above, the feature information includes feature data representing the feature of each part of the person.
  • The dynamic generation unit 132 can access a database that stores in advance a partial avatar image corresponding to each kind of feature data (held in the information accumulation/statistical processing unit 135 in FIG. 1; it may also be a separate storage device or a server).
  • FIG. 23 conceptually shows a construction example of the database.
  • The database contains avatar fragment graphic data, which are the avatar constituent elements such as upper and lower body clothing, hairstyles, and belongings, prepared with the height and body shape set to standard values.
  • The avatar fragment graphic data for each avatar component is prepared in different representations according to the moving direction of the person in the real space, so that the appearance of the person as seen from the camera (the direction with respect to the camera) is reflected.
  • The direction of the person P with respect to the camera 11 is classified into eight directions (J1 to J8), and the avatar fragment graphic data, divided in a form corresponding to the human body parts p1 to p6 described with reference to FIG. 3 (p2 to p4 for upper body clothing, p5 and p6 for lower body clothing), is prepared for each of the eight directions (v1 to v8, corresponding to J1 to J8).
  • Shoes, hair, and belongings are not subdivided, but they too are prepared in eight variations (a lookup sketch appears below).
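A minimal sketch of how such a direction-specific fragment database could be keyed and queried. The dictionary keys, component names, and file names are illustrative assumptions.

```python
# Fragment database keyed by (component, variant, direction index v1..v8).
# Direction indices 1..8 correspond to the quantized person directions J1..J8.
AVATAR_FRAGMENTS = {
    ("upper_clothes", "jacket", 1): "jacket_v1.svg",
    ("upper_clothes", "jacket", 7): "jacket_v7.svg",
    ("lower_clothes", "trousers", 1): "trousers_v1.svg",
    ("hair", "long", 1): "hair_long_v1.svg",
    ("shoes", "sneakers", 1): "sneakers_v1.svg",
    # ... one entry per component, variant and direction
}

def fetch_fragments(features, direction_index):
    """Collect the fragment graphics for one avatar in one viewing direction."""
    wanted = [("upper_clothes", features["upper"]),
              ("lower_clothes", features["lower"]),
              ("hair", features["hair"]),
              ("shoes", features["shoes"])]
    return [AVATAR_FRAGMENTS[(part, variant, direction_index)]
            for part, variant in wanted
            if (part, variant, direction_index) in AVATAR_FRAGMENTS]

print(fetch_fragments({"upper": "jacket", "lower": "trousers",
                       "hair": "long", "shoes": "sneakers"}, 1))
```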
  • the left of FIG. 25 shows an example of selection of avatar fragment graphic data when the direction v7 in FIG. 24 is designated, and the right shows an example of selection of avatar fragment graphic data when the direction v1 is designated.
  • FIG. 26 shows avatar images AV7 and AV1 obtained by combining them.
  • For the face, contours and facial features reflecting the extracted facial feature information are synthesized for each direction; alternatively, a standard face (or head) image may be prepared for each gender and age.
  • The avatar image data (or avatar fragment graphic data) is configured as avatar animation data consisting of a set of frame data obtained by subdividing the walking motion of the subject.
  • One walking cycle of two steps is represented by a plurality of frames: four frames (AFM1 to AFM4 in this case) up to the landing of the right foot and four frames (AFM5 to AFM8) up to the landing of the left foot.
  • For at least the lower body clothing and the upper body clothing, data for these eight frames is prepared for each type of avatar fragment graphic data.
  • the image data of each avatar fragment is configured as two-dimensional vector graphic data as shown in FIG.
  • the vector graphic data is obtained by circularly concatenating vertex coordinates that specify a graphic outline with a vector.
  • When an avatar fragment is deformed (for example, scaled to the person's height or body shape), the vertex coordinates are moved according to a matrix operation representing the linear transformation, and the outline is redrawn by connecting the moved vertices (see the sketch below).
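The vertex manipulation described above can be sketched as follows: each fragment outline is a closed polygon of vertices, and scaling to the person's height (or any other linear deformation) is a matrix applied to every vertex. The 2x2 matrix form and the example factors are illustrative simplifications.

```python
import numpy as np

# Closed outline of an avatar fragment: vertices connected in order, last back to first.
outline = np.array([[0.0, 0.0], [0.3, 0.0], [0.3, 1.0], [0.0, 1.0]])  # unit-height torso

def transform_outline(vertices, matrix):
    """Apply a linear transformation (e.g. scaling) to every vertex of the outline."""
    return vertices @ np.asarray(matrix).T

# Scale a standard-height fragment to a person 1.68 m tall who is slightly slimmer
# than the standard body shape (illustrative factors).
scale = np.array([[0.9, 0.0],
                  [0.0, 1.68 / 1.70]])
print(transform_outline(outline, scale))
```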
  • FIG. 29 is a flowchart showing the flow of processing on the receiving side.
  • the person ID, operation information (coordinate points), and feature information sent via the network are received (S601).
  • the received coordinate information P is plotted on real space coordinates shared by a plurality of cameras (S602).
  • The person's walking direction vector is calculated from the change in the person's coordinate P between the preceding and following frames, and one of the eight directions J1 to J8 in FIG. 24 is selected and determined as the avatar image arrangement direction (S603); a sketch of this quantization follows below.
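The selection among the eight directions J1 to J8 could be done by quantizing the walking-direction angle into 45-degree sectors, as in the sketch below. The sector boundaries and the mapping of sectors to labels are assumptions for illustration.

```python
import math

def walking_direction_index(p_prev, p_curr):
    """Quantize the walking direction vector into one of 8 sectors (1..8 ~ J1..J8)."""
    dx, dy = p_curr[0] - p_prev[0], p_curr[1] - p_prev[1]
    angle = math.degrees(math.atan2(dy, dx)) % 360.0
    # 45-degree sectors centred on 0, 45, 90, ... degrees.
    return int(((angle + 22.5) % 360.0) // 45.0) + 1

print(walking_direction_index((3.0, 7.5), (3.3, 7.9)))  # e.g. 2
```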
  • Next, it is checked whether an avatar image has already been created for the received person ID (S604). If there is no avatar creation history (S605), the database is searched for a person whose time, position, and features match under a predetermined condition (S606). If there is no corresponding person (S607), processing for creating a new avatar image is performed (S610).
  • In the first step of the new avatar creation processing, the hairstyle, clothing, belongings, and their colors included in the feature data are specified.
  • In step S6102, among the avatar fragment graphics corresponding to the identified features, those corresponding to the determined avatar image arrangement direction (any of J1 to J8, i.e. any of v1 to v8 in FIG. 23) are read out.
  • In step S6103, the avatar fragment graphics are corrected based on the height, body shape, and gait information included in the feature data, and in step S6104 the avatar fragment graphics are colored with the designated colors.
  • The avatar image data is completed by combining the avatar fragments in S6105.
  • If an avatar creation history already exists, the process proceeds to S609, where the avatar image data for the corresponding ID is read from the database and reused.
  • If a matching person is found in the database search, the received person ID is updated with the ID of that person, and the process likewise proceeds to S609, where the same processing is performed.
  • the avatar image of each person generated by the dynamic generation unit 132 is sent to the image composition unit 133 together with the coordinate information of each person, and the avatar / background composition process is performed (S611).
  • the image composition unit 133 can access a database (in the information accumulation / statistical processing unit 135) in which background images of the photographing ranges of the respective cameras 11 are stored in advance.
  • the image composition unit 133 acquires the background image of each camera 11 from the database, and composes it with the avatar image generated by the dynamic generation unit 132.
  • the composite image is output to the monitor 14.
  • the position where the avatar image is arranged is based on the coordinate information of the person of the avatar.
  • The image composition unit 133 can change the direction of the avatar or adjust the speed at which the avatar moves based on the motion analysis information (data representing the movement and orientation of the person) obtained by the feature point extraction unit 122.
  • transmission from the information transmission system transmission unit 12 is performed at a frame rate as high as possible within a range that can be processed by the feature point extraction unit 122, the coordinate information addition unit 121, the dynamic generation unit 132, and the image synthesis unit 133.
  • FIG. 31 shows the flow of the avatar/background composition processing.
  • First, the avatar image data corresponding to the specified ID and direction is read.
  • This avatar image data is a set of frame data constituting an animation (FIG. 27), and the avatar animation frames are allocated to the frames for moving-image reproduction in accordance with the speed and stride of the moving coordinate point P (S61102); a sketch of such an allocation follows below.
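A minimal sketch of allocating walking-cycle frames (AFM1 to AFM8) to playback frames according to the observed speed and stride: the walking-cycle phase is advanced in proportion to the distance walked, so a faster or longer-striding person cycles through the animation at the corresponding rate. The phase computation is an illustrative assumption.

```python
def allocate_animation_frames(distance_moved_m, stride_m, n_cycle_frames=8, phase=0.0):
    """Advance the walking-cycle phase in proportion to the distance walked.

    One full cycle of n_cycle_frames corresponds to two steps (2 * stride).
    Returns the frame index to display (1..n_cycle_frames) and the new phase.
    """
    phase = (phase + distance_moved_m / (2.0 * stride_m)) % 1.0
    frame_index = int(phase * n_cycle_frames) + 1      # AFM1 .. AFM8
    return frame_index, phase

# Example: a person with stride 0.65 m moving 0.24 m between two display frames.
phase = 0.0
for _ in range(5):
    frame, phase = allocate_animation_frames(0.24, 0.65, phase=phase)
    print("display frame AFM%d" % frame)
```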
  • As for the composite video display mode, in this embodiment either a display mode with the same field of view as the shooting screen of the camera 11 (camera actual video mode) or an integrated display mode covering a plurality of cameras can be selected.
  • This mode selection can be switched by an input unit (configured by a keyboard or a touch panel) connected to the information transmission system reception unit 13 in FIG.
  • the process proceeds to S61104, and the position coordinates P (x, y, 0) of all avatars to be displayed simultaneously are plotted in the real space visual field region of the corresponding camera.
  • the real space visual field area is projected and converted to the corresponding coordinate system of the camera together with the plotted position coordinates P.
  • the camera two-dimensional coordinate system used for determining the position coordinate P is temporarily corrected from the left state in FIG. 15 to the right state in consideration of lens distortion.
  • the entire field of view fits on the output screen in the coordinate system before correction, but after correction, the region at the end of the field of view extends beyond the screen of the monitor (FIG. 1: reference numeral 14).
  • If the result were displayed as it is, the image would change by the amount of the distortion correction, creating a sense of incongruity, and the avatar image of a person appearing at the edge of the field of view might not be displayed. Therefore, in S61106 of FIG. 31, a reverse distortion correction that restores the influence of the original lens distortion is applied to the projective-transformation image, restoring the shape of the field of view. As a result, the above problem is solved.
  • In step S61107, the selected background image is superimposed, together with the mapped person coordinate positions P, on the output plane that has been returned to the camera two-dimensional coordinate system through the projective transformation and the reverse distortion correction,
  • and the avatar image data adjusted in size and orientation as described above is pasted and composited at each position p (in the camera two-dimensional coordinate system).
  • The screen of the monitor 14 in FIG. 1 may be divided so that the video of the plurality of cameras 11 is displayed simultaneously, or the screen of the monitor 14 may be switched so that only the video of any one of the plurality of cameras 11 is displayed.
  • If the integrated display mode is selected in S61104, the process proceeds to S1000 and display is performed in the integrated mode.
  • In step S1001, the positions P (x, y, 0) and directions of all avatars to be displayed simultaneously are plotted in the real space shared by the plurality of cameras.
  • In step S1002, flow line trajectory data is created by superimposing the person position coordinates P of the preceding and following frames.
  • For planar (plan view) display, an avatar image for plan view may be prepared separately, or the avatar may be laid out sideways so that the feature information can be grasped easily.
  • When flow line display is designated, the flow line image ML of the corresponding avatar image AV is displayed based on the flow line trajectory data described above.
  • In the case of bird's-eye view display, the process proceeds to S1006, where the real-space positions, directions, and flow line data of the avatars are projectively transformed according to the bird's-eye viewing angle and direction, and the background image for the bird's-eye view is then superimposed.
  • a captured image for overhead view may be prepared and used, or three-dimensional background image data (for example, three-dimensional computer graphics (CG) data) may be prepared and converted to an overhead view by projective transformation.
  • The avatar image corresponding to the direction of the avatar after the projective transformation is read, and the avatar image AVS is pasted onto the overhead-view background image PBPS as shown in FIG. 34.
  • the flow line image MLS of the corresponding avatar image AVS is displayed based on the above-mentioned flow line locus data.
  • the avatar image data may be 3D avatar image data, and the avatar image may be displayed as a 3D CG image as shown in FIG.
  • the image composition unit 133 (FIG. 1) generates two-dimensional avatar image data by projectively transforming the three-dimensional avatar object in the real space whose arrangement direction is determined into the two-dimensional coordinate system of the background image.
  • the avatar image based on the two-dimensional avatar image data is combined with the background image.
  • the image of the person is not displayed as it is on the monitor 14 but is displayed in an anthropomorphic (avatarized) state, privacy can be used when shooting an unspecified number of persons such as a security camera on the street.
  • For example, the persons in the captured image shown in FIG. 5 are displayed on the monitor 14 as avatars as shown in FIG. 6.
  • Each avatar is designed to represent the characteristics of the corresponding person based on the feature information extracted from the video, so it is possible to grasp what kind of person is in the shooting range.
  • In the above embodiment, an example has been described in which the feature information and coordinate information of a person are acquired from the video signal of the camera 11 and, in addition, motion analysis information indicating the movement and direction of the person and person attribute information such as age and sex are acquired.
  • Various applications can be considered using such information.
  • For example, the above information may be processed by the image composition unit 133 and a plurality of screens may be displayed on the monitor 14.
  • In the example of FIG. 7, an actual video space screen 81, a feature amount reproduction screen 82, a statistical space screen 83, a flow line analysis space screen 84, and a personal identification space screen 85 are displayed side by side on the monitor 14.
  • the real video space screen 81 is a screen that displays video signals from the plurality of cameras 11 in a state where a person is replaced with an avatar.
  • the actual video space screen 81 is divided into four, and the video signals from the four cameras 11 are displayed simultaneously, but the number of cameras is not limited to this.
  • the feature amount reproduction screen 82 is a screen for displaying videos from a plurality of cameras 11, in which a person is replaced with an avatar and a background image is also displayed in a graphic display.
  • the feature amount reproduction screen 82 is generated by three-dimensionally integrating the images from the plurality of cameras 11. That is, the feature amount reproduction screen 82 is configured as a bird's-eye view image by combining videos taken by a plurality of cameras installed at a plurality of locations.
  • the feature amount reproduction screen 82 illustrated in FIG. 7 is a screen representing the state of the station premises (the vicinity of the platform and the ticket gate) and the surrounding stores.
  • In this case, video signals obtained respectively from a camera installed on the station platform, a camera installed around the ticket gate, and cameras installed in each of the plurality of stores are used. Although it is impossible to shoot all of these areas with a single camera, such a bird's-eye view image can be obtained, and such a screen can be configured, by three-dimensionally combining the images taken by the multiple cameras installed at multiple locations.
  • the motion analysis information extracted from the video signal of the camera includes information about the direction of the person and the direction in which the person is moving. By using this information and arranging the avatars so as to match the direction of the actual person, there is an advantage that the movement of the crowd can be easily grasped on the feature amount reproduction screen 82.
  • the statistical space screen 83 is a screen that displays various statistical results. For example, the transition of the number of people within the shooting range of a certain camera can be represented by a graph.
  • The flow line analysis space screen 84 pays attention to a certain person (avatar) and displays, by a flow line, how that person has moved within the shooting range of the cameras. This is made possible by acquiring the coordinate information of the person (avatar) in time series.
  • the personal identification space screen 85 displays the person attribute information of the person in the shooting range. In the example of FIG. 7, the face part of each person's avatar image, gender, and age are displayed.
  • The actual video space screen 81, the feature amount reproduction screen 82, the statistical space screen 83, and the flow line analysis space screen 84 preferably have a GUI (graphical user interface) function.
  • For example, when an avatar image on one of these screens is selected, the person attribute information of the person represented by that avatar is highlighted on the personal identification space screen 85.
  • In the example of FIG. 7, "male, 35 years old", which is the person attribute information of the avatar 82a, is highlighted on the personal identification space screen 85.
  • Conversely, when a piece of person attribute information is selected on the personal identification space screen 85, the avatar image corresponding to the selected person attribute information is highlighted on the feature amount reproduction screen 82.
  • In addition, the movement path of that avatar may be displayed on the flow line analysis space screen 84.
  • FIG. 8 is a schematic diagram illustrating an expression example of an avatar when there is no continuity of the transmitting camera.
  • When the feature amount captured by camera A can be confirmed by camera B at the destination, the received avatar is colored.
  • When it cannot be confirmed, the received avatar is not colored and the default avatar is left as it is, so that the two cases can be distinguished and the camera images can be used appropriately in each case.
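One possible, purely illustrative way to implement this hand-over decision is to score the agreement between the feature record received via camera B and the records previously accumulated from camera A, and to colour the avatar only when the score clears a threshold; the particular features, weights, and threshold below are assumptions, not values from the specification.

```python
def feature_match_score(a, b):
    """Crude agreement score between two feature records (dicts) such as
    {'height': 1.7, 'sex': 'M', 'upper_color': (200, 30, 30)}.
    Weights and tolerances here are illustrative."""
    score = 0.0
    if abs(a['height'] - b['height']) < 0.05:
        score += 1.0
    if a['sex'] == b['sex']:
        score += 1.0
    # colour distance of upper-body clothing, normalised to 0..1
    ca, cb = a['upper_color'], b['upper_color']
    dist = sum((x - y) ** 2 for x, y in zip(ca, cb)) ** 0.5
    score += max(0.0, 1.0 - dist / 255.0)
    return score

def resolve_handoff(record_from_b, history_from_a, threshold=2.0):
    """Return (matched_record, colored): colour the avatar only when the
    feature amount captured by camera A is confirmed at camera B."""
    best = max(history_from_a,
               key=lambda r: feature_match_score(record_from_b, r),
               default=None)
    if best is not None and feature_match_score(record_from_b, best) >= threshold:
        return best, True          # reuse the existing ID / coloured avatar
    return None, False             # keep the uncoloured default avatar
```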
  • FIG. 9 is a block diagram illustrating a schematic configuration of the information transmission system 2 according to the second embodiment.
  • As shown in FIG. 9, the information transmission system 2 differs from the information transmission system 1 of the first embodiment, which includes the plurality of cameras 11a, 11b, ..., in that it has only a single camera 11.
  • Accordingly, the information transmission system 2 includes only one set of the coordinate information adding unit 121 and the feature point extracting unit 122, and does not include the multiple camera cooperation unit 123.
  • the operations of the coordinate information adding unit 121, the feature point extracting unit 122, and other processing units are the same as those in the first embodiment.
  • a system that extracts and transmits information only from the video signal of one camera 11 is also included in one embodiment of the present invention.
  • each of the information transmission system transmission unit 12 and the information transmission system reception unit 13 can be realized as an independent device (camera controller), a computer, or a server.
  • Each unit such as the coordinate information adding unit 121 shown in the block diagram can be realized by the processor executing the program recorded in the memory in these devices.
  • the information transmission system transmission unit 22 according to the second embodiment can be realized as an apparatus integrated with the camera 11.
  • the present invention can also be implemented as a program executed by a general-purpose computer or server, or a medium recording the program, in addition to the embodiment in which the present invention is implemented as hardware.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Processing Or Creating Images (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

Provided is an information transmitting system that enables the effective use of a network communication band when transmitting camera video signals. The information transmitting system 1 comprises: a feature point extracting unit 122 for extracting feature points from a subject in a video taken with at least one camera 11, and outputting said feature points as feature information; a coordinate information adding unit 121 for acquiring coordinate information for a subject within the imaging range of the camera; an information transmitting unit 124 for transmitting, to a network, the feature information and the coordinate information; an information receiving unit 131 for receiving the feature information and the coordinate information from the network; a movement generating unit 132 for generating an avatar image of a subject on the basis of the feature information; and an image compositing unit 133 for generating a composite image by combining, with an image showing the background in the imaging range of the camera, the avatar image on the basis of the coordinate information.

Description

情報伝送システム、情報送信装置、情報受信装置、およびコンピュータプログラムInformation transmission system, information transmission device, information reception device, and computer program
 本発明は、カメラで取得された映像情報を伝送するシステムに関する。 The present invention relates to a system for transmitting video information acquired by a camera.
 近年、防犯等の目的で、ショッピングセンターや街頭等の様々な場所に監視カメラが設置されている。これらのカメラで撮影された映像は、ネットワークを介して遠隔地の集中監視センター等にリアルタイムで送られ、モニタに表示される。
 従来、このようなカメラの映像を伝送する際には、ネットワークの通信帯域を有効に利用するために、図10に示すように、撮影された映像信号を、映像圧縮装置により所定の規格(例えば、H.264またはMPEG2等)にしたがって圧縮してから伝送する。そして、伝送先において、映像復号装置により、受信した映像信号を当該規格にしたがって復号する。このような監視カメラの映像信号の圧縮と復号を利用した発明は、例えば下記の特許文献1等に開示されている。また、特許文献2には、画像中の人物と背景とを画像認識装置で個別のオブジェクトに分解し、変換画像選択部でオブジェクト毎に他の画像へ変換して送信できるようにする技術が開示されている。
In recent years, surveillance cameras have been installed in various places such as shopping centers and streets for the purpose of crime prevention. Images taken by these cameras are sent in real time to a remote centralized monitoring center or the like via a network and displayed on a monitor.
Conventionally, when transmitting the video of such a camera, in order to use the communication band of the network effectively, the captured video signal is compressed by a video compression device according to a predetermined standard (for example, H.264 or MPEG2) before being transmitted, as shown in FIG. 10. Then, at the transmission destination, the received video signal is decoded according to that standard by a video decoding device. An invention using such compression and decoding of the video signal of a surveillance camera is disclosed, for example, in Patent Document 1 below. Patent Document 2 discloses a technique in which a person and the background in an image are decomposed into individual objects by an image recognition device, and each object is converted into another image by a converted image selection unit and transmitted.
特開2007−60022号公報 (Japanese Patent Laid-Open No. 2007-60022)
 しかしながら、上述のような画像圧縮技術を利用したとしても、データの圧縮率には限界があるので、例えば複数台の監視カメラからの映像をリアルタイムに伝送する場合には、依然として幅広い帯域が必要になるという課題がある。
 本発明は、このような課題を鑑み、カメラからの映像を伝送する際に、ネットワークの通信帯域の有効利用を可能とする技術を提供する。
However, even if the above image compression technology is used, there is a limit to the data compression rate; for example, when transmitting images from a plurality of surveillance cameras in real time, there remains the problem that a wide bandwidth is still required.
In view of such problems, the present invention provides a technology that enables effective use of a network communication band when transmitting video from a camera.
 上記の目的を達成するために、本発明にかかる情報伝送システムは、
 少なくとも1台のカメラで写した映像内の被写体から特徴点を抽出して特徴情報として出力する特徴点抽出部と、
 カメラの撮影範囲における被写体の座標情報を取得する座標情報付加部と、
 特徴情報および前記座標情報をネットワークへ送出する情報送信部と、
 ネットワークから特徴情報および座標情報を受け取る情報受信部と、
 特徴情報に基づいて被写体のアバター画像を生成する動態生成部と、
 カメラの撮影範囲の背景を表す画像に、座標情報に基づいてアバター画像を合成することにより、合成画像を生成する画像合成部と、を備えたことを特徴とする。
 この構成によれば、被写体から抽出した特徴点の特徴情報と、カメラの撮影範囲における被写体の座標情報とがネットワークへ送出される。そして、この特徴情報に基づいて被写体のアバター画像が生成され、前記座標情報に基づいて、カメラの撮影範囲の背景を表す画像に、当該アバター画像が合成される。これにより、カメラの映像信号を送ることなく、受信側で、背景画像とアバター画像とによって合成画像が生成されるので、カメラの映像信号を送る場合に比較して、ネットワークの通信帯域を有効に利用することが可能となる。また、被写体の実映像の代わりにアバター画像が用いられるので、不特定多数の人物を撮影する場面においても、プライバシーを侵害するおそれがないという利点がある。
 上記本発明の情報伝送システムにおいて座標情報付加部は、撮影画面上に現れる被写体をなす人物の接地点として識別される位置を撮影接地位置とし、当該撮影設置位置に現れる人物画像領域の撮影画面上の高さを人物撮影高さとして、座標情報付加部は、被写体が人物である場合の実空間における歩行面を高さ方向の基準として、カメラの撮影画面上に設定されるカメラ二次元座標系における平面座標点と実空間三次元座標系における歩行面上の空間座標点との変換関係と、カメラ二次元座標系における撮影接地位置毎の人物の撮影高さと実空間座標系での当該人物の実高さとの変換関係とを含む位置・高さ変換関係情報を取得する位置・高さ変換関係情報取得手段と、撮影画面上にて人物画像の撮影接地位置及び撮影高さを特定する撮影接地位置・高さ特定手段と、特定されたそれら撮影接地位置座標及び撮影高さの情報を、位置・高さ変換関係情報に基づいて実空間における人物の接地位置座標である実接地位置座標情報と実空間における人物の高さを情報である実人物高さ情報とに変換・生成する実人物座標・高さ情報生成手段とを備えたものとすることができる。また、動態生成部は、生成された実人物高さ情報に基づいてアバター画像の高さ寸法を決定するアバター高さ決定手段を備え、画像合成部は、実接地位置座標情報に基づいて背景画像へのアバター画像の合成位置を決定するアバター合成位置決定手段を備えたものとすることができる。
 三次元空間内の立体をカメラ映像から特定したい場合、映像は二次元データであるから、一般の立体の空間的な状態を1台のカメラ映像で特定することは原理的にできない。しかし、この発明が対象とする被写体はカメラ撮影される平面上のエリア内を動き回る人物であり、上記構成によれば、その空間幾何学的な移動特性を考慮することで、単独のカメラ映像上の人物画像領域の情報から実空間内の人物位置と高さとを容易に特定可能である。すなわち、撮影対象となるエリアの人物の空間的な存在範囲は、床面や地面など、高さ方向(仮に実空間直交座標系のz軸方向とする)位置が一定の水平面(同様にx−y平面である)にほぼ限られており、その接地点(足元位置)のz座標は常に一定(例えば0)とみなしえる。つまり、エリア内を歩行する人物の接地点の座標は実質的にx−yの二次元系で記述でき、カメラ二次元座標系とも一義的な対応付けが可能となる。カメラ二次元座標系は実空間三次元座標系が射影変換されたものに相当し、カメラから隔たった被写体ほど寸法が縮小されて投影される。この変換は数学的には行列で記述されるが、床面上ないし地面上の実空間座標系での予め知れた種々の位置に高さが既知の基準体を配置してカメラで撮影すれば、その基準体画像の撮影画面上での位置と高さを、実空間上の位置及び実寸と比較することにより、カメラ画面上の人物の位置と高さを実空間上の位置と高さに変換する情報である位置・高さ変換関係情報を得ることができる。これを用いることにより動態生成部は、背景画像上に合成するべきアバター画像の高さを容易に決定でき、画像合成部は、背景画像へのアバター画像の合成位置を合理的かつ容易に決定することができる。
 次に、本発明の情報伝送システムは、特徴点抽出部が被写体の動きまたは向きを解析して動作解析情報として出力し、情報送信部は人物属性情報をネットワークへ送出し、画像合成部が動作解析情報に基づいて前記アバター画像の動きまたは向きを調整するものとして構成できる。これによれば、撮影された時点の被写体の動きや向きに基づいてアバター画像の動きや向きが調整されるので、例えば被写体の動く速度や動く方向を、アバター画像に反映させることができる。
 この場合、カメラは動画撮影可能なものであり、座標情報付加部は撮影された動画のフレーム別に被写体をなす人物の座標情報を取得するものであり、特徴点抽出部は、人物の座標情報のフレーム間の移動軌跡情報を動作解析情報として出力するものとして構成しておくとよい。現在のフレームに対し、これに先行するフレームの移動軌跡情報を解析すれば、現在のフレームに至る人物画像の動きを把握することが特に容易となる。
 例えば、人物は、横歩きや後ずさりなどのイレギュラーな動きをしない限り、顔や胴体が前を向くように歩行動作するのが通常なので、人物画像の代表点(例えば接地点)の移動軌跡が判明していれば、歩行動作に応じた体の向きを逐次把握することができる。そこで、画像合成部は上記の移動軌跡情報に基づいて、背景画像上に合成するアバター画像の向きを調整するように構成できる。
 この場合、動態生成部は、カメラからの視点による当該人物の見え方が反映されるように、実空間における人物の移動方向に応じて異なる表現形態のアバター画像を生成ように構成できる。カメラ視点に対し人物の歩行方向が変化する場合、その歩行方向によるカメラへの映り方(角度)に応じてアバター画像を変化させることで、アバター画像の表現のリアリティーを増すことができる。
 例えば、動態生成部は、実空間における人物の予め定められた複数の移動方向別に表現形態が互いに異なる複数の二次元アバター画像データを記憶する方向別二次元アバター画像データ記憶手段を備え、先行するフレームについて取得されている移動軌跡情報に基づいて人物の移動方向を推定するとともに、方向別の二次元アバター画像データから、推定された移動方向に適合するものを選択するものであり、画像合成部は、選択された二次元アバター画像データに基づくアバター画像を背景画像と合成するものとして構成できる。アバター化する人物の移動方向を、上記のように決められた複数の方向から選択するようにしておき、かつアバター画像データを二次元描画データとして構成しておくことで、用意するアバター画像データの容量を大幅に削減することができる。
 一方、動態生成部は、アバター画像のデータを三次元アバター画像データとして記憶する三次元アバター画像データ記憶手段を備え、該三次元アバター画像データに基づいて三次元アバターオブジェクトを生成するとともに、先行するフレームについて取得されている移動軌跡情報に基づいて人物の移動方向を推定するとともに、推定された移動方向を向くように該三次元アバターオブジェクトの実空間上への配置方向を決定するものであり、画像合成部は、配置方向が決定された実空間上の三次元アバターオブジェクトを背景画像の二次元座標系に射影変換することにより二次元アバター画像データを生成し、該二次元アバター画像データに基づくアバター画像を背景画像と合成するように構成することもできる。この場合はアバター画像データが三次元化されることでデータ容量は増すが、アバター画像の背景画像への貼り込み方向は無段階化でき、一層リアリティーのある表現が可能となる。
 また、画像合成部は、移動軌跡情報に基づいて人物の動線を表す画像を生成することも可能である。この構成によれば、特定の被写体が背景画像上でどのように動いたかを視覚的に把握することが容易となる。例えば、防犯目的等に有効活用することができほか、展示会場や公共施設等において個々の人物が関心を集める場所がどこにあるかを、動線画像の統計傾向分析により明確にできるなど、種々の利点を享受できる。
 次に、本発明の情報伝送システムは、特徴点抽出部が被写体の人物属性を解析して人物属性情報として出力し、情報送信部が人物属性情報をネットワークへ送出するものとして構成できる。この構成によれば、受信側で、この人物属性情報を用いて様々な分析・統計処理等を行うことができる。
 この場合、動態生成部はアバター画像を、人物属性情報を反映したものとして生成ように構成できる。これにより、対応する人物の属性をアバター画像に変換された後も容易に把握することができる。これは、例えば防犯目的の使用を考える場合、肖像権などのプライバシー侵害を回避しつつ被疑者の特定に貢献することにもつながるし、防犯を目的としないビューイング等においても、撮影エリアを行きかう人々の属性をアバター画像によりより単純化ないし強調することができ、画像上での傾向把握を容易にできる利点がある。人物属性情報は、具体的には、人物の性別を反映した性別情報と人物の年齢を反映した年齢情報とを含むものとして構成できるが、これらに限定されるものではなく、例えば顔の風貌などから明確に把握可能なものに限られるものの、国籍(例えば、日本人か、欧米人か)なども属性の一つとしてとらえることができる。
 また、特徴点抽出部は、被写体の人物の外観を解析して外観特徴情報として出力するものとすることもでき、情報送信部は、外観特徴情報をネットワークへ送出するように構成できる。被写体の外観は、人物属性に次いで個々の人物の特定につながる重要な情報であり、分析・統計処理においては有益である。そして、動態生成部はアバター画像を、外観特徴情報を反映したものとして生成するように構成することで、アバター画像に変換された後の、対応する人物の特徴把握を一層踏み込んで行うことができる。
 人物の外観の特徴を最も反映する要素として髪、着衣、持ち物などを例示できる。この場合、外観特徴情報は人物の頭髪の形態及び色彩の一方又は双方を反映した頭髪情報と、人物の着衣の形態及び色彩の一方又は双方を反映した着衣情報と、人物の持ち物の形態及び色彩の一方又は双方を反映した持ち物情報の1以上のものを含むものとして構成できる。これらは、性別や年齢層などの属性の把握補助に貢献し、例えば顔の特徴だけでは年齢等の把握が困難な場合に、これらの特徴情報を合わせて考慮することでより正確な属性把握が可能になる。例えば、着衣や持ち物は年齢層別の流行なども反映するから、10台後半と20台半ばなど、世代の接近した人物の属性を明確化する上で有用である。
 また、防犯等を目的とする場合は、人物の体形(肥満、小太り、やせ形、中肉中背、足の長短など)も有用な情報である。この場合、外観特徴情報は人物の体形を反映した体形情報を含むものとして構成できる。
 さらに、近年は、歩容(歩き方の特徴)も人物を特定する情報として有用である。この場合、外観特徴情報は人物の歩容を反映した歩容情報を含むものとして構成できる。歩容を特定する情報は、例えば歩幅(あるいは、歩行速度と連動した動きの周波数)、腕の振り角、歩行速度、歩行時の上半身角度や上下方向の揺れなどであり、その1種又は2種以上を組み合わせて使用できる。
 動態生成部は、人物歩行動作を細分化したコマデータからなるアバターアニメーションデータを使用するものとして構成でき、背景画像上でアバター画像を歩行動作するアニメーションとしてリアルに表現できる。この場合、動態生成部にて、コマデータの各コマを歩容情報に基づいて補正する画像補正処理を行い、画像合成部はアバター画像を、補正後のコマデータに基づき歩容特徴を反映させたアニメーション形態で背景画像に合成するものとして構成できる。アバターアニメーションデータの各コマの補正処理により、対応する人物の歩容情報反映したアバター画像の動きを容易に実現できる。
 また、撮影監視したいエリアが広い場合、1台のカメラでは視野が届かない、あるいは遠方で画像が小さくなり特徴把握できない、という場合が生じる。この場合は、撮影範囲は実空間座標を共有する形で複数のカメラによりカバーすることができる。各カメラは共通のエリアに対し異なるカメラ座標系で撮影を行うが、共同監視するエリアの実空間を同一座標系にて張っておくと、のちに各カメラの撮影情報を統合したい場合に、前述した手法により人物の座標を、その共通の実空間(例えば、GPSなどで取得できるグローバル座標系など)上に変換する処理を行うだけで直ちに統合処理も完了する利点がある。このとき、異なるカメラの視野感を同一人物が移動する場合、その画像の人物の同一性の判定をカメラ間で受け渡す必要が生じるが、この場合、上記の属性情報や外観特徴情報の一致度に応じて人物の同一性を判定するように構成すれば、特定の人物の追跡や、同一人物に同一アバター画像を使用する、といった判断にも容易に利用できる。
 また、画像合成部は、複数のカメラの撮影範囲を含む俯瞰画像として合成画像を生成するように構成できる。これによれば、複数のカメラの撮影範囲の全体を一目で把握することができる。
 特に、複数カメラでないとカバーできないエリアについて、上記のような俯瞰画像を得るためには、座標情報付加部を次のように構成するとよい。すなわち、複数のカメラの撮影画面上に現れる人物の接地点として識別される位置を撮影接地位置とし、当該撮影設置位置に現れる人物画像領域の撮影画面上の高さを人物撮影高さとして、座標情報付加部は、被写体が人物である場合の実空間における歩行面を高さ方向の基準として、カメラの撮影画面上に設定されるカメラ二次元座標系における平面座標と、実空間三次元座標系における歩行面上の空間座標点との変換関係と、カメラ二次元座標系における撮影接地位置毎の人物の撮影高さと実空間座標系での当該人物の実高さとの変換関係とを含む位置・高さ変換関係情報を取得する位置・高さ変換関係情報取得手段と、撮影画面上にて人物画像の撮影接地位置及び撮影高さを特定する撮影接地位置・高さ特定手段と、特定されたそれら撮影接地位置座標及び撮影高さの情報を、位置・高さ変換関係情報に基づいて実空間における人物の接地位置座標である実接地位置座標情報と実空間における人物の高さを情報である実人物高さ情報とに変換・生成する実人物座標・高さ情報生成手段とを備えたものとして、座標情報付加部を構成する。動態生成部は、生成された実人物高さ情報に基づいてアバター画像の高さ寸法を決定するアバター高さ決定手段を備え、画像合成部は、実空間座標系における複数のカメラが撮影した人物の実接地位置座標情報を俯瞰画像の視点にて座標変換しつつ該俯瞰画像へのアバター画像の合成位置を決定するアバター合成位置決定手段を備えるものとして構成する。
 これは、位置・高さ変換関係情報をカメラ側に付随させることで、撮影画面上の人物領域の位置と高さ情報を実空間座標系に変換する前述の構成を応用したものである。人物の画像情報を一旦実空間上の位置・寸法情報に変換してしまえば、俯瞰視点の背景画像にアバター画像を合成したい場合も、その俯瞰背景画像と実空間との変換関係を予め用意しておくことで、俯瞰視点の背景画像上へもアバター画像の合成を容易に行うことができる。
 次に、本発明においては、特徴点抽出部を、被写体の画像を人体の部位に相当する複数のパーツに分割し、各パーツから特徴点を抽出する。この構成によれば、各部位の特徴点を有効的に検出することができる。この場合、動態生成部は、アバター画像のデータを複数の前記パーツに対応したアバター断片に分割して記憶するアバター画像データ記憶手段を備え、人物の対応するパーツについて抽出された特徴点の情報に基づいてアバター画像のアバター断片を補正した後、その補正後のアバター断片を統合してアバター画像を生成ように構成できる。このようにすると、アバター断片(すなわち、人物の部位)ごとに特徴点を反映した補正をきめ細かく行うことができ、かつ、アバター全体の画像データを特徴別に多数用意する必要がなくなるので、データ容量の削減を図ることができるようになる。
 次に、本発明の情報送信装置は、
 少なくとも1台のカメラで写した映像内の被写体から特徴点を抽出して特徴情報として出力する特徴点抽出部と、
 カメラの撮影範囲における被写体の座標情報を取得する座標情報付加部と、
 特徴情報および座標情報をネットワークへ送出する情報送信部とを備えた情報送信装置であって、
 特徴情報は、送信先で表示される被写体のアバター画像の構成要素に対応付けられており、
 座標情報は、送信先で、前記カメラの撮影範囲の背景を表す画像において、アバター画像を合成する位置を特定するために用いられることを特徴とする。
 また、本発明の情報受信装置は、
 少なくとも1台のカメラで写した映像内の被写体から抽出された特徴点を表す特徴情報と、カメラの撮影範囲における被写体の座標情報とを、ネットワークを介して受け取る情報受信部と、
 特徴情報に基づいて被写体のアバター画像を生成する動態生成部と、
 カメラの撮影範囲の背景を表す画像に、座標情報に基づいてアバター画像を合成することにより、合成画像を生成する画像合成部とを備えていることを特徴とする。
 本発明の情報送信側に適用されるコンピュータプログラムは、
 少なくとも1台のカメラで写した映像内の被写体から特徴点を抽出して特徴情報として出力する特徴点抽出処理と、
 カメラの撮影範囲における被写体の座標情報を取得する座標情報付加処理と、
 特徴情報および座標情報をネットワークへ送出する情報送信処理とをコンピュータに実行させるコンピュータプログラムであって、
 特徴情報は、送信先で表示される被写体のアバター画像の構成要素に対応付けられており、
 座標情報は、送信先で、前記カメラの撮影範囲の背景を表す画像において、アバター画像を合成する位置を特定するために用いられることを特徴とする。
 また本発明の情報受信側に適用されるコンピュータプログラムは、
 少なくとも1台のカメラで写した映像内の被写体から抽出された特徴点を表す特徴情報と、カメラの撮影範囲における被写体の座標情報とを、ネットワークを介して受け取る受信処理と、
 特徴情報に基づいて被写体のアバター画像を生成する動態生成処理と、
 カメラの撮影範囲の背景を表す画像に、座標情報に基づいて前記アバター画像を合成することにより、合成画像を生成する画像合成処理とをコンピュータに実行させるものである。
In order to achieve the above object, an information transmission system according to the present invention includes:
A feature point extraction unit that extracts feature points from a subject in an image captured by at least one camera and outputs the feature points as feature information;
A coordinate information adding unit for acquiring coordinate information of the subject in the shooting range of the camera;
An information transmission unit for transmitting the feature information and the coordinate information to the network;
An information receiver for receiving feature information and coordinate information from the network;
A dynamic generation unit that generates an avatar image of the subject based on the feature information;
And an image composition unit that generates a composite image by compositing an avatar image based on the coordinate information with an image representing the background of the shooting range of the camera.
According to this configuration, the feature information of the feature points extracted from the subject and the coordinate information of the subject within the shooting range of the camera are transmitted to the network. Then, an avatar image of the subject is generated based on the feature information, and the avatar image is combined, based on the coordinate information, with an image representing the background of the shooting range of the camera. As a result, a composite image is generated from the background image and the avatar image on the receiving side without sending the video signal of the camera, so the communication band of the network can be used more effectively than when the video signal of the camera itself is sent. In addition, since an avatar image is used instead of the actual video of the subject, there is the advantage that privacy is not infringed even when a large number of unspecified persons are photographed.
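As a rough, purely illustrative calculation of the bandwidth effect (the figures below are assumptions, not values from the specification), the data rate of per-person feature/coordinate records can be compared with that of a typical compressed video stream:

```python
# Illustrative arithmetic only; all numbers are assumptions.
persons          = 20          # people visible to one camera
bytes_per_record = 200         # ID + coordinates + feature/attribute fields
fps              = 10          # transmission rate of the records

feature_bps = persons * bytes_per_record * 8 * fps   # bits per second
video_bps   = 2_000_000                               # e.g. a 2 Mbps H.264 stream

print(f"feature/coordinate data: {feature_bps / 1000:.0f} kbps")      # 320 kbps
print(f"compressed video stream: {video_bps / 1_000_000:.1f} Mbps")   # 2.0 Mbps
```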
In the information transmission system of the present invention, the coordinate information adding unit sets the position identified as the ground contact point of the person who makes the subject appearing on the photographing screen as the photographing ground position, and displays the person image area appearing at the photographing installation position on the photographing screen. The coordinate information adding unit uses a camera two-dimensional coordinate system set on the camera shooting screen with the walking plane in the real space when the subject is a person as a reference in the height direction. The transformation relationship between the plane coordinate point in the real space and the spatial coordinate point on the walking plane in the real space 3D coordinate system, the shooting height of the person at each shooting ground position in the camera 2D coordinate system, and the Position / height conversion relationship information acquisition means for acquiring position / height conversion relationship information including conversion relationship with actual height, and shooting for specifying the shooting contact position and shooting height of a person image on the shooting screen Actual grounding position coordinate information which is the grounding position coordinates of the person in the real space based on the position / height conversion relation information on the ground position / height specifying means and the information of the identified shooting grounding position coordinates and shooting height And real person coordinate / height information generating means for converting and generating the height of the person in the real space into real person height information as information. The dynamic generation unit includes an avatar height determining unit that determines the height dimension of the avatar image based on the generated real person height information, and the image composition unit includes the background image based on the actual ground position coordinate information. An avatar composition position determining means for determining the composition position of the avatar image to the image can be provided.
When it is desired to specify a three-dimensional space in a three-dimensional space from a camera image, since the image is two-dimensional data, it is not possible in principle to specify a general three-dimensional spatial state with one camera image. However, the subject of the present invention is a person who moves around in the area on the plane to be photographed by the camera, and according to the above configuration, the spatial geometric movement characteristics are taken into consideration, so that a single camera image can be obtained. It is possible to easily specify the position and height of the person in the real space from the information of the person image area. That is, the spatial existence range of the person in the area to be photographed is a horizontal plane (similarly, x− direction) such as a floor surface or the ground where the height direction (assuming to be the z axis direction of the real space orthogonal coordinate system) is constant. It is almost limited to the y plane), and the z coordinate of the contact point (foot position) can always be regarded as constant (for example, 0). That is, the coordinates of the contact point of the person walking in the area can be substantially described in an xy two-dimensional system, and can be uniquely associated with the camera two-dimensional coordinate system. The camera two-dimensional coordinate system corresponds to a real-space three-dimensional coordinate system obtained by projective transformation, and an object separated from the camera is projected with a reduced size. This transformation is mathematically described as a matrix, but if a reference object with a known height is placed at various known positions in the real space coordinate system on the floor or on the ground, the image is taken with a camera. By comparing the position and height of the reference body image on the shooting screen with the position and actual size in the real space, the position and height of the person on the camera screen are changed to the position and height in the real space. Position / height conversion related information that is information to be converted can be obtained. By using this, the dynamic generation unit can easily determine the height of the avatar image to be synthesized on the background image, and the image synthesis unit reasonably and easily determines the synthesis position of the avatar image to the background image. be able to.
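The calibration described above can be sketched as follows (illustrative only, with assumed function names): the reference-body measurements give point correspondences from which a screen-to-walking-plane homography is fitted, and the locally calibrated ratio α = H/h converts an on-screen person height into a real height.

```python
import numpy as np

def fit_ground_homography(screen_pts, world_pts):
    """Least-squares 3x3 homography mapping screen points (xi, eta) of ground
    contact positions to walking-plane coordinates (x, y); needs >= 4 pairs."""
    rows = []
    for (u, v), (x, y) in zip(screen_pts, world_pts):
        rows.append([u, v, 1, 0, 0, 0, -x * u, -x * v, -x])
        rows.append([0, 0, 0, u, v, 1, -y * u, -y * v, -y])
    _, _, Vt = np.linalg.svd(np.asarray(rows, dtype=float))
    return Vt[-1].reshape(3, 3)

def screen_to_world(Hmat, p):
    """Convert a screen ground point p = (xi, eta) to walking-plane (x, y)."""
    v = Hmat @ np.array([p[0], p[1], 1.0])
    return v[:2] / v[2]

def nearest_alpha(calibration, p):
    """calibration: list of ((xi, eta), alpha) pairs measured with the reference
    body; returns the alpha of the calibrated ground position nearest to p."""
    return min(calibration,
               key=lambda c: (c[0][0] - p[0]) ** 2 + (c[0][1] - p[1]) ** 2)[1]

def real_height(calibration, p, h_pixels):
    """Approximate real height H = alpha * h for an on-screen person height h
    (in pixels) observed at ground point p."""
    return nearest_alpha(calibration, p) * h_pixels
```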
Next, in the information transmission system of the present invention, the feature point extraction unit analyzes the motion or orientation of the subject and outputs it as motion analysis information, the information transmission unit sends the person attribute information to the network, and the image composition unit operates. It can be configured to adjust the movement or orientation of the avatar image based on the analysis information. According to this, since the movement and direction of the avatar image are adjusted based on the movement and direction of the subject at the time of shooting, for example, the moving speed and moving direction of the subject can be reflected in the avatar image.
In this case, the camera can capture moving images, the coordinate information adding unit acquires coordinate information of a person who makes a subject for each frame of the captured moving image, and the feature point extracting unit stores the coordinate information of the person. It may be configured to output movement trajectory information between frames as motion analysis information. If the movement trajectory information of the previous frame is analyzed for the current frame, it becomes particularly easy to grasp the movement of the person image that reaches the current frame.
For example, a person usually walks with their face and torso facing forward unless they move irregularly, such as walking sideways or backwards, so the movement trajectory of the representative point of the person image (for example, the ground contact point) is If it is found out, the orientation of the body according to the walking motion can be grasped sequentially. Therefore, the image composition unit can be configured to adjust the orientation of the avatar image to be synthesized on the background image based on the movement trajectory information.
In this case, the dynamic generation unit can be configured to generate different avatar images according to the movement direction of the person in the real space so that the appearance of the person from the viewpoint of the camera is reflected. When the walking direction of the person changes with respect to the camera viewpoint, the reality of the avatar image expression can be increased by changing the avatar image in accordance with the way (angle) the camera is reflected in the walking direction.
For example, the dynamic generation unit includes a direction-specific two-dimensional avatar image data storage unit that stores a plurality of two-dimensional avatar image data having different representation forms according to a plurality of predetermined movement directions of a person in real space. Estimating the moving direction of the person based on the movement trajectory information acquired for the frame, and selecting one that matches the estimated moving direction from the two-dimensional avatar image data for each direction. Can be configured to synthesize an avatar image based on the selected two-dimensional avatar image data with a background image. By selecting the moving direction of the person to be avatar from a plurality of directions determined as described above and configuring the avatar image data as two-dimensional drawing data, the prepared avatar image data The capacity can be greatly reduced.
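A minimal sketch of this direction-dependent selection (the numbering and orientation of the eight directions are assumptions): the movement vector between the previous and current frame positions is quantized to the nearest of eight predefined directions, and the pre-rendered two-dimensional avatar image for that direction is chosen.

```python
import math

def estimate_direction_index(p_prev, p_curr, n_directions=8):
    """Pick the predefined direction (0..n_directions-1) closest to the movement
    vector between two consecutive frame positions.  Index 0 is assumed to be
    the +x direction, with indices increasing counter-clockwise."""
    dx, dy = p_curr[0] - p_prev[0], p_curr[1] - p_prev[1]
    angle = math.atan2(dy, dx) % (2 * math.pi)
    step = 2 * math.pi / n_directions
    return int((angle + step / 2) // step) % n_directions

def select_avatar_sprite(sprites_by_direction, p_prev, p_curr):
    """sprites_by_direction: list of 8 pre-rendered 2-D avatar images; returns
    the one matching the estimated movement direction."""
    return sprites_by_direction[estimate_direction_index(p_prev, p_curr)]
```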
On the other hand, the dynamic generation unit includes a three-dimensional avatar image data storage unit that stores the data of the avatar image as the three-dimensional avatar image data, and generates a three-dimensional avatar object based on the three-dimensional avatar image data. Estimating the movement direction of the person based on the movement trajectory information acquired for the frame, and determining the arrangement direction of the three-dimensional avatar object in the real space so as to face the estimated movement direction, The image composition unit generates a two-dimensional avatar image data by projecting and transforming a three-dimensional avatar object in the real space whose arrangement direction is determined to a two-dimensional coordinate system of a background image, and based on the two-dimensional avatar image data The avatar image can be configured to be combined with the background image. In this case, the data capacity is increased by making the avatar image data three-dimensional. However, the direction in which the avatar image is pasted on the background image can be made stepless, and more realistic expression can be realized.
The image composition unit can also generate an image representing a person's flow line based on the movement trajectory information. According to this configuration, it is easy to visually grasp how a specific subject has moved on the background image. For example, it can be used effectively for crime prevention purposes, and it can be clarified by statistical trend analysis of flow line images where there are places where individual people are interested in exhibition halls and public facilities. Benefit from the benefits.
Next, the information transmission system of the present invention can be configured such that the feature point extraction unit analyzes the person attribute of the subject and outputs it as person attribute information, and the information transmission unit sends the person attribute information to the network. According to this configuration, various analysis / statistical processes and the like can be performed using the person attribute information on the receiving side.
In this case, the dynamic generation unit can be configured to generate the avatar image as reflecting the person attribute information. Thereby, the attribute of a corresponding person can be easily grasped even after being converted into an avatar image. For example, when considering use for crime prevention purposes, this also contributes to the identification of suspects while avoiding privacy infringements such as portrait rights. The attributes of the people can be simplified or emphasized by the avatar image, and there is an advantage that the tendency on the image can be easily grasped. Specifically, the person attribute information can be configured to include gender information that reflects the gender of the person and age information that reflects the age of the person, but is not limited thereto, for example, the appearance of the face, etc. However, nationality (for example, Japanese or Westerners) can be considered as one of the attributes.
In addition, the feature point extraction unit can analyze the appearance of the subject person and output it as appearance feature information, and the information transmission unit can be configured to send the appearance feature information to the network. The appearance of the subject is important information that leads to the identification of individual persons following the person attributes, and is useful in analysis and statistical processing. And the dynamics generation unit can be configured to generate the avatar image as a reflection of the appearance feature information, so that the features of the corresponding person can be further understood after being converted into the avatar image. .
Examples of elements that most reflect the characteristics of the appearance of a person include hair, clothing, and belongings. In this case, the appearance characteristic information includes hair information that reflects one or both of the form and color of the person's hair, clothing information that reflects one or both of the form and color of the person's clothing, and the form and color of the person's belongings. Can be configured to include one or more items of inventory information reflecting one or both of the items. These contribute to assisting in grasping attributes such as gender and age group.For example, when it is difficult to grasp the age etc. only with facial features, more accurate attribute grasping is possible by considering these feature information together. It becomes possible. For example, since clothes and belongings reflect trends by age group, it is useful for clarifying the attributes of persons with similar generations, such as the latter half of 10 and the middle of 20.
In addition, for the purpose of crime prevention or the like, the body shape of a person (obesity, fatness, skinny shape, middle back of a meat, length of legs, etc.) is also useful information. In this case, the appearance feature information can be configured to include body shape information that reflects the body shape of a person.
Furthermore, in recent years, gaits (features of walking) are also useful as information for specifying a person. In this case, the appearance feature information can be configured to include gait information reflecting a person's gait. The information for specifying the gait is, for example, the stride (or the frequency of movement linked to the walking speed), the swing angle of the arm, the walking speed, the upper body angle at the time of walking, the vertical shaking, etc. Can be used in combination with more than one species.
The dynamic generation unit can be configured to use avatar animation data composed of frame data obtained by subdividing a person walking action, and can realistically represent an avatar image as an animation of a walking action on a background image. In this case, the dynamic generation unit performs image correction processing for correcting each frame of the frame data based on the gait information, and the image composition unit reflects the gait feature on the avatar image based on the corrected frame data. It can be configured to be combined with the background image in the form of animation. The movement of the avatar image reflecting the gait information of the corresponding person can be easily realized by the correction processing of each frame of the avatar animation data.
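The frame ("koma") handling can be sketched as follows, under the assumption that one walking cycle of the animation corresponds to two steps and that each frame exposes simple stride and arm-swing offsets; the data layout is illustrative, not taken from the specification.

```python
def animation_frame_index(t_seconds, step_period, frames_per_cycle=8):
    """Choose which walking-animation frame to show at time t so that one
    two-step cycle of the animation lasts 2 * step_period (tau) seconds."""
    cycle = 2.0 * step_period
    phase = (t_seconds % cycle) / cycle          # 0..1 through the cycle
    return int(phase * frames_per_cycle) % frames_per_cycle

def corrected_frame(base_frames, idx, stride_scale=1.0, arm_swing_scale=1.0):
    """Apply per-frame gait corrections.  base_frames[idx] is assumed to be a
    dict with numeric 'leg_offset' and 'arm_offset' defaults; the scales come
    from the transmitted gait information (stride WL, arm swing lambda)."""
    frame = dict(base_frames[idx])
    frame['leg_offset'] = frame['leg_offset'] * stride_scale
    frame['arm_offset'] = frame['arm_offset'] * arm_swing_scale
    return frame
```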
In addition, when the area to be photographed and monitored is large, there are cases where the field of view cannot be reached with one camera, or the image becomes small at a distant place and the feature cannot be grasped. In this case, the imaging range can be covered by a plurality of cameras in the form of sharing real space coordinates. Each camera shoots a common area with different camera coordinate systems, but if the real space of the area to be jointly monitored is stretched in the same coordinate system, if you want to integrate the shooting information of each camera later, There is an advantage that the integration process can be completed immediately only by performing the process of converting the coordinates of the person into the common real space (for example, a global coordinate system that can be acquired by GPS or the like). At this time, if the same person moves in the sense of field of view of different cameras, it is necessary to pass the determination of the identity of the person of the image between the cameras. In this case, the degree of coincidence of the attribute information and appearance feature information described above If the configuration is such that the identity of a person is determined according to the above, it can be easily used for the tracking of a specific person and the determination of using the same avatar image for the same person.
Further, the image composition unit can be configured to generate a composite image as a bird's-eye view image including shooting ranges of a plurality of cameras. According to this, the whole imaging range of a plurality of cameras can be grasped at a glance.
In particular, in order to obtain an overhead image as described above for an area that can only be covered by a plurality of cameras, the coordinate information adding unit may be configured as follows. That is, the position identified as the ground contact point of the person appearing on the shooting screens of the plurality of cameras is set as the shooting ground position, and the height of the person image area appearing at the shooting setting position on the shooting screen is set as the person shooting height. The information adding unit includes a plane coordinate in a camera two-dimensional coordinate system set on a camera shooting screen and a real space three-dimensional coordinate system with a walking plane in the real space when the subject is a person as a reference in the height direction. Including the conversion relationship between the spatial coordinate points on the walking plane in the camera and the conversion relationship between the shooting height of the person at each shooting ground position in the camera two-dimensional coordinate system and the actual height of the person in the real space coordinate system. A position / height conversion relationship information acquisition means for acquiring height conversion relation information, a shooting contact position / height specifying means for specifying a shooting contact position and a shooting height of a person image on the shooting screen, and Them Based on the position / height conversion relation information, the information on the shadow contact position coordinates and the shooting height is the actual contact position coordinate information that is the contact position coordinates of the person in the real space and the actual height of the person in the real space. The coordinate information adding unit is configured as having real person coordinate / height information generating means for converting / generating person height information. The dynamic generation unit includes avatar height determining means for determining the height dimension of the avatar image based on the generated real person height information, and the image composition unit is a person photographed by a plurality of cameras in the real space coordinate system. The actual ground position coordinate information is converted from the viewpoint of the bird's-eye view image, and the avatar composition position determining means for determining the composition position of the avatar image to the bird's-eye view image is provided.
This is an application of the above-described configuration for converting the position and height information of the person area on the photographing screen into the real space coordinate system by attaching the position / height conversion relation information to the camera side. Once the person's image information is converted into position / dimension information in the real space, even if you want to combine an avatar image with the background image of the overhead view viewpoint, prepare the conversion relationship between the overhead view background image and the real space in advance. By doing so, it is possible to easily synthesize the avatar image on the background image of the overhead view viewpoint.
Next, in the present invention, the feature point extraction unit divides the image of the subject into a plurality of parts corresponding to parts of the human body, and extracts feature points from each part. According to this structure, the feature point of each part can be detected effectively. In this case, the dynamic generation unit includes an avatar image data storage unit that divides and stores the data of the avatar image into a plurality of avatar fragments corresponding to the parts, and uses the feature point information extracted for the corresponding parts of the person. After correcting the avatar fragment of the avatar image based on the basis, the corrected avatar fragment can be integrated to generate an avatar image. In this way, it is possible to make fine corrections that reflect feature points for each avatar fragment (that is, a person's part), and it is not necessary to prepare a large number of image data of the entire avatar for each feature. Reduction can be achieved.
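A small sketch of this fragment-based assembly (the fragment data layout is an assumption): each body-part fragment of a standard avatar is corrected using the scale and colour extracted for the corresponding part of the person, and the corrected fragments are then integrated in drawing order.

```python
PART_NAMES = ('head', 'torso', 'right_arm', 'left_arm', 'right_leg', 'left_leg')

def build_avatar(base_fragments, part_features):
    """base_fragments: dict part -> {'points': [(x, y), ...], 'color': (r, g, b)}
    describing a standard-proportioned fragment.  part_features: dict part ->
    optional {'scale': float, 'color': (r, g, b)} extracted for that body part.
    Each fragment is corrected individually, then the corrected fragments are
    integrated (returned in drawing order) to form the avatar image."""
    avatar = []
    for part in PART_NAMES:
        frag = dict(base_fragments[part])
        feat = part_features.get(part, {})
        scale = feat.get('scale', 1.0)
        frag['points'] = [(x * scale, y * scale) for x, y in frag['points']]
        frag['color'] = feat.get('color', frag['color'])
        avatar.append((part, frag))
    return avatar
```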
Next, the information transmitting apparatus of the present invention is
A feature point extraction unit that extracts feature points from a subject in an image captured by at least one camera and outputs the feature points as feature information;
A coordinate information adding unit for acquiring coordinate information of the subject in the shooting range of the camera;
An information transmission device comprising an information transmission unit for transmitting feature information and coordinate information to a network,
The feature information is associated with the constituent elements of the avatar image of the subject displayed at the transmission destination,
The coordinate information is used to specify a position where the avatar image is to be combined in an image representing the background of the shooting range of the camera at the transmission destination.
The information receiving apparatus of the present invention is
An information receiving unit that receives, via a network, feature information representing feature points extracted from a subject in an image captured by at least one camera, and coordinate information of the subject in a shooting range of the camera;
A dynamic generation unit that generates an avatar image of the subject based on the feature information;
An image composition unit that generates a composite image by combining, based on the coordinate information, an avatar image with an image representing the background of the shooting range of the camera is provided.
The computer program applied to the information transmission side of the present invention is:
A feature point extraction process for extracting feature points from a subject in an image captured by at least one camera and outputting them as feature information;
Coordinate information addition processing for acquiring coordinate information of the subject in the shooting range of the camera;
A computer program for causing a computer to execute information transmission processing for transmitting feature information and coordinate information to a network,
The feature information is associated with the constituent elements of the avatar image of the subject displayed at the transmission destination,
The coordinate information is used to specify a position where the avatar image is to be combined in an image representing the background of the shooting range of the camera at the transmission destination.
The computer program applied to the information receiving side of the present invention is:
A reception process for receiving, via a network, feature information representing feature points extracted from a subject in an image captured by at least one camera, and coordinate information of the subject in a shooting range of the camera;
Dynamic generation processing for generating an avatar image of a subject based on feature information;
A computer is caused to perform an image composition process for generating a composite image by compositing the avatar image based on coordinate information with an image representing the background of the shooting range of the camera.
 本発明によれば、カメラからの映像を伝送する際に、ネットワークの通信帯域の有効利用を妨げない伝送方式を提供できる。 According to the present invention, it is possible to provide a transmission method that does not hinder the effective use of the communication band of the network when transmitting video from the camera.
 図1は、本発明の第1の実施形態にかかる情報伝送システムの概略構成を示すブロック図である。
 図2は、特徴点抽出部の処理手順を示すフローチャートである。
 図3は、特徴点抽出部が人体を各パーツに分けて特徴を抽出する様子を示す模式図である。
 図4は、特徴点抽出部が人物属性情報を抽出する処理の流れを示すフローチャートである。
 図5は、カメラの撮影範囲に設定された座標系の一例を示す模式図である。
 図6は、背景画像にアバター画像が合成された表示例を示す模式図である。
 図7は、本発明の応用例を示す模式図である。
 図8は、送信側のカメラの連続性が無い場合におけるアバターの表現例を示す模式図である。
 図9は、第2の実施形態にかかる情報伝送システムの概略構成を示すブロック図である。
 図10は、従来の伝送方式を示す模式図である。
 図11は、人物画像領域の差分抽出の概念を説明する図である。
 図12は、背景画像の概念図である。
 図13は、座標情報付加処理の説明図である。
 図14は、図13に続く説明図である
 図15は、レンズ歪補正の説明図である。
 図16は、座標情報付加処理の流れを示すフローチャートである。
 図17は、画面上の人物画像領域の抽出状態の一例を示す図である。
 図18は、人物画像領域の高さhを、変換係数αを用いて実身長Hに変換する説明図である。
 図19は、人物領域検出処理の流れを示すフローチャートである。
 図20は、歩容特徴情報の抽出概念を示す図である。
 図21は、移動軌跡情報の抽出概念を示す図である。
 図22は、受信部側の情報蓄積の概念を示す図である。
 図23は、アバター画像データベースの概念を示す図である。
 図24は、アバター画像の方向決定に用いる人物移動方向の概念を示す図である。
 図25は、アバター断片図形データの例を示す図である。
 図26は、アバター断片図形を合成してアバター画像を得る例を示す説明図である。
 図27は、アバター画像データをアバターアニメーションデータとして構成する例を示す図である。
 図28は、アバター断片画像データを二次元のベクトル図形データとして構成する例を示す図である。
 図29は、受信部側処理の流れを示すフローチャートである。
 図30は、新規アバター作成処理の流れを示すフローチャートである。
 図31は、アバター背景合成処理の流れを示すフローチャートである。
 図32は、統合モード表示処理の流れを示すフローチャートである。
 図33は、統合表示モードにおける平面視表示形態の一例を示す図である。
 図34は、同じく俯瞰視表示形態の一例を示す図である。
 図35は、三次元アバター画像を表示する例を示す画像である。
FIG. 1 is a block diagram showing a schematic configuration of an information transmission system according to the first embodiment of the present invention.
FIG. 2 is a flowchart showing the processing procedure of the feature point extraction unit.
FIG. 3 is a schematic diagram showing how the feature point extraction unit extracts features by dividing the human body into parts.
FIG. 4 is a flowchart showing a flow of processing in which the feature point extraction unit extracts person attribute information.
FIG. 5 is a schematic diagram illustrating an example of a coordinate system set in the shooting range of the camera.
FIG. 6 is a schematic diagram illustrating a display example in which an avatar image is combined with a background image.
FIG. 7 is a schematic diagram showing an application example of the present invention.
FIG. 8 is a schematic diagram illustrating an expression example of an avatar when there is no continuity of the transmitting camera.
FIG. 9 is a block diagram illustrating a schematic configuration of an information transmission system according to the second embodiment.
FIG. 10 is a schematic diagram showing a conventional transmission method.
FIG. 11 is a diagram for explaining the concept of extracting a difference in a person image area.
FIG. 12 is a conceptual diagram of a background image.
FIG. 13 is an explanatory diagram of the coordinate information addition process.
FIG. 14 is an explanatory diagram following FIG. 13. FIG. 15 is an explanatory diagram of lens distortion correction.
FIG. 16 is a flowchart showing the flow of the coordinate information addition process.
FIG. 17 is a diagram showing an example of a person image region extraction state on the screen.
FIG. 18 is an explanatory diagram for converting the height h of the person image area into the actual height H using the conversion coefficient α.
FIG. 19 is a flowchart showing the flow of the person area detection process.
FIG. 20 is a diagram showing a concept of extracting gait feature information.
FIG. 21 is a diagram illustrating a concept of extracting movement trajectory information.
FIG. 22 is a diagram showing the concept of information storage on the receiving side.
FIG. 23 is a diagram illustrating the concept of an avatar image database.
FIG. 24 is a diagram illustrating the concept of the person moving direction used for determining the direction of the avatar image.
FIG. 25 is a diagram illustrating an example of avatar fragment graphic data.
FIG. 26 is an explanatory diagram illustrating an example in which an avatar image is obtained by combining avatar fragment graphics.
FIG. 27 is a diagram illustrating an example in which avatar image data is configured as avatar animation data.
FIG. 28 is a diagram illustrating an example in which avatar fragment image data is configured as two-dimensional vector graphic data.
FIG. 29 is a flowchart showing a flow of processing on the reception unit side.
FIG. 30 is a flowchart showing a flow of new avatar creation processing.
FIG. 31 is a flowchart showing the flow of the avatar background composition process.
FIG. 32 is a flowchart showing the flow of the integrated mode display process.
FIG. 33 is a diagram illustrating an example of a planar display form in the integrated display mode.
FIG. 34 is a diagram showing an example of a bird's eye view display form.
FIG. 35 is an image showing an example of displaying a three-dimensional avatar image.
 以下、図面を参照しながら、本発明の具体的な実施形態について詳しく説明する。
 (実施の形態1)
 まず、図1を参照しながら、本発明の第1の実施形態にかかる情報伝送システム1の構成と動作の概略について説明する。図1は、情報伝送システム1の概略構成を示すブロック図である。情報伝送システム1は、情報伝送システム送信部12(情報送信装置)と、情報伝送システム受信部13(情報受信装置)とを有している。情報伝送システム送信部12と、情報伝送システム受信部13とは、ネットワーク15を介して接続されている。ネットワーク15は、例えばインターネットなどの公共ネットワークであるが、ローカルネットワークなどのプライベートネットワークであっても良い。
 情報伝送システム送信部12は、様々な場所に設置された複数のカメラ11(11a,11b・・・)から映像信号を受信し、送信前処理(後に詳述する。)を行ってからネットワーク15へ送出する。なお、図1においては、カメラ11を2台のみ図示しているが、カメラの台数は任意である。カメラ11と情報伝送システム送信部12との間の通信は、有線通信であっても良いし、無線通信であっても良い。
 また、情報伝送システム受信部13は、情報伝送システム送信部12からネットワーク15を介して送信された映像信号を受信し、受信後処理(後に詳述する。)を行ってから、モニタ14へ表示させたり、必要に応じて映像記録装置(図示せず)へ録画したりする。
 情報伝送システム送信部12は、座標情報付加部121、特徴点抽出部122、複数カメラ連携部123、および、情報送信部124を備えている。座標情報付加部121および特徴点抽出部122は、一つのカメラ11に対して一組ずつ設けられている。例えば、図1においては、カメラ11aに対して座標情報付加部121aおよび特徴点抽出部122aが設けられ、カメラ11bに対して座標情報付加部121bおよび特徴点抽出部122bが設けられている。
 特徴点抽出部122は、カメラ11で撮影された映像信号から人物領域を検出し、さらに、それぞれの人物の外観(例えば、着衣、髪型、体形、持ち物等)についての特徴を抽出する。座標情報付加部121は、カメラ11で撮影されるエリア内の人物の位置を、座標情報として検出する。
 ここで、情報伝送システム1は、カメラで撮影した映像信号をそのまま圧縮して伝送する従来の情報伝送システムとは異なり、特徴点抽出部122で得られた特徴情報と座標情報付加部121で得られた座標情報のみを、ネットワーク15を介して伝送する。そして、この特徴情報と座標情報とを受け取った情報伝送システム受信部13側では、それぞれのカメラ11の撮影範囲の背景画像を予め記録しておき、前記の特徴情報に基づいて個々の人物を的確に表すアバターの画像を生成し、前記の座標情報にしたがって、背景画像の適宜の位置にアバター画像を合成する。このようにすることで、撮影された映像信号をそのまま圧縮して伝送する場合と比較して、伝送されるデータ量が少なくて済むので、ネットワークの通信帯域を有効に利用することができる。
 なお、情報伝送システム1は、複数のカメラ11と接続されているので、前述のように、カメラ11のそれぞれについて座標情報付加部121および特徴点抽出部122を備えている。このため、複数カメラ連携部123は、座標情報付加部121で得られた座標情報と、特徴点抽出部122で得られた特徴情報とに、複数のカメラ11のうちいずれのカメラの映像信号から得られた情報であるかを示すタグ情報を付与して、情報送信部124へ送る。情報送信部124は、複数カメラ連携部123から得た情報を所定の規格で符号化し、ネットワーク15へ送出する。
 情報伝送システム受信部13は、情報受信部131、動態生成部132、および、画像合成部133を備えている。情報受信部131は、ネットワーク15から受信した情報を復号化し、動態生成部132へ送る。動態生成部132は、受信した情報に含まれる特徴情報に基づいて、撮影された人物を表すアバターの画像を生成する。動態生成部132で生成されたアバター画像は、座標情報と共に画像合成部133へ送られる。画像合成部133は、アバター画像と座標情報とに基づいて、それぞれのカメラ11の撮影範囲の背景画像とアバター画像との合成画像を生成し、モニタ14へ表示させる。このとき、どのカメラ11の映像信号から得られた情報であるかを示す前記のタグ情報は、背景画像を特定するために用いられる。
 次に、座標情報付加部121の処理について説明する。座標情報付加部121は、それぞれのカメラ11の撮影範囲に対して設定された座標系において、人物がいる位置の座標を特定する。例えば図5に示すように、一つのカメラ11の撮影範囲において、x−y座標系51を設定する。座標情報付加部121は、このx−y座標系51において、特徴点抽出部122が特定した人物領域の座標を検出する。ここで検出された座標は、当該人物のいる位置を表す座標情報として、特徴情報と共に情報伝送システム受信部13へ送られる。
 三次元空間内の立体をカメラ映像から特定したい場合、映像は二次元データであるから、一般の立体の空間的な状態を1台のカメラ映像で特定することは原理的にできない。しかし、本発明が対象とする被写体はカメラ11が撮影するエリア内を動き回る人物であり、その空間幾何学的な移動特性を考慮することで、図5に示す単独のカメラ11の画面上の人物画像領域PAの情報から実空間内の人物位置と高さを特定可能である。すなわち、撮影対象となるエリアの人物の空間的な存在範囲は、床面や地面、図5の場合は人物が歩行する路面RSなどであり、要するに高さ方向(z軸方向)の位置が一定の水平面にほぼ限られている点に着目する。この路面RSは、直交座標系にてz軸座標が常に0のx−y平面であり、該路面RS上を歩行する人物の接地点の座標は実質的にx−yの二次元で記述でき、三次元空間内の点でありながら、撮影画面に設定されるカメラ二次元座標系と一義的な対応付けが可能となる。
 他方、カメラ二次元座標系は実空間三次元座標系が射影変換されたものに相当し、カメラ光軸方向に隔たった被写体ほど寸法が縮小されて投影される。これは数学的には射影変換行列で記述されるが、実空間座標系での予め知れた種々の位置に基準体を配置してカメラで撮影すれば、その基準体画像の撮影画面上での位置と高さを実空間上の基準体の位置及び実寸と比較することにより、カメラ上の人物の映像位置・高さを実空間上の位置・高さに変換する位置・高さ変換関係情報を得ることができる。その具体例を図13~図15の説明図及び図16のフローチャートを用いて説明する。
 すなわち、カメラの撮影視野SAにおいて、路面RS上に高さが既知の基準体SCを前後左右の種々の位置に配置し撮影を行う。図16のS501では、その基準体の高さHを入力する。すると撮影画面SA上では、これは同一の基準体に由来したものであるにもかかわらず、カメラ11からの距離に応じて異なる寸法の基準体画像SCIとなって現れるので、これを抽出する(S502)。これらの基準体画像SCIは全て同じ路面RS(すなわち、x−y平面(z=0))上にあるので、その下端を表す点(基準点)p1~p3は実空間においてすべてz=0の接地点である。そこで、この基準点p1~p3を画面上に設定されたカメラ二次元座標系であるξ−η座標系にて読み取り、基準点の画面座標データp(ξ,η)として記憶する(S503)。なお、画面上のどのエリアが路面RSを表すかについては、路側縁REや路面上の白線WLなどの画像を参考にすることができる。
 次に、撮影画面上の映像はカメラレンズの歪の影響を受けるので、実空間の厳密な射影変換画像とはなっておらず、視野内の位置に応じて画像にゆがみが生じていることがある。図15の左に示すように、そのひずみは画面の端に近い領域ほど大きく座標系も非線形化する。例えば、広角レンズなど視野角の大きいレンズでは外向きに凸状の歪となり、逆に望遠レンズなど視野角の小さいレンズでは凹状の歪となる。そこで、この歪を解消し、直交平面座標系の点となるように変換補正を行う(S504)。この時の補正係数は、例えば図13において、画面上に現れている白線WLなど、実空間上で直線であることが予めわかっている図形の形状が直線化するような最適化演算によって定めることができる。なお、この補正により画面の端ほど歪み解消に伴い寸法は伸長するから、補正後の画面形状SA’は元の画面SAの外にはみ出すこととなる。
 次いで、図14に示すように、基準体SCの実空間系での座標を決定する。例えば、測量による場合は、カメラ11から路面に設置した基準体SCまでの距離dと、カメラから基準体を見込む線と基準線(例えば、x軸方向)とのなす角度θを測定すれば、
 x=d・cosθ
 y=d・sinθ
として基準体SCの接地点の実空間座標P(x,y,0)を求めることができる。他方、衛星測位システム(GPS)により座標を直接特定してもよい。なお、ここで用いられる実空間座標系は、それぞれのカメラの撮影範囲内に設定される独立した座標系であっても良いし、衛星測位システム(GPS)から提供されるグローバル座標系と連動していても良い。ただし、後述するが、複数のカメラの撮影範囲を統合して一つの空間を生成する場合は、それぞれのカメラの座標系を連結する必要があり、複数カメラが連携撮影するエリアに対し、統合的な実空間座標を張っておくことが望ましい。また、x−y座標系51を設定する際に、例えばLEDライト等を用いて、キャリブレーションを行うことが望ましい。
 次に、基準体画像SCIの画面上の高さhを読み取る(S506)。基準体SCの実高さHは既知なので、基準体画像SCIの高さhを実高さHに変換する係数
 α=H/h
を計算し(S507)、画面座標データp(ξ,η)と実空間座標データP(x,y,0)と互いに対応付て記憶する(S508)。以上の処理をすべての基準体SCについて繰り返したのち(S509→S501)、路面RS上にて実測していない主要点でのp,P,αの組を補う処理を行う。この処理は補間データを取得するステップとして行ってもよいし、得られているp,Pの組から射影変換行列の要素を定める処理として行ってもよい。そして、これらの情報が位置・高さ変換関係情報を構成することとなる。
Hereinafter, specific embodiments of the present invention will be described in detail with reference to the drawings.
(Embodiment 1)
First, an outline of the configuration and operation of the information transmission system 1 according to the first embodiment of the present invention will be described with reference to FIG. FIG. 1 is a block diagram showing a schematic configuration of the information transmission system 1. The information transmission system 1 includes an information transmission system transmission unit 12 (information transmission device) and an information transmission system reception unit 13 (information reception device). The information transmission system transmission unit 12 and the information transmission system reception unit 13 are connected via a network 15. The network 15 is a public network such as the Internet, but may be a private network such as a local network.
The information transmission system transmission unit 12 receives video signals from a plurality of cameras 11 (11a, 11b, ...) installed in various places, performs pre-transmission processing (described in detail later), and then sends the result to the network 15. In FIG. 1, only two cameras 11 are shown, but the number of cameras is arbitrary. Communication between the cameras 11 and the information transmission system transmission unit 12 may be wired or wireless.
The information transmission system reception unit 13 receives the data transmitted from the information transmission system transmission unit 12 via the network 15, performs post-reception processing (described in detail later), and then displays the result on the monitor 14 or, as necessary, records it to a video recording device (not shown).
The information transmission system transmission unit 12 includes a coordinate information addition unit 121, a feature point extraction unit 122, a multiple camera linkage unit 123, and an information transmission unit 124. One set of coordinate information adding unit 121 and feature point extracting unit 122 is provided for each camera 11. For example, in FIG. 1, a coordinate information adding unit 121a and a feature point extracting unit 122a are provided for the camera 11a, and a coordinate information adding unit 121b and a feature point extracting unit 122b are provided for the camera 11b.
The feature point extraction unit 122 detects a person area from the video signal photographed by the camera 11, and further extracts features regarding the appearance (for example, clothing, hairstyle, body shape, belongings, etc.) of each person. The coordinate information adding unit 121 detects the position of a person in an area photographed by the camera 11 as coordinate information.
Here, the information transmission system 1 differs from a conventional system in which the video signal captured by the camera is compressed and transmitted as it is: only the feature information obtained by the feature point extraction unit 122 and the coordinate information obtained by the coordinate information addition unit 121 are transmitted via the network 15. The information transmission system receiving unit 13, which receives the feature information and the coordinate information, holds background images of the shooting range of each camera 11 recorded in advance, generates an avatar image representing each person based on the feature information, and synthesizes the avatar image at the appropriate position of the background image according to the coordinate information. By doing so, the amount of data to be transmitted can be reduced compared to the case where the captured video signal is compressed and transmitted as it is, so that the network communication band can be used effectively.
Since the information transmission system 1 is connected to the plurality of cameras 11, each of the cameras 11 is provided with a coordinate information addition unit 121 and a feature point extraction unit 122, as described above. The multi-camera cooperation unit 123 therefore attaches, to the coordinate information obtained by the coordinate information addition unit 121 and the feature information obtained by the feature point extraction unit 122, tag information indicating from which camera's video signal the information was obtained, and sends the result to the information transmission unit 124. The information transmission unit 124 encodes the information received from the multi-camera cooperation unit 123 according to a predetermined standard and transmits the encoded information to the network 15.
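The following is a hypothetical sketch, not the patent's actual wire format, of how a per-person record (camera tag, person tag, feature data, and real-space coordinates) might be assembled and encoded before being handed to the network; the record fields, host address, and length-prefixed framing are all assumptions made for illustration.

```python
import json
import socket

def build_record(camera_id, person_tag, features, coords):
    # One record per detected person, tagged with the camera that produced it.
    return {
        "camera_id": camera_id,    # which camera 11a, 11b, ... the data came from
        "person_tag": person_tag,  # tag distinguishing person areas within one frame
        "features": features,      # per-part feature data (hairstyle, clothing, ...)
        "coords": coords,          # (x, y) ground-contact position in real space
    }

def send_records(records, host="203.0.113.10", port=5000):
    """Encode the records and push them to the network (plain TCP, for illustration only)."""
    payload = json.dumps(records).encode("utf-8")
    with socket.create_connection((host, port)) as sock:
        sock.sendall(len(payload).to_bytes(4, "big"))  # simple length prefix
        sock.sendall(payload)

# Example: two people seen by camera "11a" in one frame.
records = [
    build_record("11a", 1, {"hair": "long", "upper": {"type": "t-shirt", "color": "red"}}, (3.2, 7.5)),
    build_record("11a", 2, {"hair": "short", "upper": {"type": "coat", "color": "navy"}}, (4.0, 6.1)),
]
# send_records(records)  # uncomment when a receiver is listening
```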
The information transmission system reception unit 13 includes an information reception unit 131, a dynamic generation unit 132, and an image composition unit 133. The information receiving unit 131 decodes the information received from the network 15 and sends it to the dynamic generation unit 132. The dynamic generation unit 132 generates an avatar image representing a photographed person based on the feature information included in the received information. The avatar image generated by the dynamic generation unit 132 is sent to the image composition unit 133 together with the coordinate information. Based on the avatar image and the coordinate information, the image composition unit 133 generates a composite image of the avatar image and the background image of the shooting range of each camera 11, and displays the composite image on the monitor 14. At this time, the tag information, which indicates from which camera 11's video signal the information was obtained, is used to select the appropriate background image.
Next, processing of the coordinate information adding unit 121 will be described. The coordinate information adding unit 121 specifies the coordinates of the position of the person in the coordinate system set for the shooting range of each camera 11. For example, as shown in FIG. 5, an xy coordinate system 51 is set in the shooting range of one camera 11. The coordinate information adding unit 121 detects the coordinates of the person area specified by the feature point extracting unit 122 in the xy coordinate system 51. The coordinates detected here are sent to the information transmission system receiving unit 13 together with the feature information as coordinate information representing the position of the person.
When one wants to specify a position in three-dimensional space from a camera image, this is in principle impossible in the general case with a single camera image, because the image is two-dimensional data. However, the subject targeted by the present invention is a person moving around in the area photographed by the camera 11, and if the geometric characteristics of that movement are taken into account, the position and height of the person in real space can be specified from the image area PA of the person on the screen of the single camera 11 shown in FIG. 5. That is, the spatial range in which a person can exist in the photographed area is essentially confined to the floor surface or the ground (in the case of FIG. 5, the road surface RS on which the person walks), i.e. to a horizontal plane whose position in the height direction (z-axis direction) is fixed. This road surface RS is an xy plane whose z coordinate is always 0 in an orthogonal coordinate system, so the coordinates of the ground contact point of a person walking on the road surface RS can be described essentially in the two dimensions x and y; although it is a point in three-dimensional space, it can be uniquely associated with the camera two-dimensional coordinate system set on the photographing screen.
On the other hand, the camera two-dimensional coordinate system corresponds to a projective transformation of the real-space three-dimensional coordinate system, and a subject farther away in the camera optical-axis direction is projected at a reduced size. Mathematically this is described by a projective transformation matrix; in practice, if a reference body is placed at various positions known in advance in the real-space coordinate system and photographed with the camera, then by comparing the position and height of the reference body image on the photographing screen with the position and actual size of the reference body in real space, position/height conversion-related information can be obtained that converts the image position and height of a person seen by the camera into a position and height in real space. A specific example is described with reference to the explanatory diagrams of FIGS. 13 to 15 and the flowchart of FIG. 16.
That is, in the imaging field of view SA of the camera, reference bodies SC of known height are arranged on the road surface RS at various positions front, rear, left, and right. In S501 of FIG. 16, the height H of the reference body is input. On the photographing screen SA, the same reference body appears as reference body images SCI of different sizes depending on the distance from the camera 11, and these are extracted (S502). Since these reference body images SCI are all on the same road surface RS (that is, the xy plane with z = 0), the points (reference points) p1 to p3 representing their lower ends are all ground contact points with z = 0 in real space. Therefore, the reference points p1 to p3 are read in the ξ-η coordinate system, which is the camera two-dimensional coordinate system set on the screen, and stored as screen coordinate data p(ξ, η) of the reference points (S503). As for which area on the screen represents the road surface RS, images such as the road side edge RE and the white line WL on the road surface can be referred to.
Next, since the image on the photographing screen is affected by distortion of the camera lens, it is not a strict projective transformation of real space, and the image may be distorted depending on the position in the field of view. As shown on the left of FIG. 15, the distortion is larger in regions closer to the edge of the screen, and the coordinate system becomes nonlinear. For example, a lens with a large viewing angle such as a wide-angle lens shows outward convex distortion, whereas a lens with a small viewing angle such as a telephoto lens shows concave distortion. This distortion is therefore removed, and a conversion correction is performed so that the screen becomes an orthogonal plane coordinate system (S504). The correction coefficient at this time can be determined by an optimization operation that straightens the shape of a figure known to be straight in real space, such as the white line WL appearing on the screen of FIG. 5. Note that this correction stretches the regions near the edge of the screen as the distortion is removed, so the corrected screen shape SA′ protrudes outside the original screen SA.
Next, as shown in FIG. 14, the coordinates of the reference body SC in the real-space system are determined. For example, in the case of surveying, if the distance d from the camera 11 to the reference body SC installed on the road surface and the angle θ formed by the line of sight from the camera to the reference body and a reference line (for example, the x-axis direction) are measured,
x = d · cos θ
y = d · sin θ
the real-space coordinates P(x, y, 0) of the ground contact point of the reference body SC can be obtained. Alternatively, the coordinates may be specified directly by a satellite positioning system (GPS). The real-space coordinate system used here may be an independent coordinate system set within the shooting range of each camera, or it may be linked to a global coordinate system provided by a satellite positioning system (GPS). However, as will be described later, when the shooting ranges of multiple cameras are integrated to generate a single space, the coordinate systems of the respective cameras need to be connected, so it is desirable to set a common real-space coordinate system over the area in which the multiple cameras perform linked shooting. Further, when setting the xy coordinate system 51, it is desirable to perform calibration using, for example, an LED light or the like.
Next, the height h on the screen of the reference body image SCI is read (S506). Since the actual height H of the reference body SC is known, a coefficient for converting the height h of the reference body image SCI into the actual height H is computed as
α = H / h
(S507), and the screen coordinate data p(ξ, η) and the real-space coordinate data P(x, y, 0) are stored in association with each other (S508). After the above process has been repeated for all the reference bodies SC (S509 → S501), a process is performed to supplement the set of p, P, and α at the main points on the road surface RS that were not actually measured. This process may be performed as a step of acquiring interpolation data, or as a process of determining the elements of the projective transformation matrix from the obtained combinations of p and P. Together, these pieces of information constitute the position/height conversion-related information.
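A minimal sketch, assuming the reference bodies have already been measured, of how the position/height conversion-related information might be held: measured triples of screen ground point p(ξ, η), real-space ground point P(x, y), and scale factor α = H / h, with values at unmeasured screen positions filled in by interpolation. The sample coordinates and the use of linear interpolation are assumptions for illustration.

```python
import numpy as np
from scipy.interpolate import griddata

# Measured calibration data (illustrative values).
screen_pts = np.array([[120, 400], [320, 380], [520, 405], [330, 250]], float)  # p(xi, eta)
real_pts   = np.array([[1.0, 2.0], [3.0, 2.2], [5.0, 2.1], [3.1, 8.0]], float)  # P(x, y), z = 0
alphas     = np.array([0.021, 0.022, 0.021, 0.048])                             # H / h per point

def screen_to_real(p):
    """Interpolate the real-space ground position for a screen ground point p."""
    q = np.asarray(p, float)[None, :]
    x = griddata(screen_pts, real_pts[:, 0], q, method="linear")[0]
    y = griddata(screen_pts, real_pts[:, 1], q, method="linear")[0]
    return float(x), float(y)

def alpha_at(p):
    """Interpolate the pixel-to-metre height factor alpha at screen point p."""
    q = np.asarray(p, float)[None, :]
    return float(griddata(screen_pts, alphas, q, method="linear")[0])

p = (330, 360)                       # ground point of a person image on the screen
print(screen_to_real(p), alpha_at(p))  # points outside the measured hull would need extrapolation
```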
In addition, as shown in FIG. 13, a person SP may be used instead of the reference body SC to acquire the coordinate information. In this case, a calibration method may be used in which the height of the person serving as the subject is input and the person walks to the four corners of the camera's field of view, so that the camera angle of view and position information are learned and acquired.
Next, the contents of the feature point extraction process by the feature point extraction unit 122 will be described in detail with reference to FIGS. FIG. 2 is a flowchart showing a processing procedure of the feature point extraction unit 122. FIG. 3 is a schematic diagram showing how the feature point extraction unit 122 extracts features by dividing the human body into parts.
When a predetermined number of frames of the video signal are input from the corresponding camera 11, the feature point extraction unit 122 detects the moving object MO appearing in the video signal by taking the difference between frames FM, as shown in FIG. 11 (step S11 of FIG. 2). Specifically, if an image area corresponds to a moving object, its position and shape change between the image area MO′ of the preceding frame and the image area MO of the succeeding frame, while the background does not change, so the image area MO of the moving object can be extracted by taking the image difference between the two frames. On the other hand, if an image is taken while no moving object is present, a background image BP is obtained as shown in FIG. 12. The background image BP is captured for each camera, transmitted to the receiving unit 13 in FIG. 1, and stored in a storage device 135 accessible by the computer constituting the receiving unit 13 (in the present embodiment, an information storage/statistical processing unit configured as an external storage device or a separate computer).
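A rough sketch of the frame-difference detection of step S11, assuming two consecutive grayscale frames are available as image files (the file names, threshold values, and minimum area are placeholders, not values from the patent):

```python
import cv2

prev = cv2.imread("frame_prev.png", cv2.IMREAD_GRAYSCALE)
curr = cv2.imread("frame_curr.png", cv2.IMREAD_GRAYSCALE)

diff = cv2.absdiff(curr, prev)                       # inter-frame difference
_, mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
mask = cv2.dilate(mask, None, iterations=2)          # close small gaps in the changed region

contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
# Each (x, y, w, h) box is a candidate moving object MO; very small boxes are
# discarded as noise, matching the size-based filtering described in step S12.
moving_objects = [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) > 500]
print(moving_objects)
```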
Next, the feature point extraction unit 122 extracts a person region by performing segmentation, edge detection, pattern matching, and the like on the moving object image detected in step S11, and determines whether or not the moving object is a person (step S12). Various methods can be used for the moving object detection and person extraction from the video signal, and the processing is not limited to a specific method. Also, among the moving objects detected from the video signal, those of relatively small size are likely to be noise and may be judged not to be persons, while regions of relatively large size may be judged to be persons.
Along with the extraction of the person area, processing for specifying the position coordinates and height of the person is performed. This is described below with reference to the explanatory diagrams of FIGS. 17 and 18 and the flowchart of FIG. 19. First, as shown in FIG. 17, the position of the lower end edge of the detected person area PA is regarded as the ground contact point p, and its coordinates p(ξ, η) on the screen are read (S1201). Since the height-direction dimension of the person area changes depending on the person's posture, the frames are searched for the person-area image considered closest to an upright state (S1203), with reference to the position/height conversion-related information described above. For example, for a walking person, the posture is closest to upright not when both feet are apart but at the moment, during the next step, when the rear foot passes and almost overlaps the foot that was put forward first; the height h of the area is therefore measured on the screen using the person area in such an image frame (S1204). Then, as shown in FIG. 18, it is converted into the actual height H using the conversion coefficient α included in the position/height conversion-related information (S1205).
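A small illustration of S1201 to S1205, assuming a calibration table like the sketch above: the lowest edge of the person area is treated as the ground point, the most upright-looking frame supplies the pixel height, and α converts it to metres. The upright-frame heuristic (largest height-to-width ratio) and the box values are assumptions.

```python
def estimate_height(person_boxes_per_frame, alpha_at):
    """person_boxes_per_frame: (x, y, w, h) boxes for the same person, one per frame;
    alpha_at(p) returns the pixel-to-metre factor at screen point p."""
    # Heuristic for the "most upright" frame: the box with the largest height-to-width ratio.
    best = max(person_boxes_per_frame, key=lambda b: b[3] / max(b[2], 1))
    x, y, w, h = best
    ground_point = (x + w / 2, y + h)   # p(xi, eta): bottom centre of the box (S1201)
    return alpha_at(ground_point) * h   # real height H = alpha * pixel height h (S1205)

boxes = [(300, 180, 60, 170), (305, 178, 48, 176), (310, 179, 55, 172)]
print(estimate_height(boxes, lambda p: 0.01))  # with alpha = 0.01 m/pixel -> 1.76 m
```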
Moreover, it is preferable that the feature point extraction unit 122 further performs a process of analyzing the motion of each part (step S15). For example, for the head p1, the head motion (movement and orientation) is detected. Of the parts of the human body, the head is said to be the easiest to recognize. If the orientation of the head is known by extracting the head p1 first, it becomes easier to specify the state and moving direction of the other parts. Also, for example, when the head is facing right, it is possible to apply the inference, in the part division described later, that the left hand and left foot may be hidden and not visible.
For example, for a walking person, the motion is analyzed and acquired as gait information. In this case, as the motion of the torso p2, for example, the posture such as the upper-body angle ψ and whether or not the person is stooped are detected, as shown in FIG. 20. The motions of the right hand p3 and left hand p4 are detected, for example, as the arm swing angle λ. As the motions of the right foot p5 and left foot p6, for example, the walking speed, the stride WL, and the knee bending angle are detected. The motion analysis information such as the gait detected here is sent as motion analysis information to the information transmission system receiving unit 13, and is reflected in the movement and orientation of the avatar representing the person.
Another important item of motion analysis information is the moving direction of the person. As shown in FIG. 21, when the coordinate information P1, P2, ..., Pn of a person is specified for each frame of the captured video, the set of coordinates P1, P2, ..., Pn constitutes the movement trajectory information of that person across frames. The difference Vn − Vn−1 between the position vectors Vn and Vn−1 of the coordinates Pn and Pn−1 of adjacent frames can be used as an index representing the moving direction of the person at position Pn, and it is also used effectively in determining the orientation of the avatar image described later.
Next, the feature point extraction unit 122 divides the person region P extracted in step S12 (see FIG. 3(a)) into six parts: a head p1, a torso p2, a right hand p3, a left hand p4, a right foot p5, and a left foot p6 (see FIG. 3(b)) (step S13). Then, the external appearance features of each of the six parts are analyzed (step S14). For example, for the head p1, the hairstyle, hair color, presence or absence of a hat, and the like are extracted as feature points. For the torso p2, the body shape, the shape of the clothing, the color of the clothing, the presence or absence of specific belongings such as a rucksack, and the like are extracted as feature points. The feature points for the right hand p3 and left hand p4 are, for example, the body shape, the shape (or type) of clothing, the color of clothing, and belongings. The feature points for the right foot p5 and left foot p6 are, for example, the body shape, the shape (or type) of clothing, the color of clothing, shoes, and the like.
The number of parts into which the person region is divided is not limited to six. For the purpose of reducing the processing load, it may, for example, be divided into three parts: the head, the upper body, and the lower body. Conversely, in order to generate a more realistic avatar, it may be divided into more than six parts. The feature points listed here are merely examples, and various elements may be extracted as feature points. Also, for example, in the above description the hairstyle and hair color, and the clothing shape and clothing color, are extracted as independent feature points, but the "hair color" and "clothing color" may instead be treated as additional data of the "hairstyle" and "clothing shape". The extracted feature points for each part are output as feature data and sent to the information transmission system receiving unit 13.
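A hedged sketch of how the per-part feature data of steps S13/S14 might be structured before transmission; the field names are illustrative, not the patent's.

```python
from dataclasses import dataclass, field

@dataclass
class PartFeatures:
    part: str                 # "head", "torso", "right_hand", ...
    shape: str = ""           # body shape / clothing shape (or type)
    color: str = ""           # clothing or hair colour, kept as additional data
    extras: dict = field(default_factory=dict)   # hat, rucksack, shoes, ...

person_features = [
    PartFeatures("head", shape="long_hair", color="black", extras={"hat": False}),
    PartFeatures("torso", shape="t-shirt", color="red", extras={"rucksack": True}),
    PartFeatures("right_foot", shape="jeans", color="blue", extras={"shoes": "sneakers"}),
]
```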
The variation of the extracted feature point (feature data) corresponds to the variation of the component (partial image) of each part in the avatar of the person generated by the information transmission system receiving unit 13 as described later. For example, when “long hair” is extracted as a feature point of the hairstyle, a partial image of “long hair” is used as the head hair of the avatar. For example, in the case of a person whose body width is thick, a thick body is used as a partial image of the body of an avatar.
When a plurality of person areas are extracted in the shooting range of one camera 11 in steps S11 and S12, the processes in steps S13 to S15 are performed for each person area. The obtained feature information and operation information are sent to the multi-camera cooperation unit 123 together with tag information for specifying individual person areas.
Further, the feature point extraction unit 122 may additionally extract information that characterizes the person to some extent (person attribute information), such as the age and sex of the person. In this case, as shown in FIG. 4, the feature point extraction unit 122 determines the age and sex of the person based on the feature amounts extracted from the images of the divided parts (step S23). For example, if the head p1 can be captured, the age and sex can be determined using face recognition technology.
The age may be output as age data in increments of 1 year, or may be output as data representing the age zone, for example, “20 years old”. In addition, here, gender and age are exemplified as the person attribute information, but any information other than this can be used as information for specifying a person to some extent. For example, it may be possible to discriminate between “adult” and “child”.
In the above case, the person is not uniquely identified but is given attributes such as sex and age. In contrast, when a person database in which images such as faces and personal information (names, etc.) are registered in advance is available, it is also possible, as necessary, to uniquely identify an individual by matching the image of the head p1 against the face images registered in the person database (step S24). This enables, for example, the following kinds of personal identification:
(1) One-to-one matching for purposes such as arresting a criminal
(2) Matching by clothing and age for purposes such as finding a lost child
(3) Matching for rough age estimation for purposes such as marketing
Next, returning to FIG. 1: in the information transmission system receiving unit 13, the information receiving unit 131 receives information from the network 15 and decodes it. As described above, the decoded information includes the information (feature information and coordinate information) obtained from the video signals of the plurality of cameras 11 (cameras 11a, 11b, ...), and it is stored and accumulated in the information storage/statistical processing unit 135. FIG. 22 shows an example of the accumulated information: a detection ID is assigned to each person judged to be the same person from the degree of coincidence of the feature information described above, and the records are stored sequentially in association with extracted data such as the time and date of reception, the position of the person in real-space coordinates (x and y coordinates), the way of walking (gait), physique, height, hair color, color of upper-body clothing, color of lower-body clothing, facial feature information, sex, and age. The date and ID fields are abbreviated as #1, #2, and so on, but information such as the type (form) of upper-body and lower-body clothing, the presence or absence of a hat, and belongings is also associated. Further, the gait data include the stride WL, the arm swing angle λ, the upper-body angle ψ, the knee bending angle δ, the one-step cycle τ, and the like.
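An illustrative sketch of one row of the accumulated table in FIG. 22; the field names and values are assumptions, but the columns follow the description above.

```python
from dataclasses import dataclass

@dataclass
class PersonRecord:
    detection_id: int        # same ID for detections judged to be the same person
    timestamp: str           # reception date and time
    x: float                 # real-space x coordinate
    y: float                 # real-space y coordinate
    stride_wl: float         # stride WL
    arm_swing_lambda: float  # arm swing angle lambda
    upper_body_psi: float    # upper-body angle psi
    knee_bend_delta: float   # knee bending angle delta
    step_period_tau: float   # one-step cycle tau
    height_m: float
    hair_color: str
    upper_color: str
    lower_color: str
    gender: str
    age: int

row = PersonRecord(1, "2017-03-14T09:20:31", 3.2, 7.5, 0.68, 28.0, 4.5, 35.0, 0.55,
                   1.72, "black", "red", "blue", "male", 35)
```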
Hereinafter, processing on the receiving unit side will be described.
The dynamic generation unit 132 generates an avatar image of each person based on the received feature information. As described above, the feature information includes feature data representing the features of each part of the person. The dynamic generation unit 132 has access to a database that stores in advance the avatar partial images corresponding to each item of feature data (this database is assumed to be formed in the information storage/statistical processing unit 135 of FIG. 1, but it may also be a separate storage device or server on the network). For example, when "long hair" is included as feature data for a person's head p1, a partial image of long hair is acquired from the database. The dynamic generation unit 132 generates the avatar image of the person by combining the avatar partial images selected according to the feature data of each part of the person.
FIG. 23 conceptually shows an example of how the database is constructed. The database contains avatar fragment graphic data, the constituent elements of an avatar, consisting of upper-body and lower-body clothing, hairstyles, belongings, and the like, drawn for a standard height and body shape. For each avatar constituent element, the avatar fragment graphic data are prepared in different representations according to the moving direction of the person in real space, so that the way the person appears from the camera's viewpoint (the person's orientation with respect to the camera) is reflected. In this embodiment, as shown in FIG. 24, the orientation of the person P with respect to the camera 11 is defined in eight directions (J1 to J8), and avatar fragment graphic data divided in correspondence with the human body parts p1 to p6 described in FIG. 3 (p2 to p4 for upper-body clothing, p5 and p6 for lower-body clothing) are prepared in eight variations each (v1 to v8, whose indices correspond to J1 to J8) corresponding to the appearance in those eight directions. Shoes, hair, and belongings are not divided into parts, but they are also prepared in eight variations each.
The left side of FIG. 25 shows an example of the avatar fragment graphic data selected when the direction v7 of FIG. 24 is designated, and the right side shows an example selected when the direction v1 is designated. A T-shirt is selected for the upper body and jeans for the lower body, and the variants corresponding to directions v7 and v1 are selected from the data of FIG. 23. FIG. 26 shows the avatar images AV7 and AV1 obtained by combining them. As for the face, a contour and features reflecting the extracted facial feature information are synthesized for each direction, but a standard face (or head) image according to sex and age may instead be prepared.
As shown in FIG. 27, the avatar image data (or avatar fragment graphic data) are configured as avatar animation data consisting of a set of frames into which the walking motion of a person is subdivided. In the example of FIG. 27, one cycle of two steps is expressed as an 8-frame animation: several frames from stepping out with the right foot until it lands (here the four frames AFM1 to AFM4), and four frames from stepping out with the left foot until it lands (AFM5 to AFM8). As shown in FIG. 23, these eight frames of data are prepared for each type of avatar fragment graphic data, at least for the lower-body and upper-body clothing.
The image data of each avatar fragment are configured as two-dimensional vector graphic data, as shown in FIG. 28. The vector graphic data connect, in a loop of vectors, the vertex coordinates that define the outline of the figure; when a linear transformation is applied, the vertex coordinates are moved according to the matrix operation representing that transformation, and by connecting the moved points with vectors again, deformation processing such as enlargement, reduction, and rotation of the figure can be executed easily. The interior of the figure enclosed by the vectors is identified as the colored region by an inside/outside test with respect to the vector lines, and by rasterizing the pixels in that region with the designated color, the color information specified by the feature information of FIG. 22 can easily be reflected in the final avatar image VDR.
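A minimal sketch of the vector-figure handling described above, using generic libraries rather than the patent's implementation: the outline is a ring of vertex coordinates, a 2x2 matrix scales and rotates it, and the enclosed area is rasterised in the colour taken from the feature information (the inside/outside test is delegated to the polygon fill).

```python
import numpy as np
from PIL import Image, ImageDraw

outline = np.array([[0, 0], [10, 0], [12, 18], [-2, 18]], float)   # avatar fragment outline (example)

def transform(vertices, scale=1.0, angle_deg=0.0, offset=(0, 0)):
    """Apply a linear transform (scale and rotation) to the vertex ring, then translate."""
    a = np.deg2rad(angle_deg)
    m = scale * np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])
    return vertices @ m.T + np.asarray(offset, float)

def rasterize(vertices, color, size=(64, 64)):
    """Fill the polygon interior with the specified colour."""
    img = Image.new("RGB", size, "white")
    ImageDraw.Draw(img).polygon([tuple(v) for v in vertices], fill=color)
    return img

fragment = transform(outline, scale=2.0, angle_deg=5.0, offset=(20, 10))
img = rasterize(fragment, color=(200, 30, 30))   # e.g. red upper-body clothing
img.save("fragment.png")
```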
FIG. 29 is a flowchart showing the flow of processing on the receiving side.
First, the person ID, the motion information (coordinate points), and the feature information sent via the network are received (S601). Next, the received coordinate information P is plotted on the real-space coordinates shared by the plurality of cameras (S602). Then, the person's walking direction vector is calculated from the change in the person's coordinates P between the preceding and following frames, and from the eight directions J1 to J8 of FIG. 24 the one closest to the orientation of that walking direction vector is selected and determined as the arrangement direction of the avatar image (S603).
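A sketch of S603, assuming the eight directions J1 to J8 are spaced every 45 degrees (the mapping from angle sector to J-index is an assumption): the walking direction vector is taken from the previous and current positions, and the nearest of the eight directions is chosen as the avatar arrangement direction.

```python
import math

def walking_direction(p_prev, p_curr):
    return (p_curr[0] - p_prev[0], p_curr[1] - p_prev[1])   # Vn - Vn-1

def quantize_direction(v):
    angle = math.degrees(math.atan2(v[1], v[0])) % 360.0     # 0..360 degrees
    index = int((angle + 22.5) // 45) % 8                    # nearest 45-degree sector
    return f"J{index + 1}"                                   # J1..J8 (assumed ordering)

v = walking_direction((3.2, 7.5), (3.9, 8.1))
print(quantize_direction(v))   # direction used to pick the avatar fragment variants v1..v8
```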
For a person who has just entered the field of view of the camera of interest from the field of view of another camera, an ID may already have been assigned and an avatar image already created, so the avatar creation history corresponding to the received ID is searched (S604). If there is no avatar creation history in S605, the database is searched for a person whose time, position, and features match under predetermined conditions (S606). If there is no corresponding person in S607, a new avatar image is created (S610).
FIG. 30 shows the details of the new avatar creation process. In S601, the hairstyle, clothing, belongings, and their colors included in the feature data are specified. Next, in S6102, among the avatar fragment graphics corresponding to the specified features, the ones corresponding to the determined avatar image arrangement direction (one of J1 to J8) are read out (one of v1 to v8 in FIG. 23). In S6103, the avatar fragment graphics are corrected according to the height, body shape, and gait information included in the feature data, and in S6104 the avatar fragment graphics are colored with the designated colors. Finally, in S6105, the avatar image data are completed by compositing the avatar fragments.
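A hedged sketch of the new-avatar creation flow of FIG. 30, using plain dictionaries in place of real fragment images; the database keys and the final composite representation are assumptions, not the patent's actual interfaces.

```python
fragment_db = {
    ("head",  "long_hair", "J2"): {"part": "head",  "pixels": "...long hair seen from J2..."},
    ("torso", "t-shirt",   "J2"): {"part": "torso", "pixels": "...t-shirt seen from J2..."},
    ("legs",  "jeans",     "J2"): {"part": "legs",  "pixels": "...jeans seen from J2..."},
}

def create_avatar(features, direction):
    fragments = []
    for part, feat in features.items():
        frag = dict(fragment_db[(part, feat["type"], direction)])  # S6102: variant for the direction
        frag["scale"] = feat.get("height_scale", 1.0)              # S6103: correct for height/body/gait
        frag["color"] = feat.get("color")                          # S6104: colour as specified
        fragments.append(frag)
    return {"direction": direction, "fragments": fragments}        # S6105: composite (kept abstract)

features = {
    "head":  {"type": "long_hair", "color": "black"},
    "torso": {"type": "t-shirt",   "color": "red", "height_scale": 1.05},
    "legs":  {"type": "jeans",     "color": "blue"},
}
avatar = create_avatar(features, "J2")
```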
Returning to FIG. 29, if there is an avatar creation history in S605, the process proceeds to S609, where the avatar image data of the corresponding ID is read from the database and reused. On the other hand, if there is a corresponding person in S607, the received person ID is updated with the ID of the corresponding person, and the process proceeds to S609 and the same processing is performed.
The avatar image of each person generated by the dynamic generation unit 132 is sent to the image composition unit 133 together with the coordinate information of that person, and the avatar/background composition process is performed (S611). The image composition unit 133 can access a database (in the information storage/statistical processing unit 135) in which the background images of the shooting ranges of the respective cameras 11 are stored in advance. The image composition unit 133 acquires the background image of each camera 11 from the database and composes it with the avatar images generated by the dynamic generation unit 132. The composite image is output to the monitor 14. The position at which each avatar image is placed is based on the coordinate information of the person represented by that avatar. The image composition unit 133 can also change the orientation of an avatar or adjust the speed at which it moves, based on the motion analysis information (data representing the movement and orientation of the person) obtained by the feature point extraction unit 122. By performing the transmission from the information transmission system transmission unit 12 at as high a frame rate as the feature point extraction unit 122, the coordinate information addition unit 121, the dynamic generation unit 132, and the image composition unit 133 can process, the video of the camera 11 can be displayed on the monitor 14 in almost real time.
FIG. 31 shows the flow of the avatar background / compositing process. First, in S61101, avatar image data corresponding to the specified ID and direction is read. This avatar image data is a set of frame data constituting an animation (FIG. 27), and avatar animation frames are allocated to frames for moving image reproduction in accordance with the speed and stride of the moving coordinate point P (S61102).
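A small sketch of S61102, under the assumption that one animation cycle of eight frames corresponds to two steps: given the stride and the distance moved between output video frames, the index of the walking-animation frame (AFM1 to AFM8) to show is computed for each video frame.

```python
def allocate_animation_frames(positions, stride, n_anim_frames=8):
    """positions: avatar ground positions for successive video frames (metres);
    stride: step length WL in metres. One full cycle (two steps) = n_anim_frames frames."""
    cycle_length = 2.0 * stride          # distance covered by one animation cycle
    travelled = 0.0
    frames = []
    for i in range(1, len(positions)):
        dx = positions[i][0] - positions[i - 1][0]
        dy = positions[i][1] - positions[i - 1][1]
        travelled += (dx * dx + dy * dy) ** 0.5
        phase = (travelled % cycle_length) / cycle_length          # 0..1 within the cycle
        frames.append(int(phase * n_anim_frames) % n_anim_frames)  # index into AFM1..AFM8
    return frames

positions = [(0.0, 0.0), (0.2, 0.0), (0.4, 0.0), (0.6, 0.0), (0.8, 0.0)]
print(allocate_animation_frames(positions, stride=0.7))
```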
As for the display form of the composite video, in this embodiment either a display mode with the same field of view as the shooting screen of the camera 11 (camera real video mode) or an integrated display mode combining a plurality of cameras can be selected. The mode can be switched from the input unit (a keyboard or touch panel) connected to the information transmission system reception unit 13 in FIG. 1. When the camera real video mode is selected, the process proceeds to S61104, and the position coordinates P(x, y, 0) of all avatars to be displayed simultaneously are plotted in the real-space field-of-view region of the corresponding camera. In S61105, the real-space field-of-view region, together with the plotted position coordinates P, is projectively transformed into the coordinate system of the corresponding camera.
Here, as already described, the camera two-dimensional coordinate system used for determining the position coordinates P has been corrected from the left state to the right state of FIG. 15 in consideration of lens distortion. Whereas the whole field of view fitted on the output screen in the coordinate system before correction, after correction the regions at the edges of the field of view extend beyond the screen of the monitor (FIG. 1: reference numeral 14); with a simple projective transformation alone, the image would differ from the directly viewed camera image by the amount of the distortion correction and look unnatural, and the avatar images of persons appearing at the edges of the field of view might not be displayed. Therefore, in S61106 of FIG. 31, an inverse distortion correction that restores the effect of the original lens distortion is applied to the projectively transformed image, returning the field of view to its original shape. This eliminates the above problems.
Then, in S61107, the selected background image is superimposed, together with the mapped person coordinate positions P, on the output plane that has been returned to the camera two-dimensional coordinate system through the projective transformation and inverse distortion correction, and the avatar image data, adjusted in size and orientation as described above, are pasted and composited at each converted position p (in the camera two-dimensional coordinate system).
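An illustrative sketch of this final composition step: the background image of the camera's shooting range is loaded and each avatar image is pasted at its converted screen position, scaled according to the person's height. The file names and the bottom-centre anchoring convention are assumptions made for the example.

```python
from PIL import Image

background = Image.open("background_camera_11a.png").convert("RGBA")

def paste_avatar(canvas, avatar_path, screen_pos, height_px):
    avatar = Image.open(avatar_path).convert("RGBA")
    scale = height_px / avatar.height                        # size adjusted from the converted height
    avatar = avatar.resize((max(1, int(avatar.width * scale)), height_px))
    x, y = screen_pos                                        # p(xi, eta): ground point on the screen
    canvas.paste(avatar, (int(x - avatar.width / 2), int(y - avatar.height)), avatar)

paste_avatar(background, "avatar_person1.png", screen_pos=(330, 360), height_px=172)
background.save("composite.png")
```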
The screen of the monitor 14 in FIG. 1 may be divided so as to display the video of a plurality of cameras 11 simultaneously, or the screen of the monitor 14 may be switched so as to display the video of only one of the plurality of cameras 11.
On the other hand, if the integrated display mode is selected in S61104, the process proceeds to S1000 and display is performed in the integrated mode. FIG. 32 is a flowchart showing the details. In S1001, the positions P(x, y, 0) and directions of all avatars to be displayed simultaneously are plotted in the real space shared by the plurality of cameras. In S1002, flow line trajectory data are created by superimposing the person position coordinates P of the preceding and following frames. In this embodiment, either a plan-view display form as shown in FIG. 33 or a bird's-eye-view display form as shown in FIG. 34 can be selected.
If the plan view is selected, the process proceeds to S1004, and a plan view background image PBP prepared in advance is pasted as shown in FIG. Then, the avatar image AV is plotted and displayed on the planar view background image. In this case, an avatar image for plane view may be prepared separately, or the avatar may be displayed in the horizontal direction so that the feature information can be easily grasped. When the flow line display is designated, the flow line image ML of the corresponding avatar image AV is displayed based on the flow line locus data described above.
On the other hand, if the bird's-eye view is selected in S1003, the process proceeds to S1006, where the real-space positions, directions, and flow line data of the avatars are projectively transformed according to the bird's-eye viewing angle and direction, and the background image for the bird's-eye view is then superimposed (S1007). As the background image, a photographed image for the bird's-eye view may be prepared and used, or three-dimensional background image data (for example, three-dimensional computer graphics (CG) data) may be prepared and converted to the bird's-eye view by projective transformation. In S1008, the avatar image corresponding to the direction of each avatar after the projective transformation is read out and pasted as avatar image AVS onto the bird's-eye-view background image PBPS, as shown in FIG. 34. In this case too, when the flow line display is designated, the flow line image MLS of the corresponding avatar image AVS is displayed based on the flow line trajectory data described above.
The avatar image data may be 3D avatar image data, and the avatar image may be displayed as a 3D CG image as shown in FIG. In this case, since the avatar image is prepared as a three-dimensional avatar object from the beginning, the arrangement and rotation in the designated direction can be freely set in three dimensions. In this case, the image composition unit 133 (FIG. 1) generates two-dimensional avatar image data by projectively transforming the three-dimensional avatar object in the real space whose arrangement direction is determined into the two-dimensional coordinate system of the background image. The avatar image based on the two-dimensional avatar image data is combined with the background image.
As described above, by extracting the person's feature points and position coordinates from the video signal of the camera 11 and transmitting only the extracted data, the bandwidth of the network 15 can be used more effectively than in the conventional case where the video signal itself is compressed and transmitted. Also, since the person is not shown on the monitor 14 as a raw image but is displayed in an anthropomorphized (avatarized) form, there is the advantage that privacy is not infringed when an unspecified number of people are photographed, as with a street security camera. For example, the persons in the captured image shown in FIG. 5 are displayed on the monitor 14 as avatars as shown in FIG. 6. Moreover, since each avatar is designed, based on the feature information extracted from the video, to represent the characteristics of the corresponding person, it is possible to grasp what kind of people are within the shooting range.
In the embodiment described above, an example was explained in which the feature information and coordinate information of a person are acquired from the video signal of the camera 11, and in which motion analysis information representing the person's movement and orientation and person attribute information such as age and sex are further acquired. Various applications are conceivable using such information. For example, as shown in FIG. 7, the above information may be processed by the image composition unit 133 so that a plurality of screens are displayed on the monitor 14. In the example of FIG. 7, an actual video space screen 81, a feature amount reproduction screen 82, a statistical space screen 83, a flow line analysis space screen 84, and a personal identification space screen 85 are displayed side by side on the monitor 14. Alternatively, the actual video may be streamed as necessary while the avatars are being viewed. It is also possible to configure the system so that, when a face is recognized on the transmission side, a capture of that face is recorded on the transmission side, and the face image linked to the avatar can be obtained by the receiving side on request.
The real video space screen 81 is a screen that displays video signals from the plurality of cameras 11 in a state where a person is replaced with an avatar. In the example of FIG. 7, the actual video space screen 81 is divided into four, and the video signals from the four cameras 11 are displayed simultaneously, but the number of cameras is not limited to this.
The feature amount reproduction screen 82 is a screen that displays the video from a plurality of cameras 11 with the persons replaced by avatars and the background also rendered graphically. In the example of FIG. 7, the feature amount reproduction screen 82 is generated by three-dimensionally integrating the video from the plurality of cameras 11; that is, the feature amount reproduction screen 82 is configured as a bird's-eye view by combining the video captured by a plurality of cameras installed at a plurality of locations. For example, the feature amount reproduction screen 82 illustrated in FIG. 7 represents the inside of a station (the platform and the area around the ticket gates) and the surrounding shops. In this example, the video signals acquired from a camera installed on the station platform, a camera installed around the ticket gates, and cameras installed in each of the shops are used. It would be impossible to cover this whole area with a single camera, but by three-dimensionally combining the video captured by multiple cameras installed at multiple locations, such a bird's-eye-view screen can be constructed.
The motion analysis information extracted from the camera's video signal includes information about the orientation of each person and the direction in which the person is moving. By using this information and arranging the avatars so that they match the actual orientation of each person, the movement of a crowd becomes easier to grasp on the feature amount reproduction screen 82. By constructing such a feature amount reproduction screen 82, the video of cameras installed at a plurality of locations can be viewed in an integrated manner, and a wider range of situations can be monitored in real time. As with the actual video space screen 81, the persons are replaced by avatars, so there is the advantage that privacy is not infringed. Also, since each avatar is designed, based on the feature information extracted from the video, to represent the characteristics of the corresponding person, it is possible to grasp what kind of people are within the shooting range.
The statistical space screen 83 is a screen that displays various statistical results. For example, the transition of the number of people within the shooting range of a certain camera can be represented as a graph. Alternatively, based on the person attribute information, the people within the shooting range may be represented in graphs by sex or age group. The flow line analysis space screen 84 focuses on a particular person (avatar) and displays, as a flow line, how that person moved within the camera's shooting range; this is possible by acquiring the coordinate information of the person (avatar) in time series. Further, the personal identification space screen 85 displays the person attribute information of the persons within the shooting range. In the example of FIG. 7, the face portion of each person's avatar image, together with the sex and age, is displayed.
The actual video space screen 81, the feature amount reproduction screen 82, the statistical space screen 83, and the flow line analysis space screen 84 preferably have a GUI (graphical user interface) function. For example, when one of the avatars displayed on the feature amount reproduction screen 82 is selected using a pointing device such as a mouse (in FIG. 7, the avatar 82a), the person attribute information of the person represented by that avatar is highlighted on the personal identification space screen 85. In this example, "male, 35 years old", the person attribute information of the avatar 82a, is highlighted on the personal identification space screen 85. Conversely, when any item of person attribute information is selected on the personal identification space screen 85, the avatar image corresponding to the selected person attribute information may be highlighted on the feature amount reproduction screen 82. Further, when any avatar is selected on the feature amount reproduction screen 82, the movement path of that avatar may be displayed on the flow line analysis space screen 84.
FIG. 8 is a schematic diagram showing an example of how avatars are represented when there is no continuity between the cameras on the transmission side. In this embodiment, when the transmitting cameras are not contiguous, the case where an avatar cannot be confirmed on the receiving side must also be considered. As a countermeasure, for example, as shown in the integrated layer diagram of FIG. 8(A), the received avatar is colored when the feature amounts captured by camera A can be confirmed by camera B at the destination; on the other hand, as shown in FIG. 8(B), when the avatar cannot be confirmed on the receiving side, the received avatar is left uncolored (a default avatar), so that the two cases are represented distinguishably. In either case, a conceivable method is to calculate the movement between cameras from the moving speed and project a three-dimensional image into the feature amount reproduction space or the like.
In this way, by using in various ways the feature information, coordinate information, person attribute information, motion analysis information, and the like extracted from the camera's video signal on the information transmission system receiving unit 13 side, an integrated security camera system or the like can be realized.
(Embodiment 2)
An information transmission system 2 according to the second exemplary embodiment of the present invention will be described. Configurations having the same functions as those described in Embodiment 1 are given the same reference numerals, and duplicate descriptions are omitted.
FIG. 9 is a block diagram illustrating a schematic configuration of the information transmission system 2. As shown in FIG. 9, the information transmission system 2 differs from the information transmission system 1 (first embodiment), which includes a plurality of cameras 11a, 11b, ..., in that it has a single camera 11. For this reason, the information transmission system 2 includes only one set of the coordinate information addition unit 121 and the feature point extraction unit 122, and does not include the multi-camera cooperation unit 123. The operations of the coordinate information addition unit 121, the feature point extraction unit 122, and the other processing units are the same as in the first embodiment. A system that extracts and transmits information from the video signal of only one camera 11 in this way is also included in one embodiment of the present invention.
In the first embodiment, each of the information transmission system transmission unit 12 and the information transmission system reception unit 13 can be realized as an independent device (camera controller), a computer, or a server. Each unit shown in the block diagrams, such as the coordinate information adding unit 121, can be realized by a processor in these devices executing a program stored in memory. Further, the information transmission system transmission unit 22 according to the second embodiment can be realized as a device integrated with the camera 11.
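As an illustration of such a program, the following minimal sketch mimics the transmitting side of the block diagram: coordinates and feature points are obtained per frame and sent to the network as small messages instead of the video itself. The stubbed extraction functions, the message fields, and the line-delimited JSON-over-socket transport are assumptions made for this sketch only.

```python
# Minimal sketch of a transmission-side program corresponding to the block
# diagram (coordinate information adding unit 121, feature point extraction
# unit 122, information transmission unit 124).  Extraction is stubbed out.
import json
import socket
import time

def extract_features(frame):
    # placeholder for feature point extraction (unit 122)
    return {"hair": "black", "clothes": "red", "sex": "male", "age": 35}

def locate_subject(frame):
    # placeholder for coordinate acquisition (unit 121)
    return {"x": 3.2, "y": 7.5}

def run_transmitter(capture, host="127.0.0.1", port=5000):
    """Send one small message per frame instead of the video itself (unit 124)."""
    with socket.create_connection((host, port)) as sock:
        for frame in capture:
            message = {
                "timestamp": time.time(),
                "features": extract_features(frame),
                "coordinates": locate_subject(frame),
            }
            sock.sendall((json.dumps(message) + "\n").encode("utf-8"))
```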
As described above, in addition to being implemented as hardware, the present invention can also be implemented as a program executed by a general-purpose computer or server, or as a medium on which such a program is recorded.
DESCRIPTION OF SYMBOLS
1, 2 Information transmission system
11 Camera
12 Information transmission system transmission unit
13 Information transmission system reception unit
14 Monitor
15 Network
121 Coordinate information adding unit
122 Feature point extraction unit
123 Multiple camera cooperation unit
124 Information transmission unit
131 Information receiving unit
132 Dynamic generation unit
133 Image composition unit

Claims (26)

1. An information transmission system comprising:
a feature point extraction unit that extracts feature points from a subject in video captured by at least one camera and outputs them as feature information;
a coordinate information adding unit that acquires coordinate information of the subject within the shooting range of the camera;
an information transmission unit that sends the feature information and the coordinate information to a network;
an information receiving unit that receives the feature information and the coordinate information from the network;
a dynamic generation unit that generates an avatar image of the subject based on the feature information; and
an image composition unit that generates a composite image by compositing the avatar image, based on the coordinate information, onto a background image representing the background of the shooting range of the camera.
2. The information transmission system according to claim 1, wherein
the coordinate information adding unit takes a position identified as the grounding point of the person forming the subject on the shooting screen as a shot grounding position, and takes the height, on the shooting screen, of the person image region appearing at that shot grounding position as a shot person height, and comprises:
position/height conversion relationship information acquiring means for acquiring position/height conversion relationship information which, with the walking plane in real space as the reference in the height direction when the subject is a person, includes a conversion relationship between plane coordinate points in a camera two-dimensional coordinate system set on the shooting screen of the camera and spatial coordinate points on the walking plane in a real-space three-dimensional coordinate system, and a conversion relationship between the shot height of the person at each shot grounding position in the camera two-dimensional coordinate system and the real height of that person in the real-space coordinate system;
shot grounding position/height specifying means for specifying the shot grounding position and the shot height of the person image on the shooting screen; and
real person coordinate/height information generating means for converting the specified shot grounding position coordinates and shot height, based on the position/height conversion relationship information, into real grounding position coordinate information representing the grounding position coordinates of the person in real space and real person height information representing the height of the person in real space;
the dynamic generation unit comprises avatar height determining means for determining the height dimension of the avatar image based on the generated real person height information; and
the image composition unit comprises avatar composition position determining means for determining the composition position of the avatar image on the background image based on the real grounding position coordinate information.
3. The information transmission system according to claim 1 or 2, wherein
the feature point extraction unit further analyzes the movement or orientation of the subject and outputs the result as motion analysis information,
the information transmission unit further sends the person attribute information to the network, and
the image composition unit adjusts the movement or orientation of the avatar image based on the motion analysis information.
4. The information transmission system according to claim 3, wherein
the camera is capable of capturing moving images,
the coordinate information adding unit acquires the coordinate information of the person forming the subject for each frame of the captured moving image, and
the feature point extraction unit outputs movement trajectory information of the coordinate information of the person between the frames as the motion analysis information.
5. The information transmission system according to claim 4, wherein the image composition unit adjusts the orientation of the avatar image to be composited onto the background image based on the movement trajectory information.
6. The information transmission system according to claim 5, wherein the dynamic generation unit generates avatar images in different representation forms depending on the movement direction of the person in real space, so that the appearance of the person as seen from the viewpoint of the camera is reflected.
7. The information transmission system according to claim 6, wherein
the dynamic generation unit comprises direction-specific two-dimensional avatar image data storing means for storing a plurality of two-dimensional avatar image data whose representation forms differ for each of a plurality of predetermined movement directions of the person in real space, estimates the movement direction of the person based on the movement trajectory information acquired for the preceding frames, and selects, from the direction-specific two-dimensional avatar image data, the data matching the estimated movement direction, and
the image composition unit composites an avatar image based on the selected two-dimensional avatar image data with the background image.
8. The information transmission system according to claim 6, wherein
the dynamic generation unit comprises three-dimensional avatar image data storing means for storing the data of the avatar image as three-dimensional avatar image data, generates a three-dimensional avatar object based on the three-dimensional avatar image data, estimates the movement direction of the person based on the movement trajectory information acquired for the preceding frames, and determines the placement direction of the three-dimensional avatar object in real space so that it faces the estimated movement direction, and
the image composition unit generates two-dimensional avatar image data by projectively transforming the three-dimensional avatar object, whose placement direction in real space has been determined, onto the two-dimensional coordinate system of the background image, and composites an avatar image based on that two-dimensional avatar image data with the background image.
9. The information transmission system according to any one of claims 4 to 8, wherein the image composition unit generates an image representing the flow line of the person based on the movement trajectory information.
10. The information transmission system according to any one of claims 1 to 9, wherein
the feature point extraction unit analyzes a person attribute of the subject and outputs it as person attribute information, and
the information transmission unit sends the person attribute information to the network.
11. The information transmission system according to claim 10, wherein the dynamic generation unit generates the avatar image so as to reflect the person attribute information.
12. The information transmission system according to claim 10 or 11, wherein the person attribute information includes gender information reflecting the gender of the person and age information reflecting the age of the person.
13. The information transmission system according to any one of claims 1 to 12, wherein
the feature point extraction unit analyzes the appearance of the person forming the subject and outputs it as appearance feature information, and
the information transmission unit sends the appearance feature information to the network.
14. The information transmission system according to claim 13, wherein the dynamic generation unit generates the avatar image so as to reflect the appearance feature information.
15. The information transmission system according to claim 14, wherein the appearance feature information includes one or more of hair information reflecting one or both of the form and color of the person's hair, clothing information reflecting one or both of the form and color of the person's clothes, and belongings information reflecting one or both of the form and color of the person's belongings.
16. The information transmission system according to claim 14 or 15, wherein the appearance feature information includes body shape information reflecting the body shape of the person.
17. The information transmission system according to any one of claims 14 to 16, wherein the appearance feature information includes gait information reflecting the gait of the person.
18. The information transmission system according to claim 17, wherein
the dynamic generation unit uses avatar animation data consisting of a set of frame data obtained by subdividing a human walking motion, and performs image correction processing that corrects each frame of the frame data based on the gait information, and
the image composition unit composites the avatar image with the background image in a moving-image form that reflects the gait features based on the corrected frame data.
19. The information transmission system according to any one of claims 1 to 18, wherein the image composition unit generates the composite image as a bird's-eye view image covering the shooting ranges of a plurality of the cameras.
20. The information transmission system according to claim 19, wherein
the shooting range is covered by a plurality of the cameras sharing real-space coordinates,
the coordinate information adding unit takes a position identified as the grounding point of a person appearing on the shooting screens of the plurality of cameras as a shot grounding position, and takes the height, on the shooting screen, of the person image region appearing at that shot grounding position as a shot person height, and comprises:
position/height conversion relationship information acquiring means for acquiring position/height conversion relationship information which, with the walking plane in real space as the reference in the height direction when the subject is a person, includes a conversion relationship between plane coordinates in a camera two-dimensional coordinate system set on the shooting screen of each camera and spatial coordinate points on the walking plane in the real-space coordinate system, which is the three-dimensional coordinate system of real space, and a conversion relationship between the shot height of the person at each shot grounding position in the camera two-dimensional coordinate system and the real height of that person in the real-space coordinate system;
shot grounding position/height specifying means for specifying the shot grounding position and the shot height of the person image on the shooting screen; and
real person coordinate/height information generating means for converting the specified shot grounding position coordinates and shot height, based on the position/height conversion relationship information, into real grounding position coordinate information representing the grounding position coordinates of the person in real space and real person height information representing the height of the person in real space;
the dynamic generation unit comprises avatar height determining means for determining the height dimension of the avatar image based on the generated real person height information; and
the image composition unit comprises avatar composition position determining means for determining the composition position of the avatar image on the bird's-eye view image while converting the real grounding position coordinate information of the person captured by the plurality of cameras in the real-space coordinate system into the viewpoint of the bird's-eye view image.
21. The information transmission system according to any one of claims 1 to 20, wherein the feature point extraction unit divides the image of a person into a plurality of parts corresponding to parts of the human body and extracts feature points from each part.
22. The information transmission system according to claim 21, wherein the dynamic generation unit comprises avatar image data storing means for storing the data of the avatar image divided into avatar fragments corresponding to the plurality of parts, corrects each avatar fragment of the avatar image based on the information of the feature points extracted for the corresponding part of the person, and then integrates the corrected avatar fragments to generate the avatar image.
23. An information transmitting device comprising:
a feature point extraction unit that extracts feature points from a subject in video captured by at least one camera and outputs them as feature information;
a coordinate information adding unit that acquires coordinate information of the subject within the shooting range of the camera; and
an information transmission unit that sends the feature information and the coordinate information to a network, wherein
the feature information is associated with constituent elements of an avatar image of the subject displayed at the transmission destination, and
the coordinate information is used at the transmission destination to specify the position at which the avatar image is composited in an image representing the background of the shooting range of the camera.
24. An information receiving device comprising:
an information receiving unit that receives, via a network, feature information representing feature points extracted from a subject in video captured by at least one camera, and coordinate information of the subject within the shooting range of the camera;
a dynamic generation unit that generates an avatar image of the subject based on the feature information; and
an image composition unit that generates a composite image by compositing the avatar image, based on the coordinate information, onto an image representing the background of the shooting range of the camera.
25. A computer program that causes a computer to execute:
feature point extraction processing that extracts feature points from a subject in video captured by at least one camera and outputs them as feature information;
coordinate information addition processing that acquires coordinate information of the subject within the shooting range of the camera; and
information transmission processing that sends the feature information and the coordinate information to a network, wherein
the feature information is associated with constituent elements of an avatar image of the subject displayed at the transmission destination, and
the coordinate information is used at the transmission destination to specify the position at which the avatar image is composited in an image representing the background of the shooting range of the camera.
26. A computer program that causes a computer to execute:
information reception processing that receives, via a network, feature information representing feature points extracted from a subject in video captured by at least one camera, and coordinate information of the subject within the shooting range of the camera;
dynamic generation processing that generates an avatar image of the subject based on the feature information; and
image composition processing that generates a composite image by compositing the avatar image, based on the coordinate information, onto an image representing the background of the shooting range of the camera.
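The position/height conversion in claims 2 and 20 can be pictured as a ground-plane homography plus a per-position height scale. The sketch below is a minimal illustration under that assumption; the matrix values and the constant metres-per-pixel scale are placeholders standing in for the calibration-derived position/height conversion relationship information.

```python
# Minimal sketch of the shot-grounding-position / shot-height conversion of
# claims 2 and 20.  H (image plane -> walking plane) and the height scale are
# assumed calibration results, not values from the specification.
import numpy as np

H = np.array([[0.01, 0.0,    -1.5],
              [0.0,  0.012,  -2.0],
              [0.0,  0.0005,  1.0]])   # assumed ground-plane homography

def to_real_ground(u, v):
    """Map the shot grounding position (u, v) in pixels to (X, Y) on the walking plane."""
    p = H @ np.array([u, v, 1.0])
    return p[0] / p[2], p[1] / p[2]

def to_real_height(pixel_height, u, v, metres_per_pixel_at=lambda u, v: 0.011):
    """Convert the person's height on the screen into an approximate real height."""
    return pixel_height * metres_per_pixel_at(u, v)

print(to_real_ground(640, 420), to_real_height(165, 640, 420))
```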
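For the direction-dependent avatar representation of claims 4 to 7, the movement direction can be estimated from the coordinates of successive frames and used to pick one of several pre-rendered, direction-specific two-dimensional avatar images. The eight-direction sprite table and file names below are assumptions for illustration.

```python
# Minimal sketch of direction estimation from the movement trajectory and
# selection of a direction-specific 2D avatar image (claims 4-7).
import math

SPRITES = {0: "avatar_E.png", 45: "avatar_NE.png", 90: "avatar_N.png",
           135: "avatar_NW.png", 180: "avatar_W.png", 225: "avatar_SW.png",
           270: "avatar_S.png", 315: "avatar_SE.png"}   # assumed sprite table

def movement_direction(prev_xy, curr_xy):
    """Direction of travel, in degrees, from two successive frame coordinates."""
    dx, dy = curr_xy[0] - prev_xy[0], curr_xy[1] - prev_xy[1]
    return math.degrees(math.atan2(dy, dx)) % 360

def select_sprite(prev_xy, curr_xy):
    """Pick the pre-rendered avatar whose direction is closest to the motion."""
    angle = movement_direction(prev_xy, curr_xy)
    nearest = min(SPRITES, key=lambda a: min(abs(angle - a), 360 - abs(angle - a)))
    return SPRITES[nearest]

print(select_sprite((2.0, 5.0), (2.8, 5.6)))   # -> avatar facing roughly north-east
```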
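Claims 21 and 22 describe building the avatar from fragments that correspond to body parts. The sketch below illustrates that idea with a colour-only correction per part; the part names and the correction rule are assumptions, since the claims leave the form of the correction open.

```python
# Minimal sketch of part-wise avatar generation (claims 21 and 22): features
# extracted per body part correct the matching avatar fragment, and the
# corrected fragments are then integrated into one avatar.
BASE_FRAGMENTS = {"head": {"color": "default"},
                  "torso": {"color": "default"},
                  "legs": {"color": "default"}}

def correct_fragment(fragment, part_features):
    """Apply the extracted per-part features to one avatar fragment."""
    corrected = dict(fragment)
    if "color" in part_features:
        corrected["color"] = part_features["color"]
    return corrected

def build_avatar(per_part_features):
    """Integrate the corrected fragments into a single avatar description."""
    return {part: correct_fragment(frag, per_part_features.get(part, {}))
            for part, frag in BASE_FRAGMENTS.items()}

print(build_avatar({"head": {"color": "black"}, "torso": {"color": "red"}}))
```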
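Claims 24 and 26 describe the receiving side: feature information and coordinate information arrive over the network, an avatar is generated from the features, and it is composited onto the stored background at the transmitted coordinates. The sketch below pairs with the transmitter sketch given earlier in the description; the line-delimited JSON framing and the placeholder avatar generation are assumptions.

```python
# Minimal sketch of the receiving side (claims 24 and 26): receive messages,
# generate an avatar from the features (dynamic generation unit 132), and
# composite it at the transmitted coordinates (image composition unit 133).
import json
import socketserver

class ReceiverHandler(socketserver.StreamRequestHandler):
    def handle(self):
        for line in self.rfile:                              # one JSON message per line
            msg = json.loads(line)
            avatar = self.generate_avatar(msg["features"])
            self.composite(avatar, msg["coordinates"])

    def generate_avatar(self, features):
        # placeholder avatar description derived from the feature information
        return {"sprite": "default.png", "tint": features.get("clothes", "gray")}

    def composite(self, avatar, coords):
        print(f"draw {avatar} on background at ({coords['x']}, {coords['y']})")

if __name__ == "__main__":
    with socketserver.TCPServer(("127.0.0.1", 5000), ReceiverHandler) as srv:
        srv.handle_request()   # handle a single transmitter connection for the demo
```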
PCT/JP2017/010290 2016-03-08 2017-03-08 Information transmitting system, information transmitting device, information receiving device, and computer program WO2017155126A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2017564647A JP6357595B2 (en) 2016-03-08 2017-03-08 Information transmission system, information receiving apparatus, and computer program

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2016-044793 2016-03-08
JP2016044793 2016-03-08

Publications (1)

Publication Number Publication Date
WO2017155126A1 true WO2017155126A1 (en) 2017-09-14

Family

ID=59790440

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2017/010290 WO2017155126A1 (en) 2016-03-08 2017-03-08 Information transmitting system, information transmitting device, information receiving device, and computer program

Country Status (2)

Country Link
JP (1) JP6357595B2 (en)
WO (1) WO2017155126A1 (en)

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5040844B2 (en) * 2008-07-31 2012-10-03 カシオ計算機株式会社 Imaging apparatus, imaging method, and program
JP5783629B2 (en) * 2011-07-08 2015-09-24 株式会社ドワンゴ Video display system, video display method, video display control program, operation information transmission program
WO2015136796A1 (en) * 2014-03-10 2015-09-17 ソニー株式会社 Information processing apparatus, storage medium and control method
JP6312512B2 (en) * 2014-04-23 2018-04-18 博司 佐久田 Remote monitoring system

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH09307868A (en) * 1996-03-15 1997-11-28 Toshiba Corp Communication equipment and communication method
JP2015149557A (en) * 2014-02-05 2015-08-20 パナソニックIpマネジメント株式会社 Monitoring device, monitoring system, and monitoring method

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPWO2017141454A1 (en) * 2016-05-13 2018-02-22 株式会社日立製作所 Congestion status visualization device, congestion status visualization system, congestion status visualization method, and congestion status visualization program
US10607356B2 (en) 2016-05-13 2020-03-31 Hitachi, Ltd. Congestion analysis device, congestion analysis method, and congestion analysis program
CN110619807A (en) * 2018-06-20 2019-12-27 北京京东尚科信息技术有限公司 Method and device for generating global thermodynamic diagram
JP7127592B2 (en) 2019-03-27 2022-08-30 オムロン株式会社 Notification system
JP2020160988A (en) * 2019-03-27 2020-10-01 オムロン株式会社 Notification system, and notification device
US11967183B2 (en) 2019-03-27 2024-04-23 Omron Corporation Notification system and notification device
JP2022551660A (en) * 2020-01-16 2022-12-12 Tencent Technology (Shenzhen) Company Limited SCENE INTERACTION METHOD AND DEVICE, ELECTRONIC DEVICE AND COMPUTER PROGRAM
JP7408792B2 (en) 2020-01-16 2024-01-05 Tencent Technology (Shenzhen) Company Limited Scene interaction methods and devices, electronic equipment and computer programs
US12033241B2 (en) 2020-01-16 2024-07-09 Tencent Technology (Shenzhen) Company Limited Scene interaction method and apparatus, electronic device, and computer storage medium
CN115004715A (en) * 2020-02-14 2022-09-02 欧姆龙株式会社 Image processing apparatus, image sensor, and method for controlling image processing apparatus
JP2021150735A (en) * 2020-03-17 2021-09-27 本田技研工業株式会社 Information processing device, information processing system, information processing method and program
JP7017596B2 (en) 2020-03-17 2022-02-08 本田技研工業株式会社 Information processing equipment, information processing systems, information processing methods and programs
JP7319637B1 (en) 2022-05-30 2023-08-02 株式会社セルシス Information processing system, information processing method and information processing program
JP2023175084A (en) * 2022-05-30 2023-12-12 株式会社セルシス Information processing system, information processing method, and information processing program

Also Published As

Publication number Publication date
JP6357595B2 (en) 2018-07-11
JPWO2017155126A1 (en) 2018-06-14

Similar Documents

Publication Publication Date Title
JP6357595B2 (en) Information transmission system, information receiving apparatus, and computer program
JP2019009752A (en) Image processing device
US11217006B2 (en) Methods and systems for performing 3D simulation based on a 2D video image
US10757373B2 (en) Method and system for providing at least one image captured by a scene camera of a vehicle
JP4473754B2 (en) Virtual fitting device
Fuchs et al. Virtual space teleconferencing using a sea of cameras
JP3512992B2 (en) Image processing apparatus and image processing method
US20160342861A1 (en) Method for Training Classifiers to Detect Objects Represented in Images of Target Environments
JPH0877356A (en) Method and device for processing three-dimensional multi-view image
CN105556508A (en) Devices, systems and methods of virtualizing a mirror
KR20190016143A (en) Slam on a mobile device
US11494963B2 (en) Methods and systems for generating a resolved threedimensional (R3D) avatar
JP5833526B2 (en) Video communication system and video communication method
JP4695275B2 (en) Video generation system
JP2023539865A (en) Real-time cross-spectral object association and depth estimation
CN107016730A (en) The device that a kind of virtual reality is merged with real scene
Kim et al. Augmenting aerial earth maps with dynamic information from videos
CN106981100A (en) The device that a kind of virtual reality is merged with real scene
Cui et al. Fusing surveillance videos and three‐dimensional scene: A mixed reality system
CN113111743A (en) Personnel distance detection method and device
JP2005149145A (en) Object detecting device and method, and computer program
Fiore et al. Towards achieving robust video selfavatars under flexible environment conditions
Dijk et al. Image processing in aerial surveillance and reconnaissance: from pixels to understanding
Malerczyk et al. 3D reconstruction of sports events for digital TV
WO2022022809A1 (en) Masking device

Legal Events

Date Code Title Description
ENP Entry into the national phase: Ref document number: 2017564647; Country of ref document: JP; Kind code of ref document: A
NENP Non-entry into the national phase: Ref country code: DE
121 Ep: the epo has been informed by wipo that ep was designated in this application: Ref document number: 17763458; Country of ref document: EP; Kind code of ref document: A1
122 Ep: pct application non-entry in european phase: Ref document number: 17763458; Country of ref document: EP; Kind code of ref document: A1