WO2010070820A1 - Image communication device and image communication method - Google Patents

Image communication device and image communication method

Info

Publication number
WO2010070820A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
unit
coordinates
face
counterpart
Prior art date
Application number
PCT/JP2009/006382
Other languages
French (fr)
Japanese (ja)
Inventor
本田義雅
岡田晋
井藤好克
Original Assignee
Panasonic Corporation (パナソニック株式会社)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Panasonic Corporation (パナソニック株式会社)
Publication of WO2010070820A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working
    • H04N7/141 Systems for two-way working between two video terminals, e.g. videophone
    • H04N7/147 Communication arrangements, e.g. identifying the communication as a video-communication, intermediate storage of the signals
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformation in the plane of the image
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41 Structure of client; Structure of client peripherals
    • H04N21/422 Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
    • H04N21/42203 Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS] sound input device, e.g. microphone
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41 Structure of client; Structure of client peripherals
    • H04N21/422 Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
    • H04N21/4223 Cameras
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/431 Generation of visual interfaces for content selection or interaction; Content or additional data rendering
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/442 Monitoring of processes or resources, e.g. detecting the failure of a recording device, monitoring the downstream bandwidth, the number of times a movie has been viewed, the storage space available from the internal hard disk
    • H04N21/44213 Monitoring of end-user related data
    • H04N21/44218 Detecting physical presence or behaviour of the user, e.g. using sensors to detect if the user is leaving the room or changes his face expression during a TV program
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/478 Supplemental services, e.g. displaying phone caller identification, shopping application
    • H04N21/4788 Supplemental services, e.g. displaying phone caller identification, shopping application communicating with other users, e.g. chatting

Definitions

  • The present invention relates to an image communication apparatus for conducting a TV (television) conference on a large screen, and more particularly to an image communication apparatus that allows a speaker to recognize his or her own position.
  • In recent years, ADSL (Asymmetric Digital Subscriber Line) and optical fiber networks have spread rapidly, and low-cost, high-speed Internet connection has become available.
  • Along with this, TV conference systems that display HD (High Definition) video on large-screen displays such as PDPs (Plasma Display Panels) have come into use.
  • In such a system, the viewing angle of the camera tends to be narrow in order to display a person at actual size, so if the speaker moves greatly, he or she will move out of the imaging area and an image that does not include the speaker's face is transmitted.
  • Conventionally, an image captured by the camera is displayed as a self-portrait on the display, and the user moves into the imaging area while checking this self-image.
  • Patent Document 1 discloses a technique in which, in a camera photographing apparatus, a warning sound is generated when the subject is outside the imaging area, thereby notifying the subject that it is outside the imaging area.
  • An object of the present invention is to provide an image communication apparatus that prevents communication from being hindered by the generation of a warning sound or by the display of the self-portrait on the display, and that can visually show the speaker how far he or she has deviated from the reference position.
  • In order to achieve the above object, an image communication apparatus according to one aspect of the present invention is an image communication apparatus that communicates image data with another image communication apparatus via a network, and includes: an image receiving unit that receives second image data including a counterpart image transmitted from the other image communication apparatus; an image processing unit that generates a processed image by processing the counterpart image included in the second image data received by the image receiving unit; and an image output unit that outputs the processed image generated by the image processing unit to a display device. The image processing unit calculates the difference between at least one of the position and size of the face area detected by the face detection unit and a predetermined reference, and processes the counterpart image so that the counterpart image and the processed image differ more greatly as the calculated difference increases.
  • With this configuration, the counterpart image displayed on the display device is processed more strongly as the difference between the position or size of the face area and the predetermined reference increases, so even though the self-image is not displayed on the display device, the speaker can check how far his or her position deviates from the reference position by looking at the counterpart image. Thereby, communication is not hindered by a warning sound or by the display of a self-portrait, and a TV conference with a greater sense of presence can be conducted.
  • Further, the face detection unit may detect face region coordinates indicating the position of the face region in the captured image, and the image processing unit may calculate the absolute value of the difference between the face region coordinates and predetermined reference coordinates and, when the calculated absolute difference is larger than a predetermined threshold, generate the processed image by processing the counterpart image so that the larger the absolute difference, the more the counterpart image and the processed image differ.
  • With this configuration, the speaker can check how far he or she deviates from the reference position by looking at the counterpart image displayed on the display device. At this time, if the amount of deviation from the reference position (the absolute difference value) is less than or equal to the predetermined threshold, the image is not processed, so the counterpart image is displayed unmodified on the display device as long as the speaker is within a predetermined range of the reference position.
  • Further, the image processing unit may generate the processed image by superimposing a predetermined superimposed image on the counterpart image such that the larger the absolute value of the difference between the face area coordinates and the reference coordinates, the larger the area of the superimposed image.
  • With this configuration, the speaker can check how far he or she deviates from the reference position by looking at the area of the superimposed image displayed on the display device.
  • Further, the image processing unit may superimpose the superimposed image on the counterpart image from the left end of the counterpart image when the face area coordinates are on the right side of the reference coordinates, or from the right end when the face area coordinates are on the left side of the reference coordinates, such that the larger the absolute horizontal difference between the face region coordinates and the reference coordinates, the larger the area of the superimposed image.
  • With this configuration, since the area of the superimposed image to be superimposed on the counterpart image is controlled according to the horizontal direction and distance of the deviation from the reference coordinates, the speaker can check how far he or she is displaced in the horizontal direction by looking at the position and area of the superimposed image displayed on the display device. For example, the larger the distance from the reference coordinates, the larger the area of the superimposed image, so a larger superimposed area indicates a larger deviation of the speaker from the reference position.
  • For example, when the face region coordinates are on the right side of the reference coordinates in the captured image, the superimposed image is superimposed from the left end of the counterpart image, so the speaker can recognize, by looking at the superimposed image displayed on the left side, that he or she is to the left of the reference position.
  • Further, the image processing unit may superimpose the superimposed image on the counterpart image from the upper end of the counterpart image when the face area coordinates are above the reference coordinates, or from the lower end when the face area coordinates are below the reference coordinates, such that the larger the absolute value of the vertical difference between the face area coordinates and the reference coordinates, the larger the area of the superimposed image.
  • With this configuration, since the area of the superimposed image to be superimposed on the counterpart image is controlled according to the vertical direction and distance of the deviation from the reference coordinates, the speaker can check how far he or she is displaced in the vertical direction by looking at the position and area of the superimposed image displayed on the display device. For example, when the face area coordinates are above the reference coordinates in the captured image, the superimposed image is superimposed from the upper end of the counterpart image, so the speaker can recognize, by looking at the superimposed image displayed at the top, that he or she is above the reference position. Conversely, when the face area coordinates are below the reference coordinates in the captured image, the superimposed image is superimposed from the lower end, so the speaker can recognize, by looking at the superimposed image displayed at the bottom, that he or she is below the reference position.
  • Further, the face detection unit may also determine whether or not a face region is in the captured image and generate a flag indicating the presence or absence of the face region, and the image processing unit may superimpose the superimposed image with a predetermined area on a predetermined region of the counterpart image when the flag indicates that there is no face region.
  • Further, the image communication apparatus may include a buffer for storing the face area coordinates detected by the face detection unit, and when the flag indicates that the face area is not in the captured image, the image processing unit may superimpose the superimposed image with a predetermined area on the counterpart image from the left end of the counterpart image when the face area coordinates stored in the buffer are on the right side of the reference coordinates, or from the right end when they are on the left side, and from the upper end of the counterpart image when they are above the reference coordinates, or from the lower end when they are below.
  • Further, the image processing unit may generate the processed image by projectively transforming the counterpart image so that the gradient of the projective transformation increases as the absolute difference between the face region coordinates and the reference coordinates increases.
  • With this configuration, since the processed image displayed on the display device is an image obtained by projective transformation of the counterpart image, the speaker can check how far he or she is from the reference position by looking at the inclination of the projective transformation of the processed image displayed on the display device.
  • Further, the image processing unit may enlarge the counterpart image with a larger enlargement ratio as the absolute difference between the face area coordinates and the reference coordinates increases.
  • With this configuration, since the processed image displayed on the display device is an image obtained by enlarging a part of the counterpart image, the speaker can check how far he or she has deviated from the reference position by looking at the degree of enlargement of the processed image displayed on the display device.
  • Further, the image communication apparatus may include a reference coordinate setting unit that detects a face area from the counterpart image included in the second image data received by the image receiving unit and sets face area coordinates indicating the position of the detected face area as the reference coordinates.
  • With this configuration, image processing is performed using the coordinates of the face area in the counterpart image as the reference coordinates, so the speaker's reference position is not fixed and can be set according to the counterpart's position. That is, when the counterpart moves, the reference position of the speaker can be changed accordingly.
  • Further, the face detection unit may detect the size of the face region, and the image processing unit may calculate the absolute difference between the size of the face region and a predetermined reference size and, when the calculated absolute difference is larger than a predetermined threshold, generate the processed image by processing the counterpart image so that the larger the absolute difference, the more the counterpart image and the processed image differ.
  • With this configuration, the speaker can check how far he or she is out of the reference position by looking at the counterpart image displayed on the display device. At this time, if the amount of deviation from the reference position (the absolute difference value) is less than or equal to the predetermined threshold, the image is not processed, so the counterpart image is displayed unmodified on the display device as long as the speaker is within a predetermined range of the reference position.
  • Further, the image processing unit may generate the processed image by blurring the counterpart image so that the blur amount becomes larger as the absolute difference between the size of the face area and the reference size increases.
  • With this configuration, the blur amount of the counterpart image increases as the face region becomes larger or smaller than the reference size, so the speaker can check how far he or she has deviated from the reference position by looking at the degree of blur of the processed image displayed on the display device. Specifically, the speaker can recognize that he or she is too close to or too far from the camera.
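To illustrate this blurring rule concretely, here is a minimal Python/OpenCV sketch; the linear mapping from the size difference to the Gaussian kernel size, and the threshold and gain values, are illustrative assumptions rather than the patent's formula.

```python
import cv2

def blur_by_face_size(counterpart_img, face_radius, ref_radius,
                      threshold=5, gain=0.5):
    """Blur the counterpart image more strongly the further the detected
    face radius is from the reference radius (hypothetical linear mapping)."""
    diff = abs(face_radius - ref_radius)
    if diff <= threshold:         # within tolerance: leave the image untouched
        return counterpart_img
    k = int(diff * gain) * 2 + 1  # Gaussian kernel size must be odd and positive
    return cv2.GaussianBlur(counterpart_img, (k, k), 0)
```

Whether the speaker is too close (face larger than the reference) or too far (smaller), the absolute difference grows and so does the blur, which matches the behavior described above.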
  • Further, the image processing unit may generate the processed image by enlarging the counterpart image when the size of the face area is larger than the reference size and reducing the counterpart image when the size of the face area is smaller than the reference size, with a larger enlargement or reduction ratio as the absolute difference between the size of the face area and the reference size increases.
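A similar sketch for this enlargement/reduction variant; the linear scale factor is an assumption, and a real implementation would crop or letterbox the result back to the display size.

```python
import cv2

def scale_by_face_size(counterpart_img, face_radius, ref_radius, gain=0.01):
    """Enlarge the counterpart image when the face looks larger than the
    reference (speaker too close), reduce it when smaller (too far)."""
    h, w = counterpart_img.shape[:2]
    # scale > 1 enlarges, scale < 1 reduces; deviation grows the ratio.
    scale = max(0.1, 1.0 + gain * (face_radius - ref_radius))
    return cv2.resize(counterpart_img, (int(w * scale), int(h * scale)))
```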
  • Note that the present invention can be realized not only as an image communication apparatus but also as a method having the processing units constituting the image communication apparatus as steps. Moreover, it may be implemented as a program that causes a computer to execute these steps, and such a program may be distributed via a transmission medium such as a communication network like the Internet.
  • The system LSI is an ultra-multifunctional LSI manufactured by integrating a plurality of components on a single chip, and is specifically a computer system including a microprocessor, a ROM, a RAM (Random Access Memory), and the like.
  • According to the present invention, it is possible to prevent communication from being hindered by the generation of a warning sound or by the display of a self-portrait on the display, and to show the speaker how far he or she has deviated from the imaging area of the camera.
  • FIG. 1 is a block diagram illustrating a configuration of the image communication apparatus according to the first embodiment.
  • FIG. 2 is a flowchart illustrating transmission processing of the image communication apparatus according to the first embodiment.
  • FIG. 3 is a flowchart illustrating a reception process of the image communication apparatus according to the first embodiment.
  • FIG. 4 is a diagram illustrating a positional relationship among the speaker, the camera, and the monitor according to the first embodiment.
  • FIG. 5 is a diagram illustrating a positional relationship between speakers in an input image captured by the camera according to the first embodiment.
  • FIG. 6 is a diagram illustrating an example of a counterpart image before execution of the image superimposing process of Embodiment 1 and a processed image after execution.
  • FIG. 7A is a diagram illustrating an example of a graph showing the relationship between the horizontal distance (dx) and the horizontal overlap size (W (dx)).
  • FIG. 7B is a diagram illustrating an example of a graph showing the relationship between the vertical distance (dy) and the vertical superimposition size (H (dy)).
  • FIG. 8 is a diagram illustrating an example of a processed image when the face area according to the first embodiment is not detected.
  • FIG. 9 is a schematic diagram showing how the partner image changes when the speaker according to Embodiment 1 moves.
  • FIG. 10 is a block diagram illustrating a configuration of the image communication apparatus according to the second embodiment.
  • FIG. 11 is a flowchart illustrating a reception process of the image communication apparatus according to the second embodiment.
  • FIG. 12 is a diagram illustrating an example of a partner image before execution of projective transformation according to Embodiment 2 and a processed image after execution.
  • FIG. 13 is a diagram illustrating an example of a graph for calculating image processing parameters for projective conversion according to the second embodiment.
  • FIG. 14 is a schematic diagram showing how the partner image changes when the speaker according to Embodiment 2 moves.
  • FIG. 15 is a block diagram illustrating a configuration of the image communication apparatus according to the third embodiment.
  • FIG. 16 is a flowchart illustrating a reception process of the image communication apparatus according to the third embodiment.
  • FIG. 17 is a diagram illustrating an example of a graph for calculating the blur parameter according to the third embodiment.
  • FIG. 18 is a diagram illustrating an example of an input image, a partner image, and a processed image according to the third embodiment.
  • FIG. 19 is a block diagram illustrating a configuration of the image communication apparatus according to the fourth embodiment.
  • FIG. 20 is a flowchart illustrating a reception process of the image communication apparatus according to the fourth embodiment.
  • FIG. 21 is a diagram illustrating an example of a graph for calculating an enlargement / reduction parameter according to the fourth embodiment.
  • FIG. 22 is a diagram illustrating an example of an input image, a partner image, and a processed image according to the fourth embodiment.
  • FIG. 23 is a block diagram illustrating a configuration of the image communication apparatus according to the fifth embodiment.
  • FIG. 24 is a flowchart illustrating a reception process of the image communication apparatus according to the fifth embodiment.
  • FIG. 25A is a diagram illustrating an example of a partner image according to the fifth embodiment.
  • FIG. 25B is a diagram illustrating an example of an input image according to the fifth embodiment.
  • FIG. 26 is a block diagram illustrating a configuration of the image communication apparatus according to the sixth embodiment.
  • FIG. 27 is a flowchart illustrating transmission processing of the image communication apparatus according to the sixth embodiment.
  • FIG. 28 is a diagram illustrating an example of an input image and a transmission image according to the sixth embodiment.
  • FIG. 29 is a block diagram illustrating a configuration of the image communication apparatus according to the seventh embodiment.
  • FIG. 30 is a diagram illustrating a positional relationship between the two cameras and the speaker according to the seventh embodiment.
  • FIG. 31 is a flowchart illustrating a transmission process of the image communication apparatus according to the seventh embodiment.
  • FIG. 32 is a diagram illustrating an example of a first input image and a second input image according to the seventh embodiment.
  • FIG. 33 is a diagram illustrating an example of a partner image and a processed image.
  • FIG. 34 is a block diagram showing an example of a different form of the image communication apparatus according to the present invention.
  • The image communication apparatus according to Embodiment 1 is an apparatus that superimposes a predetermined image on the counterpart image received from another image communication apparatus and displays the result, based on the position of the speaker's face in a captured image acquired from a camera or the like.
  • FIG. 1 is a block diagram illustrating a configuration of the image communication apparatus 100 according to the first embodiment.
  • The image communication apparatus 100 illustrated in FIG. 1 is a device that transmits image data including a captured image captured by the camera 101 to another image communication apparatus via the network 111, and displays on the monitor 110 the counterpart image included in the image data received from the other image communication apparatus via the network 111.
  • As illustrated in FIG. 1, the image communication apparatus 100 includes an image input unit 102, a face detection unit 103, an image encoding unit 104, an image transmission unit 105, an image reception unit 106, an image decoding unit 107, an image processing unit 108, and an image output unit 109.
  • the image input unit 102 is an interface connected to the camera 101 that captures an image, and acquires a captured image captured by the camera 101.
  • the image input unit 102 outputs the acquired captured image to the face detection unit 103 as an input image.
  • The face detection unit 103 detects a face area from the input image acquired by the image input unit 102. By detecting the face area, the face detection unit 103 detects face area coordinates indicating the position of the face area and the size of the face area.
  • the face area is a circular area including the speaker's face in the input image.
  • For example, the face detection unit 103 performs face area detection processing using a technique such as template matching or ellipse detection, and outputs to the image processing unit 108 information such as a face detection flag indicating the presence or absence of a face area, center coordinates indicating the center position of the face area (an example of face area coordinates), and the radius of the face area. Further, the face detection unit 103 outputs the input image to the image encoding unit 104.
  • Note that the face area may be elliptical. In this case, the center position of the face area is the midpoint between the two focal points, and the radius of the face area is the average of the major axis and the minor axis. In the present embodiment, the radius information of the face area need not necessarily be output.
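As a concrete stand-in for such a face detection unit, the sketch below uses OpenCV's Haar cascade detector instead of the template matching or ellipse detection named above; the detector choice, its parameters, and the circular-area approximation are assumptions, not the patent's method.

```python
import cv2

# Stand-in detector; the patent does not prescribe a specific algorithm.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_face(input_image):
    """Mimic face detection unit 103: return a face detection flag,
    the center coordinates (x1, y1) of the face area, and its radius R."""
    gray = cv2.cvtColor(input_image, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return False, None, None          # face detection flag: absent
    x, y, w, h = faces[0]                 # take the first detected face
    center = (x + w // 2, y + h // 2)     # center coordinates (x1, y1)
    radius = (w + h) // 4                 # approximate circular radius R
    return True, center, radius
```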
  • The image encoding unit 104 generates compressed image data by encoding the input image using a compression encoding method such as H.264. That is, the generated image data includes the captured image acquired by the image input unit 102. The image encoding unit 104 outputs the generated image data to the image transmission unit 105.
  • the image transmission unit 105 transmits the image data compressed by the image encoding unit 104 to another image communication apparatus or the like via the network 111.
  • the image transmission unit 105 packetizes the image data according to a packet transmission method such as RTP (Real-time Transport Protocol), and outputs the RTP packet generated by packetization to the network 111.
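To make the packetization step concrete, the following sketch builds the fixed 12-byte RTP header (RFC 3550) by hand and prepends it to chunks of an encoded frame; the payload type, SSRC, and MTU values are placeholders, and a real H.264-over-RTP sender would additionally follow the RFC 6184 payload format.

```python
import struct

def rtp_packetize(payload, seq, timestamp, ssrc=0x12345678, pt=96, marker=0):
    """Prepend a minimal RTP header to one payload chunk."""
    byte0 = 2 << 6                    # V=2, P=0, X=0, CC=0
    byte1 = (marker << 7) | pt        # marker bit and payload type
    header = struct.pack("!BBHII", byte0, byte1,
                         seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc)
    return header + payload

# Example: split one encoded frame into MTU-sized RTP packets.
frame = b"\x00" * 3000                # stand-in for H.264 encoded data
mtu, ts = 1400, 90000                 # a 90 kHz clock is typical for video
packets = [rtp_packetize(frame[i:i + mtu], seq=n, timestamp=ts)
           for n, i in enumerate(range(0, len(frame), mtu))]
```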
  • the image receiving unit 106 receives image data including a partner image from another image communication apparatus via the network 111. For example, the image receiving unit 106 receives an RTP packet from the network 111 and acquires compressed image data by removing the RTP header. The image receiving unit 106 outputs the acquired compressed image data to the image decoding unit 107.
  • the image decoding unit 107 generates a partner image by decoding the compressed image data received by the image receiving unit 106.
  • the image decoding unit 107 outputs the generated partner image to the image processing unit 108.
  • the partner image is an image showing a communication partner using another image communication apparatus, and is an image acquired by a camera or the like connected to the other image communication apparatus.
  • The image processing unit 108 is an example of an image processing unit that generates a processed image by processing the counterpart image. "Processing" here does not include superimposing the captured image, which is the self-portrait, on the counterpart image. For example, the image processing unit 108 calculates the difference between the position of the face area detected by the face detection unit 103 and a predetermined reference, and processes the counterpart image so that the counterpart image and the processed image differ more greatly as the calculated difference increases. Specifically, the image processing unit 108 calculates image processing parameters according to the position of the face area detected by the face detection unit 103, and processes the counterpart image using the calculated image processing parameters. As illustrated in FIG. 1, the image processing unit 108 includes a determination unit 121, a buffer 122, a parameter calculation unit 123, and an image superimposition unit 124.
  • the determination unit 121 determines whether a face area exists in the input image using the face detection flag input from the face detection unit 103. When the face area exists, the determination unit 121 outputs the center coordinates and the radius of the face area input from the face detection unit 103 to the parameter calculation unit 123 and causes the buffer 122 to store the center coordinates and the radius. When the face area does not exist, the determination unit 121 reads the center coordinates from the buffer 122 and outputs the read center coordinates to the parameter calculation unit 123.
  • the buffer 122 is a memory for storing the center coordinates of the face area and the radius of the face area.
  • the buffer 122 may always store only the latest center coordinates and radius, that is, only the center coordinates of the face area when the face area was last detected.
  • the buffer 122 may store a plurality of center coordinates and a plurality of radii in association with the time when the face area is detected.
  • the parameter calculation unit 123 calculates a difference between the center coordinates input from the determination unit 121 and a predetermined reference coordinate, and calculates an image processing parameter using the calculated difference. Specifically, the parameter calculation unit 123 calculates an image processing parameter for superimposing an image based on the calculated positive / negative of the difference and the absolute value of the difference. The parameter calculation unit 123 outputs the calculated image processing parameters to the image superimposing unit 124. The calculation processing of image processing parameters for image superimposition will be described later.
  • the image superimposing unit 124 generates a processed image by superimposing a predetermined superimposed image on the counterpart image generated by the image decoding unit 107 using the image processing parameter.
  • the superimposed image may be any image as long as it is different from the original counterpart image and the captured image that is the self-portrait.
  • The superimposed image is preferably an image that allows the speaker to easily recognize that it is superimposed and where it is superimposed, for example, a single-color image or a semi-transparent image.
  • the image superimposing unit 124 outputs the generated processed image to the image output unit 109.
  • In other words, the image processing unit 108 calculates the absolute value of the difference between the center coordinates, which are an example of face area coordinates, and the reference coordinates. Then, when the calculated absolute difference is larger than a predetermined threshold, the image processing unit 108 generates the processed image by processing the counterpart image so that the counterpart image and the processed image differ more greatly as the absolute difference increases. Specifically, it generates the processed image by superimposing the predetermined superimposed image on the counterpart image so that the larger the absolute difference, the larger the area of the superimposed image.
  • the image output unit 109 outputs the processed image generated by the image processing unit 108 to the monitor 110 which is an example of a display device.
  • the image output unit 109 is an interface connected to a monitor 110 that displays an image, and displays a processed image on the monitor 110.
  • Next, the operation of the image communication apparatus 100 having the above configuration will be described using the flowcharts shown in FIGS. 2 and 3. The operation of each flowchart is stored as a control program in a storage device (not shown) such as a ROM or flash memory, and is controlled by a CPU (Central Processing Unit, not shown).
  • FIG. 2 is a flowchart showing transmission processing of the image communication apparatus 100 according to the first embodiment.
  • the image input unit 102 acquires an uncompressed captured image from the camera 101 in units of frames, and outputs the acquired image to the face detection unit 103 (S101).
  • Next, the face detection unit 103 detects a face area from the uncompressed captured image input from the image input unit 102 by a method such as template matching or ellipse detection (S102). Then, the face detection unit 103 calculates information such as the face detection flag, the center coordinates of the face region, and the radius of the face region, outputs the calculated information to the image processing unit 108, and also outputs the input image to the image encoding unit 104.
  • Next, the image encoding unit 104 encodes the input image input from the face detection unit 103 using the H.264 compression encoding method, and outputs the compression-encoded image data to the image transmission unit 105.
  • the image transmitting unit 105 converts the image data input from the image encoding unit 104 into RTP packets according to a packet transmission method such as RTP, and outputs the packet to the network 111 (S103).
  • Note that the compression encoding method is not limited to H.264; any compression coding scheme such as MPEG-2, MPEG-4, H.261, or H.263 can be used.
  • the packet transmission method is not limited to RTP, and any transmission method such as RTSP (Real Time Streaming Protocol) can be used.
  • FIG. 4 is a diagram illustrating a positional relationship among the speaker 201, the camera 101, and the monitor 110 according to the first embodiment.
  • FIG. 4A is a view of the speaker 201, the camera 101, and the monitor 110 as seen from above, and FIG. 4B is a view of the speaker 201, the camera 101, and the monitor 110 as seen from the front.
  • As illustrated in FIG. 4, the camera 101 is installed at the top center of the monitor 110, and the area that can be imaged by the camera 101 is the imaging area 202. Further, only the counterpart image is displayed on the monitor 110; the self-portrait (speaker 201) is not displayed. The purpose of the present invention is to allow the speaker 201 to confirm where he or she is within the imaging area 202 from the counterpart image displayed on the monitor 110, even though the self-portrait is not displayed.
  • FIG. 5 is a diagram showing the positional relationship of the speaker 201 in the input image 301 captured by the camera 101 in FIG.
  • As illustrated in FIG. 5, since the camera 101 is installed at the top center of the monitor 110, when the speaker 201 is positioned on the left side of the monitor 110, the speaker image 302 appears on the right side of the input image 301 captured by the camera 101. The speaker image 302 is the image of the speaker 201 shown in the input image 301.
  • the face detection unit 103 detects the face area 303 of the speaker image 302 from the input image 301 and calculates center coordinates (x1, y1) and a radius (R) that are the centers of the face areas.
  • The reference coordinates (x0, y0) are coordinates indicating a preset appropriate speaker position, and are used when detecting the amount of deviation of the face coordinates.
  • For example, the reference coordinates are the center of the imaging area 202 of the camera 101.
  • FIG. 3 is a flowchart showing the reception process of the image communication apparatus 100 according to the first embodiment.
  • the image receiving unit 106 receives the RTP packet via the network 111, acquires the compressed image data by removing the RTP header, and outputs the acquired compressed image data to the image decoding unit 107.
  • Next, the image decoding unit 107 generates an uncompressed counterpart image by decoding the image data input from the image receiving unit 106 based on H.264, and outputs the generated counterpart image to the image processing unit 108 (S201).
  • the determination unit 121 determines whether a face area exists in the input image using the face detection flag input from the face detection unit 103 (S202).
  • When the face area exists (Yes in S202), the determination unit 121 outputs the center coordinates (x1, y1) input from the face detection unit 103 to the parameter calculation unit 123. At this time, the center coordinates (x1, y1) are stored in the buffer 122. The parameter calculation unit 123 then calculates the difference between the center coordinates (x1, y1) input from the determination unit 121 and the reference coordinates (x0, y0) (S203).
  • When the calculated difference is larger than the threshold (Yes in S204), the parameter calculation unit 123 calculates an image processing parameter based on the sign and absolute value of the calculated difference (S207).
  • On the other hand, when the face area does not exist (No in S202), the determination unit 121 reads the past center coordinates from the buffer 122 (S205), and outputs the read center coordinates (x1, y1) to the parameter calculation unit 123.
  • the determination unit 121 reads the latest center coordinates, that is, the center coordinates when the face area was last detected.
  • the parameter calculation unit 123 calculates the difference between the center coordinates (x1, y1) and the reference coordinates (x0, y0) (S206). Then, the parameter calculation unit 123 calculates an image processing parameter based on the sign of the calculated difference (S207). A specific example of the image processing parameter calculation process will be described later.
  • the image superimposing unit 124 superimposes a predetermined superimposed image on the counterpart image using the image processing parameter (S208). Then, the image superimposing unit 124 outputs the processed image generated by the superimposition to the image output unit 109.
  • When the calculated difference is equal to or less than the threshold (No in S204), the image superimposing unit 124 outputs the counterpart image as it is to the image output unit 109 without executing the superimposing process on the input counterpart image.
  • the image output unit 109 outputs the processed image (or the partner image) input from the image superimposing unit 124 to the monitor 110 and displays it on the monitor 110 (S209).
  • the image processing unit 108 calculates an image processing parameter for superimposition using the face detection flag input from the face detection unit 103 and the center coordinates of the face area.
  • Specifically, the parameter calculation unit 123 calculates the difference between the reference coordinates (x0, y0) and the center coordinates (x1, y1) of the face area according to Expression 1 and Expression 2, that is, the horizontal distance (dx) and the vertical distance (dy).
  • Then, the parameter calculation unit 123 performs threshold determination on the horizontal distance (dx) and the vertical distance (dy), respectively, and calculates image processing parameters for image processing (image superimposition) when these differences are larger than the thresholds.
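A minimal sketch of this difference-and-threshold step, assuming Expression 1 and Expression 2 are dx = x1 - x0 and dy = y1 - y0 with the y-axis oriented so that positive dy means the face area is above the reference; the threshold values are illustrative.

```python
def face_offset(center, reference):
    """Horizontal and vertical distances of the face-area center from the
    reference coordinates (sign conventions assumed, see text)."""
    (x1, y1), (x0, y0) = center, reference
    dx = x1 - x0    # positive: face area is on the right in the input image
    dy = y1 - y0    # positive: face area is above the reference
    return dx, dy

def needs_processing(dx, dy, th_x=40, th_y=40):
    """Superimpose on the counterpart image only when the deviation
    exceeds a per-axis threshold."""
    return abs(dx) > th_x or abs(dy) > th_y
```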
  • FIG. 6 is a diagram illustrating an example of the partner image 401 before the execution of the image superimposing process according to the first embodiment and the processed image 402 after the execution.
  • FIG. 6B shows a processed image 402 generated by superimposing the superimposed image 403 on the counterpart image 401 shown in FIG. 6A.
  • FIGS. 7A and 7B are diagrams illustrating an example of a graph for calculating an image processing parameter for superimposition.
  • FIG. 7A is a diagram illustrating an example of a graph showing the relationship between the horizontal distance (dx) and the horizontal overlap size (W (dx)).
  • FIG. 7B is a diagram illustrating an example of a graph showing the relationship between the vertical distance (dy) and the vertical superimposition size (H (dy)).
  • The image processing parameters for superimposition are parameters indicating a superimposed image 403 of what size is to be superimposed at which position of the counterpart image 401, and consist, for example, of the following four items.
  • Image processing parameters for superimposition:
  • 1. Horizontal superimposition start position: left end or right end of the counterpart image 401
  • 2. Horizontal superimposition size: W(dx)
  • 3. Vertical superimposition start position: upper end or lower end of the counterpart image 401
  • 4. Vertical superimposition size: H(dy)
  • the parameter calculation unit 123 first sets and calculates horizontal image processing parameters according to the following.
  • the parameter calculation unit 123 determines the horizontal superimposition start position according to the sign of dx. Specifically, the parameter calculation unit 123 determines the left end when the sign of dx is positive and the right end when the sign is negative.
  • Here, dx being positive means that the face region 303 is on the right side of the input image 301; that is, as shown in FIG. 5, the speaker 201 is on the left side facing the monitor 110. In this case, since the speaker views the other party from the left side, the image superimposing unit 124 superimposes the superimposed image 403 from the left end of the counterpart image 401. The opposite applies when dx is negative.
  • the parameter calculation unit 123 calculates vertical image processing parameters according to the following.
  • the parameter calculation unit 123 determines the vertical superimposition start position according to the sign of dy. Specifically, the parameter calculation unit 123 determines the upper end when the sign of dy is positive and the lower end when it is negative.
  • dy being positive means that the face region 303 exists on the upper side in the input image 301.
  • the image superimposing unit 124 superimposes the superimposed image 403 from the upper end of the partner image 401. The opposite is true when dy is negative.
  • Note that the calculation method is not limited to this example; any method can be used as long as the value of the image processing parameter increases as the horizontal distance or the vertical distance increases. That is, it suffices that the horizontal superimposition size W(dx) increases monotonically with the horizontal distance and the vertical superimposition size H(dy) increases monotonically with the vertical distance.
  • In other words, when the face area coordinates are on the right side of the reference coordinates in the input image, the image processing unit 108 superimposes the superimposed image from the left end of the counterpart image, and when the face area coordinates are on the left side of the reference coordinates, it superimposes the superimposed image from the right end of the counterpart image. At this time, the image processing unit 108 superimposes the superimposed image on the counterpart image so that the larger the absolute horizontal difference between the face region coordinates and the reference coordinates, the larger the area of the superimposed image.
  • Similarly, when the face area coordinates are above the reference coordinates in the input image, the image processing unit 108 superimposes the superimposed image from the upper end of the counterpart image, and when they are below the reference coordinates, from the lower end. At this time, the image processing unit 108 superimposes the superimposed image on the counterpart image so that the larger the absolute vertical difference between the face area coordinates and the reference coordinates, the larger the area of the superimposed image.
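Putting these rules together, the sketch below computes the four superimposition parameters and paints the overlay bands onto the counterpart image; the linear ramp standing in for the W(dx) and H(dy) graphs of FIGS. 7A and 7B, and the single gray overlay color, are assumptions. Images are numpy arrays of shape (height, width, 3).

```python
import numpy as np

def overlay_params(dx, dy, frame_w, frame_h, th=40, gain=2.0):
    """Map the face offset to start edge and size, per axis (linear ramp)."""
    w = min(frame_w, int(gain * (abs(dx) - th))) if abs(dx) > th else 0
    h = min(frame_h, int(gain * (abs(dy) - th))) if abs(dy) > th else 0
    h_edge = "left" if dx > 0 else "right"  # face right of reference -> left band
    v_edge = "top" if dy > 0 else "bottom"  # face above reference -> top band
    return h_edge, w, v_edge, h

def apply_overlay(counterpart, h_edge, w, v_edge, h, color=(128, 128, 128)):
    """Paint single-color overlay bands onto a copy of the counterpart image."""
    out = counterpart.copy()
    if w > 0:
        if h_edge == "left":
            out[:, :w] = color
        else:
            out[:, -w:] = color
    if h > 0:
        if v_edge == "top":
            out[:h, :] = color
        else:
            out[-h:, :] = color
    return out
```

The farther the speaker drifts from the reference position, the wider the bands grow, which is the monotone relationship the paragraph above requires.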
  • the case where the face area does not exist is as follows.
  • FIG. 8 is a diagram showing an example of a processed image when the face area of the first embodiment is not detected.
  • the past input image 501 includes a speaker image 502 and includes a face area 503.
  • the input image 501 is, for example, an input image when the face area 503 is detected last.
  • When the face area does not exist, the parameter calculation unit 123 calculates the image processing parameters as follows.
  • 1. Horizontal superimposition start position: the horizontal superimposition start position determined when the face area was last detected
  • 2. Horizontal superimposition size: W/N (W is the horizontal size of the screen; N is a fixed value of 1 or more)
  • 3. Vertical superimposition start position: the vertical superimposition start position determined when the face area was last detected
  • Specifically, as illustrated in FIG. 8, the determination unit 121 reads the center coordinates (x1, y1) of the face area 503 shown in FIG. 8A from the buffer 122, and outputs the read center coordinates to the parameter calculation unit 123.
  • the parameter calculation unit 123 calculates a difference (dx and dy) between the center coordinates (x1, y1) and the reference coordinates (x0, y0), and determines a superposition start position based on the sign of the calculated difference. At this time, the superimposition size is a fixed value as described above regardless of the absolute value of the difference.
  • In the example of FIG. 8, the difference between the past center coordinates and the reference coordinates is 0 in the horizontal direction and positive in the vertical direction. Therefore, as illustrated in FIG. 8C, the image superimposing unit 124 performs no superimposition in the horizontal direction, and in the vertical direction superimposes the superimposed image 523 having a fixed area from the upper end onto the counterpart image 521, thereby generating the processed image 522.
  • In other words, when the flag indicates that no face area is in the captured image, the image processing unit 108 superimposes a superimposed image having a predetermined area on the counterpart image from the left end when the face area coordinates stored in the buffer 122 are on the right side of the reference coordinates, or from the right end when they are on the left side. Similarly, it superimposes a superimposed image having a predetermined area on the counterpart image from the upper end when the face area coordinates stored in the buffer 122 are above the reference coordinates, or from the lower end when they are below the reference coordinates.
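A sketch of this fallback path, reusing face_offset from the earlier sketch and a fixed band of size W/N (the choice N = 4 is only an example):

```python
def fallback_overlay_params(buffered_center, reference,
                            frame_w, frame_h, N=4):
    """When no face is detected, place fixed-size bands according to the
    last buffered face position; only the signs of dx and dy matter."""
    dx, dy = face_offset(buffered_center, reference)
    w = frame_w // N if dx != 0 else 0   # fixed horizontal band of width W/N
    h = frame_h // N if dy != 0 else 0   # fixed vertical band
    h_edge = "left" if dx > 0 else "right"
    v_edge = "top" if dy > 0 else "bottom"
    return h_edge, w, v_edge, h
```

In the FIG. 8 example, dx = 0 and dy > 0, so only a fixed-size band from the upper end is painted, matching the processed image 522 described above.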
  • As described above, the face detection unit 103 detects the center coordinates of the face region, and the image processing unit 108 superimposes another image on the counterpart image according to the horizontal and vertical distances and directions between the center coordinates and the reference coordinates.
  • For example, when the speaker 201 is on the left side facing the monitor 110, that is, when the center coordinates of the face area 303 are on the right side of the reference coordinates in the input image 301, the processed image 402 generated by superimposing the superimposed image 403 from the left end of the counterpart image 401 is displayed on the monitor 110.
  • In real space, when viewing the other party from the left side, the left side of the other party cannot be seen, so by making the left area of the counterpart image invisible, the speaker can recognize that he or she is to the left of the center. The same applies to the right side, the upper side, and the lower side.
  • FIG. 9 is a schematic diagram showing how the partner image changes when the speaker according to Embodiment 1 moves.
  • the superimposed image 403 is superimposed on the partner image 401 from the left side of the partner image 401. That is, the processed image 402 generated by superimposing the superimposed image 403 on the left side of the partner image 401 is displayed on the monitor 110.
  • By looking at the position where the superimposed image 403 is superimposed and at its area, the speaker 201 can judge in which direction to move. For example, in the case shown in FIG. 9A, the speaker 201 should move to the right.
  • As the area of the superimposed image 403 becomes smaller, the speaker 201 can confirm that he or she is moving in an appropriate direction, and as long as the superimposed image 403 remains, the speaker 201 can tell that he or she is not yet at the appropriate position (that is, the reference position).
  • As described above, the image communication apparatus 100 according to Embodiment 1 processes the counterpart image 401 according to the position of the speaker 201, specifically, the position of the face area of the speaker 201 in the captured image captured by the camera 101. Concretely, the image communication apparatus 100 generates the processed image 402 by superimposing a superimposed image 403 with a larger area on the counterpart image 401 as the position of the speaker 201 is farther from the reference position.
  • This gives the speaker an impression as if the speaker and the other party were communicating through a window frame. That is, when the speaker is in front of the window frame, the other party can be seen straight on, and when the speaker is to the left of the window frame, the left side of the other party is hidden by the frame (corresponding to the superimposed image); the same impression can be given to the speaker. Therefore, by applying the image communication apparatus 100 according to the present embodiment to a TV conference system, communication with a higher sense of presence can be achieved.
  • In other words, without the self-portrait being displayed on the monitor, the speaker can determine how far and in which direction he or she is displaced on the screen by looking at the position and size of the image superimposed on the counterpart image on the monitor.
  • Further, when no face area is detected, the image processing unit 108 refers to the face detection flag and superimposes and displays on the counterpart image another image having a fixed area, based on the positional relationship between the center coordinates when the face area was last detected and the reference coordinates.
  • Thus, without his or her own image being displayed on the monitor, the speaker can determine that his or her face is out of the imaging area by seeing that a fixed area of the counterpart image displayed on the monitor is covered with another image.
  • Note that the buffer 122 need not store the center coordinates from when the face area was last detected; it may instead store the image processing parameters calculated by the parameter calculation unit 123 when the face area was last detected.
  • In this case, instead of superimposing a superimposed image of a fixed area, the image superimposing unit 124 may superimpose on the counterpart image a superimposed image having the area indicated by the horizontal superimposition size and the vertical superimposition size stored in the buffer 122.
  • Note that since the position at which the superimposed image is superimposed is determined from the signs of the horizontal and vertical differences, the buffer 122 may store only the horizontal superimposition start position and the vertical superimposition start position among the image processing parameters.
  • The image communication apparatus according to Embodiment 2 is an apparatus that performs projective transformation on the counterpart image received from another image communication apparatus based on the position of the speaker's face area in the input image.
  • FIG. 10 is a block diagram illustrating a configuration of the image communication apparatus 600 according to the second embodiment.
  • the image communication apparatus 600 shown in the figure is different from the image communication apparatus 100 of FIG. 1 in that an image processing unit 608 is provided instead of the image processing unit 108.
  • 10, components having the same reference numerals as those in FIG. 1 perform the same processes as those in the first embodiment, and thus description thereof will be omitted here and different points will be mainly described.
  • the image processing unit 608 is an example of an image processing unit that generates a processed image by processing the counterpart image. Specifically, the image processing unit 608 generates a processed image by projective conversion of the partner image so that the gradient of the projective conversion increases as the absolute difference between the face area coordinates and the reference coordinates increases. As illustrated in FIG. 10, the image processing unit 608 includes a determination unit 121, a buffer 122, a parameter calculation unit 623, and a projective conversion unit 624. Since the determination unit 121 and the buffer 122 perform the same processing as in the first embodiment, the description thereof is omitted.
  • the parameter calculation unit 623 calculates a difference between the center coordinates and the reference coordinates input from the determination unit 121, and calculates an image processing parameter using the calculated difference. Specifically, the parameter calculation unit 623 calculates an image processing parameter for projective transformation based on the calculated difference between positive and negative and the absolute value of the difference. The parameter calculation unit 623 outputs the calculated image processing parameters to the projective conversion unit 624. The process for calculating the image processing parameters for projective transformation will be described later.
  • the projective conversion unit 624 generates a processed image by performing projective conversion on the counterpart image generated by the image decoding unit 107 using the image processing parameters for projective conversion.
  • FIG. 11 is a flowchart illustrating a reception process of the image communication apparatus 600 according to the second embodiment. Since the operation at the time of image transmission is the same as that in Embodiment 1 (FIG. 2), description thereof is omitted here.
  • The operation of the flowchart shown in FIG. 11 is stored as a control program in a storage device (not shown) such as a ROM or flash memory, and is controlled by a CPU (not shown).
  • In FIG. 11, the processing steps assigned the same reference numerals as those in FIG. 3 are the same operations as in FIG. 3, and description thereof is omitted.
  • When the calculated difference is larger than the threshold (Yes in S204), the parameter calculation unit 623 calculates an image processing parameter for projective transformation based on the sign and absolute value of the calculated difference (S307).
  • the parameter calculation unit 623 calculates an image processing parameter for projective transformation based on the sign of the calculated difference (S307). A specific example of the image parameter calculation process will be described later.
  • the projective transformation unit 624 performs projective transformation of the partner image using the calculated image processing parameter (S308). Then, the projective transformation unit 624 outputs the processed image generated by the projective transformation to the image output unit 109.
  • the image output unit 109 outputs the processed image (or the partner image) to the monitor 110 and displays it on the monitor 110 (S209).
  • the image processing unit 608 calculates an image processing parameter for projective transformation using the face detection flag input from the face detection unit 103 and the center coordinates of the face area.
• The parameter calculation unit 623 calculates, according to (Expression 1) and (Expression 2), the horizontal distance (dx) and the vertical distance (dy) between the reference coordinates (x0, y0) and the center coordinates (x1, y1) of the face area.
  • the parameter calculation unit 623 performs threshold determination for each of the horizontal distance (dx) and the vertical distance (dy), and calculates an image processing parameter for performing projective transformation when the threshold is exceeded.
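• As an illustration, this distance calculation can be sketched as below, assuming (Expression 1) and (Expression 2) are simple signed differences; the threshold values th_x and th_y are hypothetical placeholders:

```python
# Sketch of the distance calculation in the parameter calculation unit 623.
# Assumes (Expression 1) and (Expression 2) are simple signed differences;
# th_x and th_y are hypothetical threshold values.
def face_offsets(face_center, reference, th_x=20, th_y=20):
    """Return (dx, dy) and whether each axis exceeds its threshold."""
    x1, y1 = face_center   # center coordinates of the face area
    x0, y0 = reference     # reference coordinates (x0, y0)
    dx = x1 - x0           # horizontal distance, assumed form of (Expression 1)
    dy = y1 - y0           # vertical distance, assumed form of (Expression 2)
    return dx, dy, abs(dx) > th_x, abs(dy) > th_y
```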
  • FIG. 12 is a diagram illustrating an example of a partner image 701 before execution of the projective transformation according to the second embodiment and a processed image 702 after execution.
• FIG. 12B shows a processed image 702 generated by projective transformation of the counterpart image 701 shown in FIG. 12A. In the following, an example in which the counterpart image 701 is projectively transformed by an angle calculated in the horizontal direction is shown.
• FIG. 13 is a diagram showing an example of a graph for calculating the image processing parameters for projective transformation; specifically, it shows an example of the relationship between the horizontal distance (dx) and the horizontal projective transformation width W(dx).
• The image processing parameter for projective transformation indicates in which direction and by how much the counterpart image 701 is to be projectively transformed.
• The image processing parameters for projective transformation are the following two:
  1. Horizontal projective transformation direction: the left side or the right side of the partner image 701
  2. Horizontal projective transformation width: W(dx)
• A horizontal projective transformation direction of "left side of the partner image 701" means a projective transformation in which the left side of the partner image appears farther away; that is, the partner image 701 is transformed so that the length of its left end is smaller than the length of its right end.
• A horizontal projective transformation direction of "right side of the partner image 701" means a projective transformation in which the right side of the partner image appears farther away (FIG. 12B); that is, the partner image 701 is transformed so that the length of its right end is smaller than the length of its left end.
  • the slope of the projective transformation is represented by the ratio of the length at the left end to the length at the right end.
• The projective transformation unit 624 maps the four corner points of the counterpart image 701 (upper left TL, upper right TR, lower left BL, lower right BR) to the four corner points of the processed image 702 (upper left TL′, upper right TR′, lower left BL′, lower right BR′).
• FIG. 12B shows an example in which the horizontal projective transformation direction is the right side. As shown in the figure, the larger the horizontal projective transformation width W(dx), the farther away the right side appears.
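• The corner mapping described above can be illustrated with a short sketch using OpenCV's perspective-warp functions; this is only one plausible realization based on FIG. 12, and the way W(dx) shortens one vertical edge is an assumption, not the patent's exact formulation:

```python
import cv2
import numpy as np

def warp_horizontal(partner_img, w_dx, right_side=True):
    """Projectively transform the partner image so that one vertical edge
    appears farther away. w_dx is the horizontal projective transformation
    width W(dx) in pixels; right_side=True shortens the right edge."""
    h, w = partner_img.shape[:2]
    # Four corners of the counterpart image: TL, TR, BL, BR.
    src = np.float32([[0, 0], [w, 0], [0, h], [w, h]])
    if right_side:
        # TL', TR', BL', BR': the right edge is shortened by 2 * w_dx.
        dst = np.float32([[0, 0], [w, w_dx], [0, h], [w, h - w_dx]])
    else:
        dst = np.float32([[0, w_dx], [w, 0], [0, h - w_dx], [w, h]])
    m = cv2.getPerspectiveTransform(src, dst)
    return cv2.warpPerspective(partner_img, m, (w, h))
```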
  • the parameter calculation unit 623 calculates the image processing parameter in the horizontal direction according to the following when there is a face area.
  • the parameter calculation unit 623 determines the horizontal projection conversion direction according to the sign of dx. Specifically, the parameter calculation unit 623 determines the right side when the sign of dx is positive and the left side when the sign is negative.
• A positive dx means that the face region 303 is on the right side in the input image 301, that is, that the speaker 201 is on the left side of the monitor 110 (camera 101).
• In this case, since the counterpart is viewed from the speaker's left side, the right side of the partner image 701 is shown farther away to the speaker 201 by increasing the inclination of the projective transformation.
• The calculation method is not limited to this example; any method can be used as long as the projective transformation width increases with the horizontal distance or the vertical distance. That is, it is only necessary that the projective transformation width have a positive correlation with the horizontal distance (or the vertical distance).
  • the threshold th_x may be 0.
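• For example, the relationship in FIG. 13 could be realized by a piecewise-linear mapping such as the following sketch; the slope and the maximum width are hypothetical constants, and any mapping with a positive correlation would satisfy the condition above:

```python
def projection_width(dx, th_x=20, w_max=80, gain=0.5):
    """Horizontal projective transformation width W(dx) in the spirit of
    FIG. 13: zero below the threshold th_x, then growing linearly with
    |dx| and saturating at w_max (all constants are hypothetical)."""
    adx = abs(dx)
    if adx <= th_x:
        return 0.0
    return min(gain * (adx - th_x), w_max)
```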
• In other words, when the face region coordinates in the input image are on the right side of the reference coordinates, the image processing unit 608 generates the processed image by projective transformation of the counterpart image so that its right end is shorter than its left end; when the face region coordinates are on the left side of the reference coordinates, the processed image is generated so that the left end of the partner image is shorter than the right end. Furthermore, the image processing unit 608 increases the inclination of the projective transformation as the horizontal absolute difference between the face region coordinates and the reference coordinates increases; in other words, the partner image is transformed so that the difference between the right end length and the left end length becomes larger.
• When the face area does not exist, the image processing parameters are as follows:
  1. Horizontal projective transformation direction: the side calculated when the face area was last detected
  2. Horizontal projective transformation width: H/K (H: vertical size of the image, K: a fixed value of 1 or more)
• The determination unit 121 reads the previous face area coordinates from the buffer 122 and outputs them to the parameter calculation unit 623.
• The parameter calculation unit 623 determines the horizontal projective transformation direction using the read face area coordinates.
• The horizontal projective transformation width is a fixed value, as described above.
• That is, when the face area coordinates stored in the buffer 122 are on the right side of the reference coordinates, the image processing unit 608 projectively transforms the partner image at a predetermined inclination so that its right end is shorter than its left end; when they are on the left side, so that its left end is shorter than its right end.
• Image processing parameters can be calculated in the same way for the vertical direction, and vertical projective transformation may be performed instead of horizontal projective transformation. Alternatively, horizontal and vertical projective transformations can be used together.
• In this case, the image processing parameters are the following two:
  1. Vertical projective transformation direction: the upper end or the lower end of the partner image 701
  2. Vertical projective transformation width: H(dy)
• When the face area coordinates in the input image are above the reference coordinates, the image processing unit 608 generates the processed image by projective transformation of the counterpart image so that its lower end is shorter than its upper end; when the face area coordinates are below the reference coordinates, the processed image is generated so that the upper end of the partner image is shorter than the lower end. Furthermore, the image processing unit 608 increases the inclination of the projective transformation as the vertical absolute difference between the face area coordinates and the reference coordinates increases; in other words, the partner image is transformed so that the difference between the upper end length and the lower end length becomes larger.
• When the face area does not exist, the image processing unit 608 projectively transforms the partner image at a predetermined inclination so that its lower end is shorter than its upper end when the face area coordinates stored in the buffer 122 are above the reference coordinates, and so that its upper end is shorter than its lower end when the stored face area coordinates are below the reference coordinates.
• As described above, the face detection unit 103 detects the center coordinates of the face region, and the image processing unit 608 performs projective transformation on the partner image according to the horizontal and vertical distances and directions between the center coordinates and the reference coordinates.
• According to the image communication apparatus 600 of the present embodiment, when the speaker 201 is on the left side of the monitor 110, that is, when the center coordinates of the face area 303 are on the right side of the reference coordinates in the input image 301, projective transformation is performed on the partner image 701 so that its right end is shorter than its left end. Conversely, when the speaker 201 is on the right side of the monitor 110, that is, when the center coordinates of the face area 303 are on the left side of the reference coordinates, projective transformation is performed so that the left end of the partner image 701 is shorter than its right end.
  • FIG. 14 is a schematic diagram showing how the partner image changes when the speaker according to the second embodiment moves.
• The right end of the partner image 701 is shorter than the left end; that is, the right side of the partner image 701 appears farther away.
  • the partner image 701 has undergone projective transformation. That is, the monitor 110 displays a processed image 702 generated by projective transformation so that the right side of the partner image 701 can be seen in the distance.
• The speaker 201 can determine in which direction and how far to move by looking at the degree (i.e., tilt) and direction of the projective transformation of the partner image 701 in the processed image 702 displayed on the monitor 110. For example, in the example shown in FIG. 14A, the speaker 201 should move to the right.
• When the speaker 201 moves to the right, the degree (inclination) of the projective transformation of the counterpart image 701 in the processed image 702 displayed on the monitor 110 decreases. As a result, the speaker 201 can confirm that he or she is moving in an appropriate direction and can also see that he or she has not yet reached the appropriate position (that is, the reference position).
• As described above, the image communication apparatus 600 processes the partner image 701 in accordance with the position of the speaker 201, specifically, the position of the speaker 201's face area in the captured image captured by the camera 101. More specifically, the image communication apparatus 600 generates the processed image 702 by projectively transforming the partner image 701 with a larger inclination as the position of the speaker 201 is farther from the reference position.
• Thus, without displaying a self-portrait on the monitor, the speaker can determine in which direction and by how much he or she is displaced with respect to the center of the screen by looking at the tilt direction and tilt magnitude of the projectively transformed image on the monitor.
• When no face area is detected, the image processing unit 608 refers to the face detection flag and performs projective transformation on the partner image at a fixed angle toward the side determined at the time of the last face detection.
• In this case, without displaying a self-portrait on the monitor, the speaker can see that the other party's image displayed on the monitor is projectively transformed at a constant angle, and can thereby determine that the speaker's face is not being imaged and toward which side of the imaging area he or she has shifted. Specifically, according to the image communication apparatus 600 of the present embodiment, when the speaker has shifted to the left of the appropriate position, the right side of the partner image appears far away, and when the speaker has shifted to the right of the appropriate position, the partner image is processed so that its left side appears far away. For this reason, the speaker can determine in which direction he or she has deviated from the appropriate position simply by seeing which part of the partner image appears far away.
• Note that instead of storing the center coordinates from when the face area was last detected, the buffer 122 may store the image processing parameters calculated by the parameter calculation unit 623 at that time.
• In this case, instead of performing projective transformation at a constant angle, the projective transformation unit 624 may perform projective transformation on the partner image with the projective transformation width stored in the buffer 122.
• Since the direction of the projective transformation is determined by the signs of the horizontal and vertical differences, the buffer 122 may store only the horizontal projective transformation direction and the vertical projective transformation direction among the image processing parameters.
  • the image communication apparatus is an apparatus that blurs a partner image received from another image communication apparatus in accordance with the difference between the detected radius of the speaker's face area and the reference radius.
  • FIG. 15 is a block diagram illustrating a configuration of the image communication apparatus 800 according to the third embodiment.
  • the image communication apparatus 800 shown in the figure is different from the image communication apparatus 100 in FIG. 1 in that an image processing unit 808 is provided instead of the image processing unit 108.
  • the components denoted by the same reference numerals as those in FIG. 1 perform the same processing as in the first embodiment, and thus description thereof will be omitted here and different points will be mainly described.
• The image processing unit 808 is an example of an image processing unit that generates a processed image by processing the counterpart image. Specifically, the image processing unit 808 generates the processed image by blurring the partner image so that the blur amount increases as the absolute difference between the size of the face area and a predetermined reference size increases. As illustrated in FIG. 15, the image processing unit 808 includes a determination unit 121, a buffer 122, a parameter calculation unit 823, and a blur processing unit 824. Since the determination unit 121 and the buffer 122 perform the same processing as in the first embodiment, their description is omitted.
  • the parameter calculation unit 823 calculates a difference between the radius of the face area input from the determination unit 121 and a predetermined reference radius, and calculates an image processing parameter using the calculated difference. Specifically, a blur parameter used for blur processing, which is one of the processes for blurring an image, is calculated as the image processing parameter. The blur parameter calculation process will be described later.
  • the blur processing unit 824 generates a processed image by performing blur processing on the counterpart image generated by the image decoding unit 107 using the blur parameter.
• That is, the image processing unit 808 calculates the absolute difference between the radius of the face area, which is an example of the size of the face area, and the reference radius, which is an example of the reference size, and when the calculated absolute difference exceeds a predetermined threshold, generates the processed image by processing the counterpart image so that the counterpart image and the processed image differ more greatly as the absolute difference increases.
  • FIG. 16 is a flowchart illustrating a reception process of the image communication apparatus 800 according to the third embodiment. Since the operation at the time of image transmission is the same as that in Embodiment 1 (FIG. 2), description thereof is omitted here.
• The operation of the flowchart shown in FIG. 16 is stored as a control program in a storage device (not shown) such as a ROM or a flash memory, and is controlled by a CPU (not shown).
• In FIG. 16, the processing steps given the same reference numerals as those in FIG. 3 represent the same operations as in FIG. 3, and their description is omitted.
• When the determination unit 121 determines that a face area exists in the input image (Yes in S202), the determination unit 121 outputs the radius R of the face area input from the face detection unit 103 to the parameter calculation unit 823. At this time, the radius R is stored in the buffer 122.
• When the face area does not exist (No in S202), the determination unit 121 reads the radius R of the face area from the buffer 122 (S403) and outputs the read radius R to the parameter calculation unit 823.
• That is, the buffer 122 stores the radius R of the face area from when the face area was last detected, and the following processing executes the blur process based on the last detected face area.
  • the parameter calculation unit 823 calculates the difference between the radius R input from the determination unit 121 and the reference radius R0 (S404).
• When the calculated absolute difference exceeds the threshold, the parameter calculation unit 823 calculates an image processing parameter based on the calculated absolute value of the difference (S406).
• Then, the blur processing unit 824 blurs the partner image (S407); for example, blur processing is executed on the partner image using the blur parameter. The processed image generated by the blur processing is output to the image output unit 109.
• Otherwise, the blur processing unit 824 outputs the input partner image as it is to the image output unit 109 without performing blur processing.
  • the image output unit 109 outputs the processed image (or the partner image) input from the blur processing unit 824 to the monitor 110 and displays it on the monitor 110 (S209).
  • the image processing unit 808 calculates an image processing parameter for blur processing using the face detection flag input from the face detection unit 103 and the face radius information.
• The parameter calculation unit 823 determines a blur parameter σ for blur processing using the radius R of the face area as follows.
• First, the parameter calculation unit 823 calculates the absolute difference dr between the radius R and the predetermined reference radius R0 using Expression 3.
• FIG. 17 is a diagram illustrating an example of a graph for calculating the blur parameter σ.
• dr is the absolute difference between the radius R of the face area and the reference radius R0.
• σ(dr) is the blur parameter.
• th_r is a predetermined threshold value.
• σ_max is the maximum blur parameter.
• As shown in FIG. 17, the blur parameter σ increases in proportion to the absolute difference dr.
• However, the blur parameter σ only needs to be larger as the radius R differs more significantly from the reference radius R0; that is, the blur parameter σ only needs to have a positive correlation with the absolute difference dr.
  • the threshold th_r may be 0.
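• Based on this description of FIG. 17, the blur parameter might be computed as in the following sketch; the exact shape of the patent's graph is not reproduced here, and all constants are hypothetical:

```python
def blur_parameter(r, r0, th_r=10.0, sigma_max=8.0, gain=0.2):
    """Blur parameter sigma(dr) in the spirit of FIG. 17: dr = |R - R0|
    (the assumed form of Expression 3); sigma is zero when dr is at or
    below the threshold th_r, grows in proportion to dr above it, and
    is capped at sigma_max."""
    dr = abs(r - r0)
    if dr <= th_r:
        return 0.0
    return min(gain * (dr - th_r), sigma_max)
```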
  • Expression 4 is an expression used when blur processing is executed in the horizontal direction (x-axis direction).
• out(x, y) is the pixel value at coordinates (x, y) of the image after blurring.
• in(x, y) is the pixel value at coordinates (x, y) of the counterpart image.
• The blur parameter σ and the maximum blur parameter σ_max are the values calculated by the parameter calculation unit 823 using the graph shown in FIG. 17.
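• The patent's Expression 4 itself is not reproduced here; as one hedged illustration, a horizontal blur whose strength grows with σ could be implemented as a moving average over each row:

```python
import numpy as np

def blur_horizontal(img, sigma):
    """Stand-in for Expression 4: average each pixel with its horizontal
    neighbours, using a window whose width grows with sigma, so a larger
    sigma yields a stronger horizontal blur."""
    k = 2 * int(sigma) + 1          # odd window width derived from sigma
    if k < 3:
        return img.copy()           # negligible sigma: no blurring
    kernel = np.ones(k) / k
    out = np.empty_like(img, dtype=np.float32)
    for y in range(img.shape[0]):   # filter each row of each channel
        for c in range(img.shape[2]):
            out[y, :, c] = np.convolve(img[y, :, c], kernel, mode="same")
    return out.astype(img.dtype)
```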
  • the blur processing unit 824 can perform blur processing not only in the horizontal direction but also in the vertical direction or in both the horizontal and vertical directions.
  • FIG. 18 is a diagram illustrating an example of the input images 901 and 911, the counterpart image 921, and the processed image 922 according to the third embodiment.
  • FIG. 18A is a diagram illustrating a case where the face area 903 of the speaker image 902 in the input image 901 is larger than the reference radius R0.
  • FIG. 18B is a diagram showing a case where the face area 913 of the speaker image 912 in the input image 911 is smaller than the reference radius R0.
  • FIG. 18C is a diagram illustrating an example of the partner image 921 before the blur processing according to the third embodiment is executed.
  • FIG. 18D is a diagram illustrating an example of the processed image 922 after the blur processing is executed.
• As shown in FIG. 18A, when the speaker is too close to the camera, the radius R of the face area 903 detected by the face detection unit 103 is larger than the reference radius R0.
• Conversely, as shown in FIG. 18B, when the speaker is too far from the camera, the radius R of the face region 913 detected by the face detection unit 103 is smaller than the reference radius R0.
• In these cases, the parameter calculation unit 823 determines the blur parameter σ according to the graph shown in FIG. 17. Then, the blur processing unit 824 generates a processed image 922 by performing blur processing on the counterpart image 921 using the determined blur parameter σ.
• As described above, the image processing unit 808 increases the value of the blur parameter σ used for blur processing as the absolute difference between the radius of the face area and the reference radius increases, thereby increasing the degree of blur applied to the partner image.
• Thereby, by looking at the degree of blurring of the partner image displayed on the monitor, the speaker can recognize that the distance between the speaker and the camera is not appropriate, that is, that the size of the face differs from the reference radius.
• That is, according to the image communication apparatus 800 of the present embodiment, the partner image is blurred when the speaker is too close to or too far from the camera. For this reason, the speaker can determine whether or not he or she is at an appropriate position with respect to the camera simply by looking at the partner image.
• Even when no face area is detected, the partner image can be blurred using the radius of the face area detected in the past, so the speaker can confirm that he or she is not at an appropriate position.
• At this time, the partner image may be blurred using a fixed blur parameter σ.
• The image communication apparatus according to Embodiment 4 is an apparatus that changes the size of the partner image received from another image communication apparatus by enlarging or reducing it in accordance with the difference between the detected radius of the face area and the reference radius.
  • FIG. 19 is a block diagram showing the configuration of the image communication apparatus 1000 according to the fourth embodiment.
  • the image communication apparatus 1000 shown in the figure is different from the image communication apparatus 800 in FIG. 15 in that an image processing unit 1008 is provided instead of the image processing unit 808.
• In FIG. 19, components having the same reference numerals as those in FIG. 15 perform the same processes as in the third embodiment, so their description is omitted here and the following focuses on the differences.
• The image processing unit 1008 is an example of an image processing unit that generates a processed image by processing the counterpart image. Specifically, the image processing unit 1008 generates the processed image by enlarging the partner image when the size of the face region is larger than the reference size and reducing the partner image when the size of the face region is smaller than the reference size. At this time, the enlargement ratio or reduction ratio increases as the absolute difference between the size of the face area and the reference size increases.
  • the image processing unit 1008 includes a determination unit 121, a buffer 122, a parameter calculation unit 1023, and a size changing unit 1024.
  • the determination unit 121 and the buffer 122 perform the same processing as in the third embodiment, and thus description thereof is omitted.
• The parameter calculation unit 1023 calculates the difference between the radius of the face area input from the determination unit 121 and a predetermined reference radius, and calculates an image processing parameter using the calculated difference. Specifically, an enlargement/reduction parameter Z for changing the size of the counterpart image is determined as the image processing parameter; Z is an enlargement rate or a reduction rate. When the enlargement/reduction parameter Z is smaller than 1, the counterpart image is reduced, and when Z is larger than 1, the counterpart image is enlarged.
  • the size changing unit 1024 generates a processed image by changing the size of the counterpart image generated by the image decoding unit 107 using the enlargement / reduction parameter.
• In other words, the enlargement/reduction parameter is the horizontal size ratio (or the vertical size ratio) between the generated processed image and the counterpart image.
  • FIG. 20 is a flowchart illustrating a reception process of the image communication apparatus 1000 according to the fourth embodiment. Since the operation at the time of image transmission is the same as that in Embodiment 1 (FIG. 2), description thereof is omitted here.
• The operation of the flowchart shown in FIG. 20 is stored as a control program in a storage device (not shown) such as a ROM or a flash memory, and is controlled by a CPU (not shown).
• In FIG. 20, the processing steps to which the same reference numerals as those in FIG. 16 are given represent the same operations as in FIG. 16, and their description is omitted.
  • the parameter calculation unit 1023 calculates the difference between the radius R input from the determination unit 121 and the reference radius R0 (S404).
  • the parameter calculation unit 1023 calculates an image processing parameter based on the calculated difference when the calculated difference is larger than the predetermined first threshold (“larger than the first threshold” in S505) (S506). Then, using the calculated image processing parameter, the size changing unit 1024 generates a processed image by enlarging the partner image (S507). Then, the size changing unit 1024 outputs the generated processed image to the image output unit 109.
  • the parameter calculation unit 1023 calculates an image processing parameter based on the calculated difference when the calculated difference is smaller than the predetermined second threshold (“smaller than the second threshold” in S505) (S508). Then, using the calculated image processing parameter, the size changing unit 1024 generates a processed image by reducing the partner image (S509). Then, the size changing unit 1024 outputs the generated processed image to the image output unit 109.
• When the calculated difference is between the second threshold and the first threshold, the size changing unit 1024 does not change the size of the input partner image and outputs it as it is to the image output unit 109.
  • the image output unit 109 outputs the processed image (or the partner image) input from the size changing unit 1024 to the monitor 110 and displays it on the monitor 110 (S209).
  • the image processing unit 1008 calculates the image processing parameters for the size change processing using the face detection flag input from the face detection unit 103 and the face radius information.
  • the parameter calculation unit 1023 determines the enlargement / reduction parameter Z using the face radius information R as follows.
• First, the parameter calculation unit 1023 calculates the difference value dR between the radius R and the predetermined reference radius R0 using Equation 5.
  • FIG. 21 is a diagram illustrating an example of a graph for calculating the enlargement / reduction parameter Z.
  • dR is a difference value between the radius R and the reference radius R0
  • Z (dR) is an enlargement / reduction parameter
  • th_R is a preset threshold value
  • Z_max and Z_min are the maximum value and the minimum value of the enlargement / reduction parameter, respectively. That is, th_R corresponds to the first threshold shown in FIG. 20, and ⁇ th_R corresponds to the second threshold.
• When the absolute value of the difference value dR is equal to or less than the threshold th_R, the enlargement/reduction parameter Z is 1. When the face radius R is larger than the reference radius R0 and the difference value dR exceeds the threshold th_R, Z takes a value of 1 or more; when the face radius R is smaller than the reference radius R0 and the difference value dR is less than −th_R, Z takes a value of 1 or less.
  • both the first threshold value and the second threshold value may be zero.
  • the size changing unit 1024 changes the size of the counterpart image using Equation 6 and Equation 7.
  • W_in is the horizontal size of the counterpart image
  • H_in is the vertical size of the counterpart image
  • W_out is the horizontal size of the processed image generated by enlargement or reduction
  • H_out is the vertical size of the processed image generated by enlargement or reduction
  • Z is an enlargement / reduction parameter determined by the parameter calculation unit 1023.
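• Combining FIG. 21 with Equations 6 and 7 (which, judging from the symbol definitions above, appear to be W_out = Z × W_in and H_out = Z × H_in), the size change could be sketched as follows; the dead zone and clipping constants are hypothetical:

```python
import cv2

def scale_parameter(r, r0, th_r=10.0, z_min=0.6, z_max=1.5, gain=0.01):
    """Enlargement/reduction parameter Z(dR) in the spirit of FIG. 21:
    dR = R - R0 (the assumed form of Equation 5); Z is 1 inside the
    dead zone [-th_r, th_r], exceeds 1 when the face is larger than the
    reference, falls below 1 when smaller, clipped to [z_min, z_max]."""
    dr = r - r0
    if abs(dr) <= th_r:
        return 1.0
    z = 1.0 + gain * (dr - th_r if dr > 0 else dr + th_r)
    return max(z_min, min(z, z_max))

def resize_partner(partner_img, z):
    """Assumed form of Equations 6 and 7: W_out = Z * W_in, H_out = Z * H_in."""
    h, w = partner_img.shape[:2]
    return cv2.resize(partner_img, (int(w * z), int(h * z)))
```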
  • FIG. 22 is a diagram illustrating an example of the input images 1101 and 1111, the counterpart image 1121, and the processed images 1122 and 1123 according to the fourth embodiment.
  • FIG. 22A is a diagram illustrating a case where the face area 1103 of the speaker image 1102 in the input image 1101 is larger than the reference radius R0.
  • FIG. 22B is a diagram illustrating a case where the face area 1113 of the speaker image 1112 in the input image 1111 is smaller than the reference radius R0.
  • FIG. 22C is a diagram illustrating an example of the partner image 1121 before executing the size changing process of the fourth embodiment.
  • FIG. 22D is a diagram illustrating an example of the processed image 1122 generated by enlarging the partner image 1121.
  • FIG. 22E is a diagram showing an example of a processed image 1123 generated by reducing the partner image 1121.
• As described above, the image processing unit 1008 uses the radius of the face region to enlarge the partner image for display when the radius is larger than the reference radius and, conversely, to reduce the partner image for display when the radius is smaller than the reference radius.
• Thereby, by looking at the size of the partner's face on the monitor, which changes as the partner image is enlarged or reduced, the speaker can determine whether the distance between the speaker and the camera is appropriate, that is, whether the size of the face is larger or smaller than the reference radius range.
• That is, according to the image communication apparatus 1000 of the present embodiment, when the speaker is too close to the camera, the face in the partner image becomes large, and when the speaker is too far from the camera, the face in the partner image becomes small. For this reason, the speaker can determine whether his or her position is too close to or too far from the camera simply by looking at the partner image.
• Even when no face area is detected, the size of the partner image can be changed using the radius of the face area detected in the past, so the speaker can confirm that he or she is not at an appropriate position.
  • the partner image may be enlarged or reduced using a fixed enlargement rate or reduction rate. At this time, whether to enlarge or reduce can be determined based on whether the radius of the past face area held in the buffer 122 is larger or smaller than the reference radius.
  • the image communication apparatus is an apparatus that sets the reference coordinates and the reference radius used when calculating the image processing parameters using the face area coordinates and the face area radius of the counterpart image.
  • FIG. 23 is a block diagram illustrating a configuration of the image communication apparatus 1200 according to the fifth embodiment.
  • the image communication apparatus 1200 shown in the figure is different from the image communication apparatus 100 in FIG. 1 in that an image processing unit 1208 is provided instead of the image processing unit 108.
  • the constituent elements denoted by the same reference numerals as those in FIG. 1 perform the same processing as in the first embodiment, and therefore description thereof will be omitted here and different points will be mainly described.
  • the image processing unit 1208 is an example of an image processing unit that generates a processed image by processing the counterpart image. Specifically, the image processing unit 1208 detects a face area from the partner image, and sets face area coordinates indicating the position of the detected face area as reference coordinates. As shown in FIG. 23, the image processing unit 1208 includes a determination unit 121, a buffer 122, a parameter calculation unit 123, an image superimposition unit 124, and a reference coordinate setting unit 1225. Since the determination unit 121, the buffer 122, the parameter calculation unit 123, and the image superimposing unit 124 perform the same processing as in the first embodiment, the description thereof is omitted.
• The reference coordinate setting unit 1225 detects a face area from the partner image input from the image decoding unit 107. Like the face detection unit 103, the reference coordinate setting unit 1225 detects the face region by a method such as template matching or ellipse detection. The reference coordinate setting unit 1225 then outputs the detected center coordinates and radius of the face area to the parameter calculation unit 123 as the reference coordinates and the reference radius, respectively, and outputs the partner image to the image superimposing unit 124. When the sizes of the counterpart image and the input image differ, the reference coordinates and the reference radius are corrected according to the size of the input image.
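• The size correction mentioned above is not spelled out in the text; one plausible sketch is a simple proportional rescaling between the two coordinate systems:

```python
def correct_reference(x0, y0, r0, partner_size, input_size):
    """Rescale the reference coordinates and radius from the partner
    image's coordinate system to the input image's, assuming both axes
    scale proportionally (an assumption; the patent does not give the
    formula)."""
    pw, ph = partner_size        # (width, height) of the partner image
    iw, ih = input_size          # (width, height) of the input image
    sx, sy = iw / pw, ih / ph
    return x0 * sx, y0 * sy, r0 * (sx + sy) / 2.0
```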
  • FIG. 24 is a flowchart illustrating a reception process of the image communication apparatus 1200 according to the fifth embodiment. Since the operation at the time of image transmission is the same as that in Embodiment 1 (FIG. 2), description thereof is omitted here.
• The operation of the flowchart shown in FIG. 24 is stored as a control program in a storage device (not shown) such as a ROM or a flash memory, and is controlled by a CPU (not shown).
• In FIG. 24, the processing steps assigned the same reference numerals as those in FIG. 3 represent the same operations as in FIG. 3, and their description is omitted.
  • the image decoding unit 107 generates a partner image by decoding image data received from another image communication apparatus via the network 111, and outputs the partner image to the reference coordinate setting unit 1225 (S201).
  • the reference coordinate setting unit 1225 sets a reference coordinate and a reference radius by detecting a face area from the partner image (S602). At this time, only one of the reference coordinates and the reference radius may be set depending on the type of processing.
• Specifically, the reference coordinate setting unit 1225 detects the face area from the partner image, calculates the center coordinates (X0, Y0) and the radius R0 of the face area, and sets the calculated center coordinates (X0, Y0) and radius R0 as the reference coordinates (x0, y0) and the reference radius R0, respectively. When the face area cannot be detected, the reference coordinates are not updated. It is also possible to update only the horizontal coordinate or only the vertical coordinate of the reference coordinates.
• FIGS. 25A and 25B are diagrams showing the relationship between the reference coordinates and the reference radius in the counterpart image 1301 and the input image 1311 of the fifth embodiment.
  • FIG. 25A is a diagram illustrating an example of the partner image 1301.
  • the partner image 1301 includes a partner person image 1302 and a face area 1303.
  • FIG. 25B is a diagram illustrating an example of the input image 1311.
  • the input image 1311 includes a speaker image 1312 and a face area 1313.
• As shown in these figures, the reference coordinate setting unit 1225 sets the center coordinates (X0, Y0) and the radius R0 obtained by detecting the face from the counterpart image 1301 as the reference coordinates (x0, y0) and the reference radius R0.
• As described above, the reference coordinate setting unit 1225 sets the partner's face area in the partner image as the reference coordinates and the reference radius, and a superimposed image is displayed on the partner image according to the direction and magnitude of the deviation of the face area coordinates of the speaker, detected by the face detection unit 103, from these references. Thereby, by looking at the image superimposed on the partner image on the monitor, the speaker can determine in which direction and by how much the speaker's face area is shifted with respect to the partner's face.
  • the image communication apparatus is an apparatus that detects a face area from an input image obtained by capturing a wider area than an image transmitted to another image communication apparatus. That is, the image communication apparatus according to the sixth embodiment cuts out a part of the input image and transmits the cut image to another image communication apparatus.
  • FIG. 26 is a block diagram illustrating a configuration of the image communication apparatus 1400 according to the sixth embodiment.
  • the image communication apparatus 1400 shown in the figure is different from the image communication apparatus 100 shown in FIG. 1 in that an image cutout unit 1412 is newly provided.
  • the constituent elements denoted by the same reference numerals as those in FIG. 1 perform the same processing as in the first embodiment, and therefore description thereof will be omitted here and different points will be mainly described.
  • the image cutout unit 1412 cuts out a part of the input image input from the image input unit 102 via the face detection unit 103 and outputs the cut out image to the image encoding unit 104.
  • the image cutout unit 1412 cuts out a region having a constant area centered on the reference coordinates or a region having a constant area centered on the center coordinates of the input image.
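• A minimal sketch of such a cut-out, clamped so the region stays inside the input image, might look like this (the function and parameter names are hypothetical):

```python
def cut_out(input_img, center, out_w, out_h):
    """Cut a fixed-size region centred on `center` (e.g. the reference
    coordinates or the image center), clamped to the input image borders.
    Assumes out_w and out_h do not exceed the input image size."""
    h, w = input_img.shape[:2]
    cx, cy = center
    left = min(max(int(cx) - out_w // 2, 0), w - out_w)
    top = min(max(int(cy) - out_h // 2, 0), h - out_h)
    return input_img[top:top + out_h, left:left + out_w]
```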
  • FIG. 27 is a flowchart showing a transmission process of the image communication apparatus 1400 according to the sixth embodiment. Since the operation at the time of image reception is the same as that in Embodiment 1 (FIG. 3), description thereof is omitted here.
• The operation of the flowchart shown in FIG. 27 is stored as a control program in a storage device (not shown) such as a ROM or a flash memory, and is controlled by a CPU (not shown).
• In FIG. 27, the processing steps assigned the same reference numerals as those in FIG. 2 represent the same operations as in FIG. 2, and their description is omitted.
  • the image input unit 102 acquires an uncompressed captured image from the camera 101 in units of frames, and outputs the acquired image to the face detection unit 103 (S101). Note that the camera 101 preferably captures as wide a region as possible.
• The face detection unit 103 performs face region detection on the uncompressed captured image acquired from the image input unit 102 by a method such as template matching or ellipse detection (S102). The face detection unit 103 then calculates information such as the face detection flag, the center coordinates of the face region, and the radius of the face region, outputs the calculated information to the image processing unit 108, and also outputs the input image to the image cutout unit 1412.
  • the image cutout unit 1412 cuts out a preset transmission image area from the input image acquired from the image input unit 102 via the face detection unit 103 and outputs the cut out transmission image to the image encoding unit 104 ( S703).
• The image encoding unit 104 encodes the transmission image input from the image cutout unit 1412 using the H.264 compression encoding method and outputs the resulting image data to the image transmission unit 105.
  • the image transmitting unit 105 converts the image data input from the image encoding unit 104 into RTP packets according to a packet transmission method such as RTP, and outputs the packet to the network 111 (S103).
  • FIG. 28 is a diagram illustrating an example of an input image 1501 and a transmission image 1502 according to the sixth embodiment.
  • the transmission image 1502 is a part of the input image 1501, is cut out from the input image by the image cutout unit 1412, and is output to the image encoding unit 104.
• Even when the speaker's face is outside the transmission image 1502, the face detection unit 103 can detect the center coordinates (x1, y1) and the radius R of the face area from the input image 1501. Therefore, since the face area can be detected from a wider range of the image, the partner image can be processed based on the actual speaker position.
• As described above, the face detection unit 103 detects the face region from a wide range including the area outside the transmission image region, and the image processing unit 108 superimposes another image on the partner image according to the horizontal and vertical distances and directions between the center coordinates of the face region and the reference coordinates.
• Thereby, the speaker can determine in which direction and by how much he or she is shifted on the screen by looking at the position and size of the image superimposed on the partner image on the monitor.
• At this time, even if the speaker's face area exists outside the transmission image area, the superimposed image can be displayed accurately according to the position of the face area.
  • the image communication apparatus is an apparatus that detects a face region from a captured image acquired from a camera that captures a wider range using two cameras having different imaging ranges.
  • FIG. 29 is a block diagram illustrating a configuration of the image communication apparatus 1600 according to the seventh embodiment.
  • the image communication apparatus 1600 shown in the figure acquires not only the first captured image captured by the camera 101 but also the second captured image captured by the camera 1601.
  • the image communication apparatus 1600 is different from the image communication apparatus 100 of FIG. 1 in that an image input unit 1602 and a face detection unit 1603 are newly provided.
• In FIG. 29, components having the same reference numerals as those in FIG. 1 perform the same processing as in the first embodiment, so their description is omitted here and the following focuses on the differences.
  • the image input unit 1602 is an interface connected to a camera 1601 that captures an image, and acquires a second captured image captured by the camera 1601.
  • the image input unit 1602 outputs the second captured image to the face detection unit 1603 as a second input image.
  • the image input unit 102 outputs the first captured image captured by the camera 101 to the face detection unit 1603 as a first input image.
  • FIG. 30 is a diagram showing a positional relationship between the two cameras 101 and 1601 and the speaker 1701.
  • the camera 101 and the camera 1601 are installed in the upper center of the monitor 110.
• An area that can be imaged by the camera 101 is an imaging area 202, and an area that can be imaged by the camera 1601 is an imaging area 1702.
  • the camera 1601 is installed in the vicinity of the camera 101 and further has a zoom rate smaller than that of the camera 101 or is equipped with a wide-angle lens.
  • the imaging area 1702 of the camera 1601 substantially includes the imaging area 202 of the camera 101 and is wider than the imaging area 202.
  • the face detection unit 1603 detects a face area from the second input image input from the image input unit 1602. That is, the face detection unit 1603 detects a face area from the second captured image captured by the camera 1601 that captures a wider range.
• The face detection unit 1603 performs face area detection processing by a method such as template matching or ellipse detection, and outputs information such as a face detection flag indicating the presence or absence of a face area, the center coordinates, and the radius of the face area to the image processing unit 108.
  • the face detection unit 1603 outputs the first input image input from the image input unit 102 to the image encoding unit 104. That is, the face detection unit 1603 does not perform face detection processing from the first captured image captured by the camera 101. Then, the image encoding unit 104 encodes the first input image input from the image input unit 102 via the face detection unit 1603.
  • the image communication apparatus 1600 can detect a face area from a wider range, and thus can generate a processed image reflecting the position of the speaker 1701 more accurately.
• Since the second captured image captured by the camera 1601 is not transmitted to another image communication apparatus or the like and is used only for detecting the face area, its image quality need not be high. That is, an inexpensive camera with low accuracy can be used.
  • FIG. 31 is a flowchart showing transmission processing of the image communication apparatus 1600 according to the seventh embodiment. Since the operation at the time of image reception is the same as that in Embodiment 1 (FIG. 3), description thereof is omitted here.
  • the operation of the flowchart shown in FIG. 31 is stored as a control program in a storage device (not shown) such as a ROM or a flash memory, and is controlled by a CPU (not shown).
  • the image input unit 102 acquires an uncompressed image in units of frames from the camera 101 and outputs it to the face detection unit 1603. Further, the image input unit 1602 acquires an uncompressed image from the camera 1601 in units of frames, and outputs it to the face detection unit 1603 (S801).
• The face detection unit 1603 performs face area detection on the uncompressed second captured image input from the image input unit 1602 by a technique such as template matching or ellipse detection (S802). The face detection unit 1603 then calculates information such as the face detection flag, the center coordinates of the face area, and the radius of the face area, outputs the face detection flag together with the face area coordinates and radius converted into the coordinate system of the first input image 1801 to the image processing unit 108, and outputs the first input image 1801 input from the image input unit 102 to the image encoding unit 104.
• The image encoding unit 104 encodes the first input image 1801, which has the smaller size, and outputs the image data to the image transmission unit 105.
  • the image transmitting unit 105 converts the image data input from the image encoding unit 104 into RTP packets according to a packet transmission method such as RTP, and outputs the packet to the network 111 (S803).
  • FIG. 32 is a diagram illustrating an example of the first input image 1801 and the second input image 1802 according to the seventh embodiment.
  • the first input image 1801 is an image for transmission that is captured by the camera 101, acquired by the image input unit 102, and transmitted to another image communication apparatus.
• The second input image 1802 is a face detection image that is captured by the camera 1601, acquired by the image input unit 1602, and used by the face detection unit 1603 to detect the face area. As shown in FIG. 32, since the second input image 1802 input from the image input unit 1602 has a wider angle than the first input image 1801, the face detection unit 1603 converts the center coordinates and the radius information of the face area into the coordinate system of the first input image 1801 according to Expressions 8 to 10.
• x1′ and y1′ are the coordinates of the face area after conversion into the coordinate system of the first input image 1801, and R′ is the radius of the face area after conversion.
  • width_1 and height_1 are the horizontal size and vertical size of the first input image 1801, respectively, and width_2 and height_2 are the horizontal size and vertical size of the second input image 1802, respectively.
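• Since only the image sizes appear in these definitions, Expressions 8 to 10 are presumably proportional scalings by the size ratio; the following sketch makes that assumption explicit (it ignores any offset between the two imaging areas, which the patent may handle differently):

```python
def to_first_coords(x1, y1, r, width_1, height_1, width_2, height_2):
    """Convert face-area coordinates detected in the second input image
    into the first input image's coordinate system, assuming Expressions
    8-10 are pure scalings by the size ratio (an assumption)."""
    x1p = x1 * width_1 / width_2     # x1', assumed form of Expression 8
    y1p = y1 * height_1 / height_2   # y1', assumed form of Expression 9
    rp = r * width_1 / width_2       # R',  assumed form of Expression 10
    return x1p, y1p, rp
```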
• In FIG. 32, a speaker image 1803 exists in the second input image 1802. Since the face detection unit 1603 detects the face area 1804 from the second input image 1802, it can correctly detect the face area 1804 even when the speaker image 1803 does not exist in the first input image 1801 for transmission.
• As described above, the face detection unit 1603 performs the face detection process on the second input image acquired from the camera 1601, which captures an image at a wider angle than the camera 101. That is, the center coordinates of the face area are determined from a wide area including the outside of the transmission image (first input image) area, and the image processing unit 108 superimposes another image on the partner image in accordance with the horizontal and vertical distances and directions between the center coordinates of the face area and the reference coordinates.
• Thereby, without displaying a self-portrait on the monitor, the speaker can determine in which direction and by how much he or she is displaced on the screen by looking at the position and size of the image superimposed on the partner image. At this time, even if the speaker's face area exists outside the transmission image area, the superimposed image can be displayed accurately according to the position of the face area.
  • the image communication apparatus of the present invention may enlarge the partner image based on the difference between the center coordinates of the face area and the reference coordinates. Specifically, it is as follows.
  • FIG. 33 is a diagram illustrating an example of the partner image 1901 and the processed image 1911.
  • FIG. 33B is a view in which a region 1903 excluding the predetermined region 1902 is cut out from the counterpart image 1901 in FIG. 33A and the cut-out region 1903 is enlarged.
• In this case, the parameter calculation unit 123 calculates the following four image processing parameters:
  1. Horizontal cut-out position: the left end or the right end of the counterpart image 1901
  2. Horizontal cut-out size: W(dx)
  3. Vertical cut-out position: the upper end or the lower end of the counterpart image 1901
  4. Vertical cut-out size: H(dy)
  • Each image processing parameter is calculated by the parameter calculation unit 123 in the same manner as in the first embodiment, for example.
• Specifically, by treating the horizontal cut-out position as the horizontal overlap start position, the horizontal cut-out size as the horizontal overlap size, the vertical cut-out position as the vertical overlap start position, and the vertical cut-out size as the vertical overlap size, the parameter calculation unit 123 can calculate each image processing parameter in the same manner as in the first embodiment.
• For example, when the vertical cut-out position is the upper end, the area obtained by removing a region of width H(dy) from the upper end of the partner image 1901 is enlarged; when the vertical cut-out position is the lower end, the area obtained by removing a region of width H(dy) from the lower end of the partner image 1901 is enlarged.
  • the method of determining the horizontal cutout position and the vertical cutout position may be the reverse of the first embodiment. That is, it may be the right end when dx is positive and the left end when dx is negative. Similarly, the lower end may be used when dy is positive, and the upper end may be used when negative.
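• As one hypothetical realization of this variation, the region that excludes the computed margins could be cropped out and scaled back to the original size:

```python
import cv2

def crop_and_enlarge(partner_img, w_dx, h_dy, from_left, from_top):
    """Cut out the region that excludes a margin of width W(dx) on the
    left or right and of height H(dy) on the top or bottom, then enlarge
    the remaining region back to the partner image's original size."""
    h, w = partner_img.shape[:2]
    x0, x1 = (int(w_dx), w) if from_left else (0, w - int(w_dx))
    y0, y1 = (int(h_dy), h) if from_top else (0, h - int(h_dy))
    region = partner_img[y0:y1, x0:x1]
    return cv2.resize(region, (w, h))
```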
• In the above embodiments, the center coordinates indicating the center of the face area are used as the face area coordinates, but any value may be used as long as it indicates the position of the face area. Likewise, the radius of the face area is used as the size of the face area, but any value that indicates the size of the face area may be used; for example, the area of the face region or the number of pixels may be used.
• The superimposition process, the projective transformation process, the blur process, and the size change process have been described as techniques for processing the counterpart image, but the counterpart image may be processed by other techniques.
• In any case, the image processing unit included in the image communication apparatus of the present invention calculates the difference between at least one of the position and size of the face area detected by the face detection unit and a predetermined reference, and processes the partner image so that the larger the calculated difference, the more the partner image before processing and the processed image after processing differ. Further, when the position of the face area is away from the reference position in a given direction, it is preferable to process the partner image so as to indicate the direction in which the speaker should move to return to the reference position.
  • the coordinates of the face may be detected using an IC tag that can specify the speaker position.
• The image communication apparatus of the present invention may acquire captured images from two physically different cameras; that is, it may acquire a first captured image captured by a first camera and a second captured image captured by a second camera.
  • FIG. 34 is a block diagram showing an example of a different form of the image communication apparatus of the present invention.
• The image communication apparatus 2000 differs from the image communication apparatus 100 shown in FIG. 1 in that it is not provided with the image encoding unit 104 and the image decoding unit 107. That is, the image transmission unit 105 transmits the captured image acquired by the image input unit 102 without encoding it. Similarly, the image receiving unit 106 receives an unencoded counterpart image from another image communication apparatus via the network 111.
• That is, the image communication apparatus of the present invention may be provided without an image encoding unit that performs compression encoding of images and an image decoding unit that performs decompression decoding of images.
  • the present invention can be realized not only as an image communication apparatus and an image communication method, but also as a program for causing a computer to execute the image communication method according to the present embodiment. Further, it may be realized as a computer-readable recording medium such as a CD-ROM for recording the program. Furthermore, it may be realized as information, data, or a signal indicating the program. These programs, information, data, and signals may be distributed via a communication network such as the Internet.
  • the constituent elements constituting the image communication apparatus may be configured from one system LSI.
  • the system LSI is an ultra-multifunctional LSI manufactured by integrating a plurality of components on a single chip.
• The system LSI is a computer system including a microprocessor, a ROM, a RAM, and the like.
• The image communication apparatus of the present invention has the effect that the speaker can be guided to the reference position while viewing the processed partner image, without displaying a self-portrait, and it can be used for a TV conference device.

Abstract

The objective is to prevent the obstruction of communication due to a warning tone or the display of one's own image, and to show a speaker the degree to which the speaker is shifted with respect to a reference point. Disclosed is an image communication device (100) which communicates image data with another device via a network (111), and which is equipped with: an image input unit (102) that acquires a photographic image picked up with a camera (101); a face detection unit (103) that detects the face region in the photographic image; an image transmission unit (105) that transmits first image data containing the photographic image; an image reception unit (106) that receives second image data containing the other party's image transmitted from another device; an image processing unit (108) that generates a processed image by processing the other party's image contained in the second image data; and an image output unit (109) that outputs the processed image to a monitor (110). The image processing unit (108) processes the other party's image such that there is a greater difference between the other party's image and the processed image as the difference between the position or size of the face region and a preset reference increases.

Description

Image communication apparatus and image communication method
The present invention relates to an image communication apparatus for conducting a TV (television) conference on a large screen, and more particularly to an image communication apparatus that allows a speaker to recognize his or her own position.
In recent years, ADSL (Asymmetric Digital Subscriber Line) and optical fiber networks have spread rapidly, and low-cost, high-speed Internet connections have become available. Using such low-cost, high-speed Internet access, it has become possible to construct a TV conference system easily by bidirectionally transmitting video and audio data between remote sites. Furthermore, with the advent of cameras capable of capturing images at HD (High Definition) resolution and the increase in the size of displays typified by the PDP (Plasma Display Panel), TV conference systems with a strong sense of face-to-face presence have appeared, in which a person is displayed life-size on a large-screen display.
 In a large-screen TV conference system with a high sense of presence, the viewing angle of the camera tends to be narrow because a person is displayed at life size; if the speaker moves substantially, he or she leaves the camera's imaging area, and an image that does not include the speaker's face is transmitted to the other party. In a conventional TV conference system, the image captured by the camera is displayed on the display as a self-portrait, and the speaker copes with this by checking the self-portrait and moving back into the imaging area.
 However, in a large-screen TV conference system with a face-to-face feeling, displaying the self-portrait reduces the screen area available for the other party's image, or the feeling of actually facing the other party is diminished by the self-portrait. Furthermore, unless the self-portrait is displayed at all times, the speaker cannot notice that he or she has moved out of the camera's imaging area.
 As a conventional method for solving this problem, Patent Document 1 discloses a camera photographing apparatus that, when the subject is outside the imaging area, notifies the subject of this by generating a warning sound.
JP 2000-201348 A
 However, with the above prior art, it is unclear how far the subject (the speaker) has deviated from a reference position such as the center of the camera's imaging area, so the speaker cannot tell how far to move; moreover, the reproduction of the warning sound itself obstructs communication. The technique is therefore unsuitable for communication applications typified by large-screen, highly immersive TV conferences.
 Accordingly, an object of the present invention is to provide an image communication apparatus that prevents communication from being obstructed by the generation of a warning sound or by the display of a self-portrait, and that visually shows the speaker how far he or she has deviated from the reference position.
 To solve the above conventional problems, an image communication apparatus according to the present invention communicates image data with another image communication apparatus via a network and comprises: an image input unit that acquires a captured image taken by a camera; an image transmission unit that transmits first image data including the captured image acquired by the image input unit; a face detection unit that detects a face region from the captured image acquired by the image input unit; an image reception unit that receives second image data, including a counterpart image, transmitted from the other image communication apparatus; an image processing unit that generates a processed image by processing the counterpart image included in the second image data received by the image reception unit; and an image output unit that outputs the processed image generated by the image processing unit to a display device. The image processing unit calculates the difference between a predetermined reference and at least one of the position and size of the face region detected by the face detection unit, and processes the counterpart image such that the larger the calculated difference, the more the processed image differs from the counterpart image.
 With this configuration, the larger the difference between the position or size of the face region and the predetermined reference, the more heavily the counterpart image displayed on the display device is processed. By looking at the displayed image of the other party, the speaker can therefore check how far his or her position deviates from the reference position without any self-portrait being displayed. As a result, communication is not obstructed by warning sounds or self-portrait displays, and a TV conference with a higher sense of presence can be conducted.
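 As a minimal sketch of this rule (not part of the disclosure: the function name, the max-based combination of the two axes, and the linear growth are illustrative assumptions), the decision could look like this in Python:

```python
def processing_strength(face_xy, ref_xy, threshold):
    """Illustrative sketch: returns 0 while the face stays within the
    tolerance around the reference, and a value that grows with the
    deviation once the tolerance is exceeded."""
    dx = abs(face_xy[0] - ref_xy[0])
    dy = abs(face_xy[1] - ref_xy[1])
    deviation = max(dx, dy)  # assumed way of combining the two axes
    return 0.0 if deviation <= threshold else deviation - threshold
```

 A concrete processing step (superimposition area, tilt, blur, and so on, per the embodiments described below) would then be scaled by a value of this kind.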
 The face detection unit may detect face region coordinates indicating the position of the face region in the captured image, and the image processing unit may calculate the absolute difference between the face region coordinates and predetermined reference coordinates and, when the calculated absolute difference is larger than a predetermined threshold, generate the processed image by processing the counterpart image such that the larger the absolute difference, the more the processed image differs from the counterpart image.
 With this configuration, the counterpart image is processed based on the position of the face region in the captured image, so the speaker can check how far he or she has deviated from the reference position by looking at the other party's image displayed on the display device. Because the image is not processed when the amount of deviation from the reference position (the absolute difference) is at or below the predetermined threshold, the other party's image is displayed unmodified while the speaker stays within the predetermined range of the reference position.
 The image processing unit may generate the processed image by superimposing a predetermined superimposed image on the counterpart image such that the larger the absolute difference between the face region coordinates and the reference coordinates, the larger the area of the superimposed image.
 With this configuration, the processed image displayed on the display device is the counterpart image with the superimposed image laid over it, so the speaker can check how far he or she has deviated from the reference position by looking at the area of the superimposed image.
 The image processing unit may superimpose the superimposed image from the left end of the counterpart image when the face region coordinates are on the right side of the reference coordinates, or from the right end of the counterpart image when the face region coordinates are on the left side of the reference coordinates, such that the larger the horizontal absolute difference between the face region coordinates and the reference coordinates, the larger the area of the superimposed image.
 With this configuration, the area of the superimposed image laid over the counterpart image is controlled according to the direction and distance of the horizontal deviation from the reference coordinates, so the speaker can check how far he or she has shifted horizontally by looking at the position and area of the superimposed image on the display device. For example, the larger the distance from the reference coordinates, the larger the area of the superimposed image, so a larger superimposed area tells the speaker that the deviation from the reference position is larger. When the face region coordinates are on the right side of the reference coordinates in the captured image, the superimposed image is laid from the left end of the counterpart image, so the speaker, seeing the superimposed image on the left, recognizes that he or she is to the left of the reference position. Conversely, when the face region coordinates are on the left side of the reference coordinates, the superimposed image is laid from the right end of the counterpart image, so the speaker, seeing it on the right, recognizes that he or she is to the right of the reference position.
 The image processing unit may superimpose the superimposed image from the upper end of the counterpart image when the face region coordinates are above the reference coordinates, or from the lower end of the counterpart image when the face region coordinates are below the reference coordinates, such that the larger the vertical absolute difference between the face region coordinates and the reference coordinates, the larger the area of the superimposed image.
 With this configuration, the area of the superimposed image is controlled according to the direction and distance of the vertical deviation from the reference coordinates, so the speaker can check how far he or she has shifted vertically by looking at the position and area of the superimposed image on the display device. For example, when the face region coordinates are above the reference coordinates in the captured image, the superimposed image is laid from the upper end of the counterpart image, so the speaker, seeing it at the top, recognizes that he or she is above the reference position. Conversely, when the face region coordinates are below the reference coordinates, the superimposed image is laid from the lower end, so the speaker, seeing it at the bottom, recognizes that he or she is below the reference position.
 The face detection unit may further determine whether a face region exists in the captured image and generate a flag indicating the presence or absence of a face region, and the image processing unit may, when the flag indicates that there is no face region, superimpose a superimposed image of a predetermined area on a predetermined region of the counterpart image.
 With this configuration, even when no face region is detected, a superimposed image of a fixed area is laid over the counterpart image, so the speaker can confirm that he or she is outside the imaging area by looking at the area of the superimposed image displayed on the display device.
 The image communication apparatus may further include a buffer that stores the face region coordinates detected by the face detection unit, and when the flag indicates that no face region exists in the captured image, the image processing unit may superimpose a superimposed image of a predetermined area on the counterpart image from the left end when the face region coordinates stored in the buffer are on the right side of the reference coordinates, or from the right end when they are on the left side, and from the upper end when the stored face region coordinates are above the reference coordinates, or from the lower end when they are below.
 With this configuration, even when no face region is detected, a superimposed image of a fixed area is displayed over the counterpart image according to the position of the last detected face region, so the speaker can tell, from the position and area of the superimposed image on the display device, in which direction (up, down, left, or right) he or she has left the imaging area.
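 A hedged sketch of this fallback follows; the fixed overlay size is an illustrative value, and the coordinate convention (larger x meaning the right side of the input image, larger y the upper side) follows the description of FIG. 5 given later in this document.

```python
FIXED_W, FIXED_H = 160, 120  # assumed fixed overlay area in pixels

def fallback_overlay_edges(last_xy, ref_xy):
    """Choose the overlay edges from the last buffered face coordinates
    relative to the reference coordinates, as the claim describes."""
    h_edge = "left" if last_xy[0] > ref_xy[0] else "right"  # right of reference -> left end
    v_edge = "top" if last_xy[1] > ref_xy[1] else "bottom"  # above reference -> upper end
    return h_edge, v_edge
```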
 The image processing unit may generate the processed image by applying a projective transformation to the counterpart image such that the larger the absolute difference between the face region coordinates and the reference coordinates, the larger the tilt of the projective transformation.
 With this configuration, the processed image displayed on the display device is a projective-transformed version of the counterpart image, so the speaker can check how far he or she has deviated from the reference position by looking at the tilt of the displayed image.
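 The patent does not prescribe how the projective transformation is computed; as one possible sketch (the use of OpenCV and the simple trapezoidal mapping are assumptions), the tilt could be produced like this:

```python
import cv2
import numpy as np

def tilt_partner_image(img, tilt_ratio, lean_right=True):
    """Warp the counterpart image (a numpy array) into a trapezoid whose
    slant grows with tilt_ratio, derived from the face/reference difference."""
    h, w = img.shape[:2]
    shift = tilt_ratio * h
    src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    if lean_right:  # compress the right edge to suggest a view from the left
        dst = np.float32([[0, 0], [w, shift], [w, h - shift], [0, h]])
    else:           # compress the left edge instead
        dst = np.float32([[0, shift], [w, 0], [w, h], [0, h - shift]])
    m = cv2.getPerspectiveTransform(src, dst)
    return cv2.warpPerspective(img, m, (w, h))
```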
 The image processing unit may enlarge the counterpart image with an enlargement ratio that increases as the absolute difference between the face region coordinates and the reference coordinates increases.
 With this configuration, the processed image displayed on the display device is an image in which part of the counterpart image is enlarged, so the speaker can check how far he or she has deviated from the reference position by looking at the degree of enlargement of the displayed image.
 The image communication apparatus may further include a reference coordinate setting unit that detects a face region from the counterpart image included in the second image data received by the image reception unit and sets face region coordinates indicating the position of the detected face region as the reference coordinates.
 With this configuration, image processing is performed using the coordinates of the face region in the counterpart image as the reference coordinates, so the speaker's reference position is not fixed but can be set according to the other party's position. That is, when the other party moves, the speaker's reference position changes accordingly.
 The face detection unit may detect the size of the face region, and the image processing unit may calculate the absolute difference between the size of the face region and a predetermined reference size and, when the calculated absolute difference is larger than a predetermined threshold, generate the processed image by processing the counterpart image such that the larger the absolute difference, the more the processed image differs from the counterpart image.
 With this configuration, the counterpart image is processed based on the size of the face region in the captured image, so the speaker can check how far he or she has deviated from the reference position by looking at the other party's image on the display device. Because the image is not processed when the amount of deviation from the reference (the absolute difference) is at or below the predetermined threshold, the other party's image is displayed unmodified while the speaker stays within the predetermined range.
 The image processing unit may generate the processed image by blurring the counterpart image such that the larger the absolute difference between the size of the face region and the reference size, the larger the blur amount.
 With this configuration, the blur amount of the counterpart image is increased as the face region becomes larger or smaller than the reference, so the speaker can check how far he or she has deviated from the reference position by looking at the degree of blur of the processed image on the display device. Specifically, the speaker can recognize that he or she is too close to, or too far from, the camera.
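 As an illustrative sketch of this blurring (the use of Pillow, the radius as the size measure, and the linear mapping from size difference to blur radius are assumptions):

```python
from PIL import Image, ImageFilter

def blur_partner_image(img, face_radius, ref_radius, th, gain=0.5):
    """Blur the counterpart image more as the detected face radius
    moves further from the reference radius."""
    d = abs(face_radius - ref_radius)
    if d <= th:
        return img  # within the tolerance: leave the image untouched
    return img.filter(ImageFilter.GaussianBlur(radius=gain * (d - th)))
```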
 The image processing unit may generate the processed image by enlarging the counterpart image when the size of the face region is larger than the reference size and reducing the counterpart image when the size of the face region is smaller than the reference size, with an enlargement or reduction ratio that increases as the absolute difference between the size of the face region and the reference size increases.
 With this configuration, the counterpart image is enlarged as the face region grows and reduced as it shrinks, so the speaker can tell whether he or she is too close to or too far from the camera by looking at the size of the face in the processed image on the display device.
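 A sketch of this enlargement/reduction, again assuming Pillow and an illustrative linear scale factor (the crop/pad handling of the output canvas is also our choice, not the patent's):

```python
from PIL import Image

def scale_partner_image(img, face_size, ref_size, th, gain=0.01):
    """Enlarge the counterpart image when the face is larger than the
    reference (speaker too close), reduce it when smaller (too far)."""
    d = face_size - ref_size
    if abs(d) <= th:
        return img
    factor = 1.0 + gain * (abs(d) - th)
    scale = factor if d > 0 else 1.0 / factor
    w, h = img.size
    out = img.resize((max(1, int(w * scale)), max(1, int(h * scale))))
    if scale > 1.0:  # crop the enlargement back to the original canvas
        ox, oy = (out.width - w) // 2, (out.height - h) // 2
        return out.crop((ox, oy, ox + w, oy + h))
    canvas = Image.new(img.mode, (w, h))  # centre the reduction on a blank canvas
    canvas.paste(out, ((w - out.width) // 2, (h - out.height) // 2))
    return canvas
```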
 The present invention can be realized not only as an image communication apparatus but also as a method whose steps correspond to the processing units of the apparatus. It may also be realized as a program that causes a computer to execute these steps, as a recording medium such as a computer-readable CD-ROM (Compact Disc-Read Only Memory) on which the program is recorded, or as information, data, or a signal representing the program. Such programs, information, data, and signals may be distributed via a communication network such as the Internet.
 Some or all of the constituent elements of each of the above image communication apparatuses may be implemented as a single system LSI (Large Scale Integration). The system LSI is an ultra-multifunctional LSI manufactured by integrating a plurality of components on a single chip; specifically, it is a computer system including a microprocessor, a ROM, a RAM (Random Access Memory), and the like.
 According to the present invention, communication is not obstructed by the generation of a warning sound or by the display of a self-portrait, and the speaker can be shown visually how far he or she has moved out of the camera's imaging area.
FIG. 1 is a block diagram showing the configuration of the image communication apparatus of Embodiment 1.
FIG. 2 is a flowchart showing the transmission processing of the image communication apparatus of Embodiment 1.
FIG. 3 is a flowchart showing the reception processing of the image communication apparatus of Embodiment 1.
FIG. 4 is a diagram showing the positional relationship among the speaker, the camera, and the monitor in Embodiment 1.
FIG. 5 is a diagram showing the position of the speaker within the input image captured by the camera in Embodiment 1.
FIG. 6 is a diagram showing an example of a counterpart image before the image superimposing process of Embodiment 1 and the processed image after it.
FIG. 7A is a diagram showing an example of a graph of the relationship between the horizontal distance (dx) and the horizontal superimposition size (W(dx)).
FIG. 7B is a diagram showing an example of a graph of the relationship between the vertical distance (dy) and the vertical superimposition size (H(dy)).
FIG. 8 is a diagram showing an example of the processed image when no face region is detected in Embodiment 1.
FIG. 9 is a schematic diagram showing how the counterpart image changes when the speaker moves in Embodiment 1.
FIG. 10 is a block diagram showing the configuration of the image communication apparatus of Embodiment 2.
FIG. 11 is a flowchart showing the reception processing of the image communication apparatus of Embodiment 2.
FIG. 12 is a diagram showing an example of a counterpart image before the projective transformation of Embodiment 2 and the processed image after it.
FIG. 13 is a diagram showing an example of a graph for calculating the image processing parameters for the projective transformation of Embodiment 2.
FIG. 14 is a schematic diagram showing how the counterpart image changes when the speaker moves in Embodiment 2.
FIG. 15 is a block diagram showing the configuration of the image communication apparatus of Embodiment 3.
FIG. 16 is a flowchart showing the reception processing of the image communication apparatus of Embodiment 3.
FIG. 17 is a diagram showing an example of a graph for calculating the blur parameter of Embodiment 3.
FIG. 18 is a diagram showing an example of an input image, a counterpart image, and a processed image in Embodiment 3.
FIG. 19 is a block diagram showing the configuration of the image communication apparatus of Embodiment 4.
FIG. 20 is a flowchart showing the reception processing of the image communication apparatus of Embodiment 4.
FIG. 21 is a diagram showing an example of a graph for calculating the enlargement/reduction parameter of Embodiment 4.
FIG. 22 is a diagram showing an example of an input image, a counterpart image, and a processed image in Embodiment 4.
FIG. 23 is a block diagram showing the configuration of the image communication apparatus of Embodiment 5.
FIG. 24 is a flowchart showing the reception processing of the image communication apparatus of Embodiment 5.
FIG. 25A is a diagram showing an example of a counterpart image in Embodiment 5.
FIG. 25B is a diagram showing an example of an input image in Embodiment 5.
FIG. 26 is a block diagram showing the configuration of the image communication apparatus of Embodiment 6.
FIG. 27 is a flowchart showing the transmission processing of the image communication apparatus of Embodiment 6.
FIG. 28 is a diagram showing an example of an input image and a transmission image in Embodiment 6.
FIG. 29 is a block diagram showing the configuration of the image communication apparatus of Embodiment 7.
FIG. 30 is a diagram showing the positional relationship between the two cameras and the speaker in Embodiment 7.
FIG. 31 is a flowchart showing the transmission processing of the image communication apparatus of Embodiment 7.
FIG. 32 is a diagram showing an example of a first input image and a second input image in Embodiment 7.
FIG. 33 is a diagram showing an example of a counterpart image and a processed image.
FIG. 34 is a block diagram showing an example of a different form of the image communication apparatus according to the present invention.
 Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.
 (Embodiment 1)
 The image communication apparatus of Embodiment 1 superimposes a predetermined image on the counterpart image received from another image communication apparatus, based on the position of the speaker's face within the captured image acquired from a camera or the like.
 FIG. 1 is a block diagram showing the configuration of the image communication apparatus 100 of Embodiment 1. The image communication apparatus 100 shown in FIG. 1 transmits image data including a captured image taken by the camera 101 to another image communication apparatus via the network 111, and displays on the monitor 110 the counterpart image included in the image data received from the other image communication apparatus via the network 111. As shown in FIG. 1, the image communication apparatus 100 includes an image input unit 102, a face detection unit 103, an image encoding unit 104, an image transmission unit 105, an image reception unit 106, an image decoding unit 107, an image processing unit 108, and an image output unit 109.
 The image input unit 102 is, for example, an interface to the camera 101, and acquires the captured image taken by the camera 101. The image input unit 102 outputs the acquired captured image to the face detection unit 103 as the input image.
 The face detection unit 103 detects a face region from the input image acquired by the image input unit 102. For example, by detecting the face region, the face detection unit 103 obtains face region coordinates indicating the position of the face region, and the size of the face region. The face region is a circular region including the speaker's face in the input image.
 Specifically, the face detection unit 103 performs face region detection using a technique such as template matching or ellipse detection, and outputs to the image processing unit 108 information such as a face detection flag indicating the presence or absence of a face region, the center coordinates indicating the center position of the face region (an example of the face region coordinates), and the radius of the face region. The face detection unit 103 also outputs the input image to the image encoding unit 104. The face region may be elliptical; in that case, the center position of the face region is the midpoint between the two focal points, and the radius of the face region is the average of the major and minor axes. In the present embodiment, the radius information of the face region need not be output.
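 The information passed from the face detection unit 103 to the image processing unit 108 could be sketched as the following record (the field names are illustrative, not the patent's):

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class FaceDetectionResult:
    detected: bool                     # face detection flag
    center: Optional[Tuple[int, int]]  # center coordinates (x1, y1), if detected
    radius: Optional[float]            # face-region radius R, if detected
```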
 The image encoding unit 104 generates compressed image data by compression-encoding the input image using a compression encoding method such as H.264. The generated image data therefore includes the captured image acquired by the image input unit 102. The image encoding unit 104 outputs the generated image data to the image transmission unit 105.
 The image transmission unit 105 transmits the image data compressed by the image encoding unit 104 to another image communication apparatus or the like via the network 111. For example, the image transmission unit 105 packetizes the image data according to a packet transmission scheme such as RTP (Real-time Transport Protocol) and outputs the generated RTP packets to the network 111.
 The image reception unit 106 receives image data including the counterpart image from another image communication apparatus via the network 111. For example, the image reception unit 106 receives RTP packets from the network 111 and obtains the compressed image data by removing the RTP headers. The image reception unit 106 outputs the obtained compressed image data to the image decoding unit 107.
 The image decoding unit 107 generates the counterpart image by decoding the compressed image data received by the image reception unit 106, and outputs the generated counterpart image to the image processing unit 108. The counterpart image is an image showing the communication partner who is using the other image communication apparatus, acquired by a camera or the like connected to that apparatus.
 The image processing unit 108 generates a processed image by processing the counterpart image. "Processing" here does not include superimposing the captured image (the self-portrait) on the counterpart image. For example, the image processing unit 108 calculates the difference between the position of the face region detected by the face detection unit 103 and a predetermined reference, and processes the counterpart image such that the larger the calculated difference, the more the processed image differs from the counterpart image. Specifically, the image processing unit 108 calculates image processing parameters according to the position of the face region detected by the face detection unit 103 and processes the counterpart image using those parameters. As shown in FIG. 1, the image processing unit 108 includes a determination unit 121, a buffer 122, a parameter calculation unit 123, and an image superimposing unit 124.
 The determination unit 121 uses the face detection flag input from the face detection unit 103 to determine whether a face region exists in the input image. When a face region exists, the determination unit 121 outputs the center coordinates and the radius of the face region input from the face detection unit 103 to the parameter calculation unit 123, and stores them in the buffer 122. When no face region exists, the determination unit 121 reads the center coordinates from the buffer 122 and outputs them to the parameter calculation unit 123.
 The buffer 122 is a memory that stores the center coordinates and the radius of the face region. The buffer 122 may always store only the latest center coordinates and radius, that is, only those from the last time a face region was detected, or it may store multiple center coordinates and radii, each associated with the time at which the face region was detected.
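 A minimal sketch of how the determination unit 121 and the buffer 122 might cooperate, reusing the FaceDetectionResult record sketched above (the class and method names are illustrative):

```python
class FaceCoordinateBuffer:
    """Keeps the most recent center coordinates and radius (buffer 122)."""

    def __init__(self):
        self._last = None

    def select_center(self, result):
        """Determination unit 121: use the current coordinates when a face
        is present (and store them), otherwise fall back to the buffer."""
        if result.detected:
            self._last = (result.center, result.radius)
            return result.center
        return self._last[0] if self._last is not None else None
```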
 The parameter calculation unit 123 calculates the difference between the center coordinates input from the determination unit 121 and the predetermined reference coordinates, and uses the difference to calculate the image processing parameters. Specifically, the parameter calculation unit 123 calculates the image processing parameters for superimposing an image based on the sign and the absolute value of the calculated difference, and outputs them to the image superimposing unit 124. The calculation of the image processing parameters for superimposition is described later.
 The image superimposing unit 124 uses the image processing parameters to generate the processed image by superimposing a predetermined superimposed image on the counterpart image generated by the image decoding unit 107. The superimposed image may be any image as long as it differs from both the original counterpart image and the captured self-portrait; however, an image whose presence and position the speaker can easily recognize, such as a single-color or semi-transparent image, is preferable. The image superimposing unit 124 outputs the generated processed image to the image output unit 109.
 As described above, the image processing unit 108 calculates the absolute difference between the center coordinates (an example of face region coordinates) and the reference coordinates. When the calculated absolute difference is larger than the predetermined threshold, the image processing unit 108 generates the processed image by processing the counterpart image such that the larger the absolute difference, the more the processed image differs from the counterpart image. Specifically, it generates the processed image by superimposing the predetermined superimposed image on the counterpart image such that the larger the absolute difference, the larger the area of the superimposed image.
 The image output unit 109 outputs the processed image generated by the image processing unit 108 to the monitor 110, an example of a display device. Specifically, the image output unit 109 is, for example, an interface to the monitor 110 and causes the monitor 110 to display the processed image.
 Next, the operation of the image communication apparatus 100 having the above configuration will be described with reference to the flowcharts shown in FIGS. 2 and 3. The operations of these flowcharts are stored as a control program in a storage device (not shown) such as a ROM or flash memory, and are executed under the control of a CPU (Central Processing Unit) (not shown).
 First, the transmission processing of the image communication apparatus 100 will be described with reference to FIG. 2. FIG. 2 is a flowchart showing the transmission processing of the image communication apparatus 100 of Embodiment 1.
 The image input unit 102 acquires an uncompressed captured image from the camera 101 frame by frame and outputs it to the face detection unit 103 (S101).
 The face detection unit 103 performs face region detection on the uncompressed captured image input from the image input unit 102, using a technique such as template matching or ellipse detection (S102). The face detection unit 103 then calculates information such as the face detection flag, the center coordinates of the face region, and the radius of the face region, outputs this information to the image processing unit 108, and outputs the input image to the image encoding unit 104.
 The image encoding unit 104 encodes the input image from the face detection unit 103 using the H.264 compression encoding method and outputs the compression-encoded image data to the image transmission unit 105. The image transmission unit 105 packetizes the image data input from the image encoding unit 104 into RTP packets according to a packet transmission scheme such as RTP, and outputs them to the network 111 (S103).
 The compression encoding method is not limited to H.264; any compression encoding method such as MPEG-2, MPEG-4, H.261, or H.263 can be used. Likewise, the packet transmission scheme is not limited to RTP; any transmission scheme such as RTSP (Real Time Streaming Protocol) can be used.
 A specific example of the face detection processing performed by the face detection unit 103 is described below.
 FIG. 4 is a diagram showing the positional relationship among the speaker 201, the camera 101, and the monitor 110 in Embodiment 1. FIG. 4(a) shows the speaker 201, the camera 101, and the monitor 110 viewed from above, and FIG. 4(b) shows them viewed from the front.
 As shown in FIG. 4, the camera 101 is installed at the top center of the monitor 110, and the area the camera 101 can capture is the imaging area 202. Only the counterpart image is displayed on the monitor 110; the self-portrait (speaker 201) is not displayed. The aim of the present invention is to let the speaker 201 confirm where he or she is within the imaging area 202 from the other party's image displayed on the monitor 110, even though no self-portrait is displayed.
 FIG. 5 is a diagram showing the position of the speaker 201 within the input image 301 captured by the camera 101 of FIG. 4. As shown in FIG. 4, the camera 101 is installed at the top center of the monitor 110, so when the speaker 201 stands on the left side facing the monitor 110, the speaker image 302 appears on the right side of the input image 301 captured by the camera 101. When the speaker 201 stands on the right side facing the monitor 110, the speaker image appears on the left side of the input image. The speaker image 302 is the speaker 201 as captured in the input image 301.
 In FIG. 5, the face detection unit 103 detects the face region 303 of the speaker image 302 in the input image 301 and calculates the center coordinates (x1, y1) of the face region and its radius (R). The reference coordinates (x0, y0) are the coordinates of a preset appropriate speaker position, used to detect the amount of deviation of the face coordinates. For example, the reference coordinates are the center of the imaging area 202 of the camera 101.
 Next, the reception processing of the image communication apparatus 100 will be described with reference to FIG. 3.
 FIG. 3 is a flowchart showing the reception processing of the image communication apparatus 100 of Embodiment 1.
 The image reception unit 106 receives RTP packets via the network 111, obtains the compressed image data by removing the RTP headers, and outputs it to the image decoding unit 107. The image decoding unit 107 generates an uncompressed counterpart image by decoding the image data input from the image reception unit 106 in accordance with H.264, and outputs the generated counterpart image to the image processing unit 108 (S201).
 The determination unit 121 uses the face detection flag input from the face detection unit 103 to determine whether a face region exists in the input image (S202).
 When a face region exists (Yes in S202), the determination unit 121 outputs the center coordinates (x1, y1) input from the face detection unit 103 to the parameter calculation unit 123, and stores the center coordinates (x1, y1) in the buffer 122. The parameter calculation unit 123 calculates the difference between the center coordinates (x1, y1) input from the determination unit 121 and the reference coordinates (x0, y0) (S203).
 Next, when the absolute value of the calculated difference is larger than the predetermined threshold (Yes in S204), the parameter calculation unit 123 calculates the image processing parameters based on the sign and the absolute value of the calculated difference (S207).
 When no face region exists (No in S202), the determination unit 121 reads past center coordinates from the buffer 122 (S205) and outputs the read center coordinates (x1, y1) to the parameter calculation unit 123. When multiple center coordinates are held in the buffer 122, the determination unit 121 reads the latest ones, that is, the center coordinates from the last time a face region was detected.
 Next, the parameter calculation unit 123 calculates the difference between the center coordinates (x1, y1) and the reference coordinates (x0, y0) (S206), and calculates the image processing parameters based on the sign of the calculated difference (S207). A specific example of the image processing parameter calculation is described later.
 Next, the image superimposing unit 124 uses the image processing parameters to superimpose the predetermined superimposed image on the counterpart image (S208), and outputs the resulting processed image to the image output unit 109.
 When the calculated difference is at or below the threshold (No in S204), the image superimposing unit 124 outputs the counterpart image to the image output unit 109 as-is, without performing the superimposing process.
 Finally, the image output unit 109 outputs the processed image (or the counterpart image) input from the image superimposing unit 124 to the monitor 110 and causes the monitor 110 to display it (S209).
 Next, the image processing parameter calculation and the superimposing process performed by the image processing unit 108 are described in detail.
 The image processing unit 108 calculates the image processing parameters for superimposition using the face detection flag and the center coordinates of the face region input from the face detection unit 103.
 First, when the face detection flag indicates that a face region exists, the parameter calculation unit 123 calculates the differences between the reference coordinates (x0, y0) and the center coordinates (x1, y1) of the face region, that is, the horizontal distance (dx) and the vertical distance (dy), according to Formula 1 and Formula 2.

 (Formula 1) dx = x1 - x0
 (Formula 2) dy = y1 - y0
 Next, the parameter calculation unit 123 applies a threshold test to each of the horizontal distance (dx) and the vertical distance (dy), and when the distance exceeds its threshold, calculates the image processing parameters for the image processing (image superimposition).
 FIG. 6 is a diagram showing an example of the counterpart image 401 before the image superimposing process of Embodiment 1 and the processed image 402 after it. FIG. 6(b) shows the processed image 402 generated by superimposing the superimposed image 403 on the counterpart image 401 of FIG. 6(a).
 FIGS. 7A and 7B are examples of graphs for calculating the image processing parameters for superimposition. FIG. 7A shows an example of the relationship between the horizontal distance (dx) and the horizontal superimposition size (W(dx)). FIG. 7B shows an example of the relationship between the vertical distance (dy) and the vertical superimposition size (H(dy)).
 The image processing parameters for superimposition specify at which position of the counterpart image 401 the superimposed image 403 is laid, and at what size; they are, for example, the following four parameters, listed below.
 Image processing parameters for superimposition:
 1. Horizontal superimposition start position: the left or right end of the counterpart image 401
 2. Horizontal superimposition size: W(dx)
 3. Vertical superimposition start position: the upper or lower end of the counterpart image 401
 4. Vertical superimposition size: H(dy)
 The parameter calculation unit 123 first sets and calculates the horizontal image processing parameters as follows.
 1. Determination of the horizontal superimposition start position (left or right end of the counterpart image 401)
 The parameter calculation unit 123 determines the horizontal superimposition start position according to the sign of dx: the left end when dx is positive, the right end when dx is negative.
 As shown in FIG. 5, a positive dx means that the face region 303 is on the right side of the input image 301, that is, as shown in FIG. 4, the speaker 201 is on the left side facing the monitor 110 (the right side as seen from the camera 101). In this case the speaker views the other party from the left, so the image superimposing unit 124 superimposes the superimposed image 403 from the left end of the counterpart image 401. When dx is negative, the opposite applies.
 2. Calculation of the horizontal superimposition size W(dx)
 As shown in FIG. 7A, when |dx| is larger than the threshold (th_x), the parameter calculation unit 123 calculates W(dx) as a size proportional to |dx|. When |dx| is less than the threshold (th_x), the parameter calculation unit 123 sets W(dx) = 0.
 Next, the parameter calculation unit 123 calculates the vertical image processing parameters as follows.
 3. Determination of the vertical superimposition start position (upper or lower end of the counterpart image 401)
 The parameter calculation unit 123 determines the vertical superimposition start position according to the sign of dy: the upper end when dy is positive, the lower end when dy is negative.
 As shown in FIG. 5, a positive dy means that the face region 303 is in the upper part of the input image 301. In this case the speaker views the other party from above, so the image superimposing unit 124 superimposes the superimposed image 403 from the upper end of the counterpart image 401. When dy is negative, the opposite applies.
 4. Calculation of the vertical superimposition size H(dy)
 As shown in FIG. 7B, when |dy| is larger than the threshold (th_y), the parameter calculation unit 123 calculates H(dy) with a magnitude proportional to |dy|. When |dy| is less than the threshold (th_y), the parameter calculation unit 123 sets H(dy) = 0.
 Note that the calculation method is not limited to this example; any method may be used as long as the value of the image processing parameter increases as the horizontal or vertical distance increases. In other words, it suffices that the horizontal distance |dx| and the horizontal superimposition size W(dx), and likewise the vertical distance |dy| and the vertical superimposition size H(dy), have a positive correlation. The thresholds th_x and th_y may also be 0.
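 As a concrete illustration of this mapping, the following is a minimal Python sketch that derives the four superimposition parameters from the offsets (dx, dy); the function name and the proportionality constants kx and ky are assumptions for illustration and do not appear in the specification.

    def superimposition_params(dx, dy, th_x, th_y, kx=1.0, ky=1.0):
        # dx, dy: offsets of the face-region center from the reference
        # coordinates; kx, ky: assumed proportionality constants.
        # 1. Horizontal start position: left edge when dx is positive.
        h_start = 'left' if dx >= 0 else 'right'
        # 2. Horizontal size W(dx): 0 below the threshold, else proportional.
        w = kx * abs(dx) if abs(dx) > th_x else 0.0
        # 3. Vertical start position: top edge when dy is positive.
        v_start = 'top' if dy >= 0 else 'bottom'
        # 4. Vertical size H(dy): 0 below the threshold, else proportional.
        h = ky * abs(dy) if abs(dy) > th_y else 0.0
        return h_start, w, v_start, h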
 As described above, when the face region coordinates are to the right of the reference coordinates in the input image, the image processing unit 108 superimposes the superimposed image from the left edge of the counterpart image, and when the face region coordinates are to the left of the reference coordinates in the input image, it superimposes the superimposed image from the right edge of the counterpart image. At this time, the image processing unit 108 superimposes the superimposed image on the counterpart image so that the larger the horizontal difference absolute value between the face region coordinates and the reference coordinates, the larger the area of the superimposed image.
 Similarly, when the face region coordinates are above the reference coordinates in the input image, the image processing unit 108 superimposes the superimposed image from the top edge of the counterpart image, and when the face region coordinates are below the reference coordinates in the input image, it superimposes the superimposed image from the bottom edge of the counterpart image. At this time, the image processing unit 108 superimposes the superimposed image on the counterpart image so that the larger the vertical difference absolute value between the face region coordinates and the reference coordinates, the larger the area of the superimposed image.
 On the other hand, referring to the face detection flag, the case where no face region exists is handled as follows.
 FIG. 8 is a diagram showing an example of the processed image when no face region is detected in Embodiment 1. As shown in FIG. 8(a), a past input image 501 contains a speaker image 502 and includes a face region 503. The input image 501 is, for example, the input image at the time the face region 503 was last detected.
 As shown in FIG. 8(b), when the speaker moves upward (from the speaker image 502 to the speaker image 512), the face region 513 is no longer contained in the input image 511. In this case, the parameter calculation unit 123 calculates the processing parameters as follows.
  1. Horizontal superimposition start position … the horizontal superimposition start position determined when the face region was last detected
  2. Horizontal superimposition size … W/N (W is the horizontal size of the screen; N is a fixed value of 1 or more)
  3. Vertical superimposition start position … the vertical superimposition start position determined when the face region was last detected
  4. Vertical superimposition size … H/M (H is the vertical size of the screen; M is a fixed value of 1 or more)
 More specifically, when no face region exists, the determination unit 121 reads the center coordinates (x1, y1) of the face region 503 shown in FIG. 8(a) from the buffer 122 and outputs the read center coordinates to the parameter calculation unit 123. The parameter calculation unit 123 calculates the differences (dx and dy) between the center coordinates (x1, y1) and the reference coordinates (x0, y0), and determines the superimposition start positions based on the signs of the calculated differences. In this case, the superimposition sizes are the fixed values described above, regardless of the absolute values of the differences.
 For example, in the example of FIG. 8, the difference between the past center coordinates and the reference coordinates is 0 in the horizontal direction and positive in the vertical direction. Therefore, as shown in FIG. 8(c), the image superimposing unit 124 performs no superimposition in the horizontal direction and, in the vertical direction, superimposes a superimposed image 523 of fixed area from the top edge onto the counterpart image 521, thereby generating a processed image 522.
 As described above, when the face detection flag indicates that no face region is present in the input image, the image processing unit 108 superimposes a superimposed image of predetermined area on the counterpart image from the left edge when the face region coordinates stored in the buffer 122 are to the right of the reference coordinates, or from the right edge when the face region coordinates stored in the buffer 122 are to the left of the reference coordinates. Similarly, it superimposes the image from the top edge when the face region coordinates stored in the buffer 122 are above the reference coordinates, or from the bottom edge when the face region coordinates stored in the buffer 122 are below the reference coordinates.
 In this way, when no face region exists, superimposed images of fixed area can be overlaid in the horizontal and vertical directions, as sketched below.
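 A minimal sketch of this fallback, under the same illustrative assumptions as the sketch above; the divisor values for N and M are placeholders, since the specification only requires them to be 1 or more:

    def fallback_superimposition_params(last_dx, last_dy, screen_w, screen_h,
                                        n=4, m=4):
        # Start positions reuse the sign logic from the last detection.
        h_start = 'left' if last_dx > 0 else 'right'
        v_start = 'top' if last_dy > 0 else 'bottom'
        # Sizes are fixed fractions of the screen, independent of |dx|, |dy|.
        # A zero offset yields no superimposition on that axis (cf. FIG. 8,
        # where dx = 0 and only the vertical image is overlaid); this is an
        # assumption, since the specification leaves the zero case implicit.
        w = 0 if last_dx == 0 else screen_w / n
        h = 0 if last_dy == 0 else screen_h / m
        return h_start, w, v_start, h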
 As described above, in the present embodiment, the face detection unit 103 detects the center coordinates of the face region, and the image processing unit 108 superimposes another image on the counterpart image according to the horizontal and vertical distances and directions between the center coordinates and the reference coordinates.
 Specifically, in the image communication apparatus 100 of the present embodiment, when the speaker 201 is on the left side as seen facing the monitor 110, that is, when the center coordinates of the face region 303 in the input image 301 are to the right of the reference coordinates, the processed image 402 generated by superimposing the superimposed image 403 from the left edge of the counterpart image 401 is displayed on the monitor 110. In this way, when the speaker views the counterpart from the left, hiding the left-hand region of the counterpart lets the speaker recognize that he or she is to the left of center. The same applies to the right, upper, and lower sides.
 FIG. 9 is a schematic diagram showing how the counterpart image changes as the speaker according to Embodiment 1 moves.
 As shown in FIG. 9(a), when the speaker 201 is on the left side facing the monitor 110, the superimposed image 403 is superimposed on the counterpart image 401 from its left side. That is, the monitor 110 displays the processed image 402 generated by superimposing the superimposed image 403 on the left side of the counterpart image 401.
 By looking at the position where the superimposed image 403 is superimposed and its area in the processed image 402 displayed on the monitor 110, the speaker 201 can judge in which direction and by how much to move. For example, in the example shown in FIG. 9(a), the speaker 201 should move to the right.
 As shown in FIG. 9(b), as the speaker 201 moves to the right, the area of the superimposed image 403 in the processed image 402 displayed on the monitor 110 becomes smaller. This lets the speaker 201 confirm that he or she is moving in the appropriate direction, and also see that he or she is not yet at the proper position (that is, the reference position).
 As described above, the image communication apparatus 100 of the present embodiment processes the counterpart image 401 according to the position of the speaker 201, specifically according to the position of the face region of the speaker 201 in the captured image captured by the camera 101. More precisely, the image communication apparatus 100 generates the processed image 402 by superimposing a superimposed image 403 whose area grows as the position of the speaker 201 moves farther from the reference position.
 This can give the speaker the impression that the speaker and the counterpart are communicating face to face through a window frame. That is, when the speaker is directly in front of the window frame, the counterpart is seen from the front, and when the speaker is to the left of the window frame, the window frame (corresponding to the superimposed image) hides the left side of the counterpart; the apparatus gives the speaker the same impression. Therefore, by applying the image communication apparatus 100 of the present embodiment to a video conference system, communication with a higher sense of presence can be achieved.
 As a result, without displaying the speaker's own image on the monitor, the speaker can look at the position and size of the image superimposed on the counterpart image on the monitor and determine in which direction and by how much he or she is displaced on the screen.
 Furthermore, in the present embodiment, when the face detection unit 103 cannot detect a face region, the image processing unit 108 refers to the face detection flag and superimposes another image of fixed area on the counterpart image, based on the positional relationship between the reference coordinates and the center coordinates at the time the face region was last detected.
 As a result, without displaying the speaker's own image on the monitor, the speaker can see that a fixed area of the counterpart image displayed on the monitor is covered by another image, and thereby determine both that the speaker's face region is not being captured and toward which side of the imaging area he or she has strayed.
 Note that, instead of storing the center coordinates at the time the face region was last detected, the buffer 122 may store the image processing parameters calculated by the parameter calculation unit 123 at that time. In that case, when no face region is detected, the image superimposing unit 124 may superimpose, instead of a superimposed image of fixed area, a superimposed image whose area is indicated by the horizontal and vertical superimposition sizes stored in the buffer 122.
 Alternatively, since in the present embodiment the positions at which the superimposed image is superimposed are determined from the signs of the horizontal and vertical differences, the buffer 122 may store only the horizontal superimposition start position and the vertical superimposition start position among the image processing parameters.
 (Embodiment 2)
 The image communication apparatus according to Embodiment 2 is an apparatus that applies a projective transformation to the counterpart image received from another image communication apparatus, based on the position of the speaker's face region in the input image.
 FIG. 10 is a block diagram showing the configuration of the image communication apparatus 600 according to Embodiment 2. The image communication apparatus 600 shown in the figure differs from the image communication apparatus 100 of FIG. 1 in that it includes an image processing unit 608 instead of the image processing unit 108. In FIG. 10, the components given the same reference numerals as in FIG. 1 perform the same processing as in Embodiment 1, so their description is omitted here and the differences are described instead.
 The image processing unit 608 is an example of an image processing unit that generates a processed image by processing the counterpart image. Specifically, the image processing unit 608 generates the processed image by projectively transforming the counterpart image so that the larger the difference absolute value between the face region coordinates and the reference coordinates, the larger the slope of the projective transformation. As shown in FIG. 10, the image processing unit 608 includes the determination unit 121, the buffer 122, a parameter calculation unit 623, and a projective transformation unit 624. The determination unit 121 and the buffer 122 perform the same processing as in Embodiment 1, so their description is omitted.
 The parameter calculation unit 623 calculates the differences between the center coordinates input from the determination unit 121 and the reference coordinates, and calculates the image processing parameters using the calculated differences. Specifically, the parameter calculation unit 623 calculates the image processing parameters for the projective transformation based on the signs and the absolute values of the calculated differences, and outputs the calculated image processing parameters to the projective transformation unit 624. The calculation of the image processing parameters for the projective transformation is described later.
 The projective transformation unit 624 generates the processed image by projectively transforming the counterpart image generated by the image decoding unit 107, using the image processing parameters for the projective transformation.
 Next, the operation of the image communication apparatus 600 having the above configuration at the time of image reception is described with reference to the flowchart shown in FIG. 11. FIG. 11 is a flowchart showing the reception processing of the image communication apparatus 600 according to Embodiment 2. The operation at the time of image transmission is the same as in Embodiment 1 (FIG. 2), so its description is omitted here.
 The operation of the flowchart shown in FIG. 11 is stored as a control program in a storage device (not shown) such as a ROM or flash memory, and is controlled by a CPU (not shown). In FIG. 11, the processing steps given the same reference numerals as in FIG. 3 are the same operations as in FIG. 3, and their description is omitted.
 As in Embodiment 1, when a face region exists (Yes in S202) and at least one of the calculated differences, that is, the horizontal distance (dx) and the vertical distance (dy), is larger than the predetermined threshold (Yes in S204), the parameter calculation unit 623 calculates the image processing parameters for the projective transformation based on the signs and absolute values of the calculated differences (S307).
 When no face region exists (No in S202), the parameter calculation unit 623 calculates the image processing parameters for the projective transformation based on the signs of the calculated differences (S307). A specific example of the image processing parameter calculation is described later.
 Next, the projective transformation unit 624 projectively transforms the counterpart image using the calculated image processing parameters (S308), and outputs the processed image generated by the projective transformation to the image output unit 109.
 Thereafter, as in Embodiment 1, the image output unit 109 outputs the processed image (or the counterpart image) to the monitor 110 and causes the monitor 110 to display it (S209).
 Next, the image processing parameter calculation and the projective transformation performed by the image processing unit 608 are described in detail.
 The image processing unit 608 calculates the image processing parameters for the projective transformation using the face detection flag input from the face detection unit 103 and the center coordinates of the face region.
 First, when the face detection flag indicates that a face region exists, the parameter calculation unit 623 calculates the differences between the center coordinates (x1, y1) of the face region and the reference coordinates (x0, y0), that is, the horizontal distance (dx) and the vertical distance (dy), according to (Equation 1) and (Equation 2).
 Next, the parameter calculation unit 623 performs a threshold determination on each of the horizontal distance (dx) and the vertical distance (dy), and when a threshold is exceeded, calculates the image processing parameters for performing the projective transformation.
 FIG. 12 is a diagram showing an example of the counterpart image 701 before the projective transformation of Embodiment 2 is performed and the processed image 702 after it is performed. FIG. 12(b) shows the processed image 702 generated by projectively transforming the counterpart image 701 shown in FIG. 12(a). The following shows an example of projectively transforming the counterpart image 701 at an angle calculated in the horizontal direction.
 FIG. 13 is a diagram showing an example of a graph for calculating the image processing parameters for the projective transformation. Specifically, it shows an example of the relationship between the horizontal distance (dx) and the horizontal projective transformation width W(dx).
 The image processing parameters for the projective transformation are parameters indicating in which direction of the counterpart image 701 and with what slope the projective transformation is performed. Since the projective transformation here is in the horizontal direction, the image processing parameters are the following two values.
 Image processing parameters for projective transformation
  1. Horizontal projective transformation direction … left side or right side of the counterpart image 701
  2. Horizontal projective transformation width … W(dx)
 Here, the horizontal projective transformation direction, one of the image processing parameters, being the left side of the counterpart image 701 means projectively transforming the image so that the left side of the counterpart appears far away, that is, so that the length of the left edge of the counterpart image 701 becomes smaller than the length of the right edge. Conversely, the horizontal projective transformation direction being the right side of the counterpart image 701 means projectively transforming the image so that the right side of the counterpart appears far away (FIG. 12(b)), that is, so that the length of the right edge of the counterpart image 701 becomes smaller than the length of the left edge. The slope of the projective transformation is thus expressed by the ratio between the lengths of the left edge and the right edge.
 As shown in FIG. 12, the projective transformation unit 624 performs the projective transformation so that the four corners of the counterpart image 701 (top-left TL, top-right TR, bottom-left BL, bottom-right BR) are mapped to the four corners of the processed image 702 (top-left TL', top-right TR', bottom-left BL', bottom-right BR'), respectively. FIG. 12(b) shows an example in which the horizontal projective transformation direction is the right side. As shown in the figure, the larger the horizontal projective transformation width W(dx), the farther away the right side can be made to appear.
 When a face region exists, the parameter calculation unit 623 calculates the horizontal image processing parameters as follows.
 1. Determination of the horizontal projective transformation direction (left side or right side of the counterpart image 701)
 The parameter calculation unit 623 determines the horizontal projective transformation direction according to the sign of dx. Specifically, the parameter calculation unit 623 selects the right side when the sign of dx is positive and the left side when it is negative.
 As shown in FIG. 5, a positive dx means that the face region 303 lies on the right side of the input image 301, that is, as shown in FIG. 4, that the speaker 201 is on the left side of the monitor 110 (the right side as seen from the camera 101). In this case the speaker views the counterpart from the left, so increasing the slope of the projective transformation toward the right side of the counterpart image 701 makes the right side appear farther away to the speaker 201.
 2. Calculation of the horizontal projective transformation width W(dx)
 As shown in FIG. 13, when |dx| is larger than the threshold (th_x), the parameter calculation unit 623 calculates W(dx) with a magnitude proportional to |dx|. When |dx| is less than the threshold (th_x), the parameter calculation unit 623 sets W(dx) = 0.
 Note that the calculation method is not limited to this example; any method may be used as long as the value of the projective transformation width increases as the horizontal or vertical distance increases. In other words, it suffices that the horizontal distance |dx| and the horizontal projective transformation width W(dx) have a positive correlation. The threshold th_x may also be 0.
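 A minimal sketch of this step follows, assuming an interpretation of FIG. 12 in which the receding edge is shortened by W(dx) at its top and bottom, and using OpenCV purely as an illustrative implementation vehicle; neither the constant k nor the use of OpenCV appears in the specification.

    import cv2
    import numpy as np

    def project_counterpart(img, dx, th_x, k=0.5):
        h, w = img.shape[:2]
        # W(dx) per FIG. 13: 0 below the threshold, else proportional to |dx|.
        wdx = k * abs(dx) if abs(dx) > th_x else 0.0
        src = np.float32([[0, 0], [w, 0], [0, h], [w, h]])  # TL, TR, BL, BR
        if dx >= 0:
            # Direction = right side: shorten the right edge so it recedes.
            dst = np.float32([[0, 0], [w, wdx], [0, h], [w, h - wdx]])
        else:
            # Direction = left side: shorten the left edge instead.
            dst = np.float32([[0, wdx], [w, 0], [0, h - wdx], [w, h]])
        m = cv2.getPerspectiveTransform(src, dst)  # maps TL..BR to TL'..BR'
        return cv2.warpPerspective(img, m, (w, h))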
 As described above, when the face region coordinates are to the right of the reference coordinates in the input image, the image processing unit 608 generates the processed image by projectively transforming the counterpart image so that the right edge of the counterpart image becomes shorter than the left edge; when the face region coordinates are to the left of the reference coordinates in the input image, so that the left edge becomes shorter than the right edge. At this time, the image processing unit 608 projectively transforms the counterpart image so that the larger the horizontal difference absolute value between the face region coordinates and the reference coordinates, the larger the slope of the projective transformation, in other words, the larger the difference between the lengths of the right edge and the left edge.
 On the other hand, referring to the face detection flag, the case where no face region exists is handled as follows.
  1. Horizontal projective transformation direction … the side calculated when the face region was last detected
  2. Horizontal projective transformation width … H/K (H: vertical size of the image; K: a fixed value of 1 or more)
 That is, as in Embodiment 1, the determination unit 121 reads the most recent face region coordinates from the buffer 122 and outputs them to the parameter calculation unit 623. The parameter calculation unit 623 determines the horizontal projective transformation direction using the read face region coordinates. The horizontal projective transformation width is the fixed value described above.
 In this way, even when no face region exists, the projective transformation can be performed at a fixed angle in the horizontal direction.
 As described above, when the face detection flag indicates that no face region is present in the input image, the image processing unit 608 projectively transforms the counterpart image at a predetermined slope so that the right edge of the counterpart image becomes shorter than the left edge when the face region coordinates stored in the buffer 122 are to the right of the reference coordinates, or so that the left edge becomes shorter than the right edge when the face region coordinates stored in the buffer 122 are to the left of the reference coordinates.
 Note that the image processing parameters can be calculated in the same way for the vertical direction, and a vertical projective transformation may be performed instead of the horizontal one. Alternatively, projective transformations in both the horizontal and vertical directions may be used.
 When the vertical direction is used, the image processing parameters are the following two values.
 Image processing parameters
  3. Vertical projective transformation direction … top edge or bottom edge of the counterpart image 701
  4. Vertical projective transformation width … H(dy)
 As described above, when the face region coordinates are above the reference coordinates in the input image, the image processing unit 608 generates the processed image by projectively transforming the counterpart image so that the bottom edge of the counterpart image becomes shorter than the top edge; when the face region coordinates are below the reference coordinates in the input image, so that the top edge becomes shorter than the bottom edge. At this time, the image processing unit 608 projectively transforms the counterpart image so that the larger the vertical difference absolute value between the face region coordinates and the reference coordinates, the larger the slope of the projective transformation, in other words, the larger the difference between the lengths of the top edge and the bottom edge.
 Also, when the face detection flag indicates that no face region is present in the input image, the image processing unit 608 projectively transforms the counterpart image at a predetermined slope so that the bottom edge of the counterpart image becomes shorter than the top edge when the face region coordinates stored in the buffer 122 are above the reference coordinates, or so that the top edge becomes shorter than the bottom edge when the face region coordinates stored in the buffer 122 are below the reference coordinates.
 As described above, in the present embodiment, the face detection unit 103 detects the center coordinates of the face region, and the image processing unit 608 projectively transforms the counterpart image according to the horizontal and vertical distances and directions between the center coordinates and the reference coordinates.
 Specifically, in the image communication apparatus 600 of the present embodiment, when the speaker 201 is on the left side of the monitor 110, that is, when the center coordinates of the face region 303 in the input image 301 are to the right of the reference coordinates, the counterpart image 701 is projectively transformed so that its right edge becomes shorter than its left edge. In this way, when the speaker views the counterpart from the left, displaying the right side of the counterpart as if it were far away lets the speaker recognize that he or she is to the left of center. The same applies to the right, upper, and lower sides.
 FIG. 14 is a schematic diagram showing how the counterpart image changes as the speaker according to Embodiment 2 moves.
 As shown in FIG. 14(a), when the speaker 201 is on the left side facing the monitor 110, the counterpart image 701 has been projectively transformed so that its right edge is shorter than its left edge, that is, so that its right side appears far away. In other words, the monitor 110 displays the processed image 702 generated by a projective transformation that makes the right side of the counterpart image 701 appear far away.
 By looking at the degree (that is, the slope) and the direction of the projective transformation of the counterpart image 701 in the processed image 702 displayed on the monitor 110, the speaker 201 can judge in which direction and by how much to move. For example, in the example shown in FIG. 14(a), the speaker 201 should move to the right.
 As shown in FIG. 14(b), as the speaker 201 moves to the right, the degree (slope) of the projective transformation of the counterpart image 701 in the processed image 702 displayed on the monitor 110 becomes smaller. This lets the speaker 201 confirm that he or she is moving in the appropriate direction, and also see that he or she is not yet at the proper position (that is, the reference position).
 As described above, the image communication apparatus 600 of the present embodiment processes the counterpart image 701 according to the position of the speaker 201, specifically according to the position of the face region of the speaker 201 in the captured image captured by the camera 101. More precisely, the image communication apparatus 600 generates the processed image 702 by projectively transforming the counterpart image 701 with a slope that grows as the position of the speaker 201 moves farther from the reference position.
 As a result, without displaying the speaker's own image on the monitor, the speaker can look at the tilt direction and tilt magnitude of the projectively transformed counterpart image on the monitor and determine in which direction and by how much he or she is displaced relative to the center of the screen.
 Furthermore, in the present embodiment, when the face detection unit 103 cannot detect a face region, the image processing unit 608 refers to the face detection flag and applies a projective transformation of fixed angle to the counterpart image, toward the side set at the time of the last face detection.
 As a result, without displaying the speaker's own image on the monitor, the speaker can see that the counterpart image displayed on the monitor has been projectively transformed at a fixed angle, and thereby determine both that the speaker's face region is not being captured and toward which side of the imaging area he or she has strayed. Specifically, according to the image communication apparatus 600 of the present embodiment, when the speaker has strayed to the left of the proper position, the counterpart image is processed so that its right side appears far away, and when the speaker has strayed to the right of the proper position, so that its left side appears far away. Therefore, the speaker can judge how far he or she has strayed from the proper position simply by judging which part of the counterpart image appears far away.
 Note that, instead of storing the center coordinates at the time the face region was last detected, the buffer 122 may store the image processing parameters calculated by the parameter calculation unit 623 at that time. In that case, when no face region is detected, the projective transformation unit 624 may projectively transform the counterpart image with the projective transformation width stored in the buffer 122, instead of at a fixed angle.
 Alternatively, since in the present embodiment the direction of the projective transformation is determined from the signs of the horizontal and vertical differences, the buffer 122 may store only the horizontal projective transformation direction and the vertical projective transformation direction among the image processing parameters.
 (Embodiment 3)
 The image communication apparatus according to Embodiment 3 is an apparatus that blurs the counterpart image received from another image communication apparatus according to the difference between the detected radius of the speaker's face region and a reference radius.
 FIG. 15 is a block diagram showing the configuration of the image communication apparatus 800 according to Embodiment 3. The image communication apparatus 800 shown in the figure differs from the image communication apparatus 100 of FIG. 1 in that it includes an image processing unit 808 instead of the image processing unit 108. In FIG. 15, the components given the same reference numerals as in FIG. 1 perform the same processing as in Embodiment 1, so their description is omitted here and the differences are described instead.
 The image processing unit 808 is an example of an image processing unit that generates a processed image by processing the counterpart image. Specifically, the image processing unit 808 generates the processed image by blurring the counterpart image so that the larger the difference absolute value between the size of the face region and a predetermined reference size, the larger the amount of blur. As shown in FIG. 15, the image processing unit 808 includes the determination unit 121, the buffer 122, a parameter calculation unit 823, and a blur processing unit 824. The determination unit 121 and the buffer 122 perform the same processing as in Embodiment 1, so their description is omitted.
 The parameter calculation unit 823 calculates the difference between the radius of the face region input from the determination unit 121 and a predetermined reference radius, and calculates an image processing parameter using the calculated difference. Specifically, it calculates, as the image processing parameter, a blur parameter used for blur processing, which is one way of blurring an image. The calculation of the blur parameter is described later.
 The blur processing unit 824 generates the processed image by performing the blur processing on the counterpart image generated by the image decoding unit 107, using the blur parameter.
 As described above, the image processing unit 808 calculates the difference absolute value between the radius of the face region, which is an example of the size of the face region, and the reference radius, which is an example of the reference size, and when the calculated difference absolute value is larger than a predetermined threshold, generates the processed image by processing the counterpart image so that the larger the difference absolute value, the more the processed image differs from the counterpart image.
 Next, the operation of the image communication apparatus 800 having the above configuration at the time of image reception is described with reference to the flowchart shown in FIG. 16. FIG. 16 is a flowchart showing the reception processing of the image communication apparatus 800 according to Embodiment 3. The operation at the time of image transmission is the same as in Embodiment 1 (FIG. 2), so its description is omitted here.
 The operation of the flowchart shown in FIG. 16 is stored as a control program in a storage device (not shown) such as a ROM or flash memory, and is controlled by a CPU (not shown). In FIG. 16, the processing steps given the same reference numerals as in FIG. 3 are the same operations as in FIG. 3, and their description is omitted.
 As in Embodiment 1, when the determination unit 121 determines that a face region exists in the input image (Yes in S202), the determination unit 121 outputs the radius R of the face region input from the face detection unit 103 to the parameter calculation unit 823. At this time, the radius R is stored in the buffer 122.
 When no face region exists (No in S202), the determination unit 121 reads the radius R of the face region from the buffer 122 (S403) and outputs the read radius R to the parameter calculation unit 823. As described above, the buffer 122 stores the radius R of the face region at the time a face region was last detected, and the processing below performs the blur processing based on the last detected face region.
 The parameter calculation unit 823 calculates the difference between the radius R input from the determination unit 121 and the reference radius R0 (S404).
 Next, when the absolute value of the calculated difference is larger than a predetermined threshold (Yes in S405), the parameter calculation unit 823 calculates the image processing parameter based on the absolute value of the calculated difference (S406). Using the calculated image processing parameter, the blur processing unit 824 blurs the counterpart image (S407), for example by performing the blur processing on the counterpart image with the blur parameter, and outputs the processed image generated by the blur processing to the image output unit 109.
 When the absolute value of the calculated difference is equal to or smaller than the threshold (No in S405), the blur processing unit 824 outputs the input counterpart image to the image output unit 109 as it is, without performing the blur processing.
 Finally, the image output unit 109 outputs the processed image (or the counterpart image) input from the blur processing unit 824 to the monitor 110 and causes the monitor 110 to display it (S209).
 Next, the image processing parameter calculation and the blur processing performed by the image processing unit 808 are described in detail.
 The image processing unit 808 calculates the image processing parameter for the blur processing using the face detection flag input from the face detection unit 103 and the face radius information.
 First, referring to the face detection flag, when a face region exists, the parameter calculation unit 823 determines the blur parameter α used for the blur processing from the radius R of the face region, as follows.
 First, the parameter calculation unit 823 calculates the difference absolute value dr between the predetermined reference radius R0 and the radius R using Equation 3.
 (Equation 3) dr = |R - R0|
 Next, the parameter calculation unit 823 determines the blur parameter α using the calculated difference absolute value dr. FIG. 17 is a diagram showing an example of a graph for calculating the blur parameter α. In FIG. 17, dr is the difference absolute value between the radius R of the face region and the reference radius R0, α(dr) is the blur parameter, th_r is a predetermined threshold, and α_max is the maximum blur parameter. As shown in FIG. 17, when the difference absolute value dr of the radius exceeds the threshold th_r, the blur parameter α increases in proportion to dr.
 Note that the blur parameter α only needs to take a larger value as the radius R differs more from the reference radius R0. That is, it suffices that the blur parameter α has a positive correlation with the difference absolute value dr. The threshold th_r may also be 0.
 The blur processing unit 824 then generates the processed image by processing the counterpart image using Equation 4. Equation 4 is the equation used when the blur processing is performed in the horizontal direction (the x-axis direction).
 (Equation 4) out(x, y) = {in(x-1, y) × α/2 + in(x, y) × (α_max - α) + in(x+1, y) × α/2} / α_max
 In Equation 4, out(x, y) is the pixel value of the image at coordinates (x, y) after blurring, and in(x, y) is the pixel value of the counterpart image at coordinates (x, y). The blur parameter α and the maximum blur parameter α_max are the values the parameter calculation unit 823 determines using the graph shown in FIG. 17.
 In this way, as the blur parameter α increases, the weight of the neighboring pixels in the weighted sum increases, and the processed image becomes more blurred than the counterpart image. Note that the blur processing unit 824 can also perform the blur processing in the vertical direction, or in both the horizontal and vertical directions, instead of only in the horizontal direction.
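 A minimal Python sketch of the two steps just described follows; the slope constant k of α(dr) and the unfiltered image borders are assumptions for illustration.

    def blur_parameter(r, r0, th_r, alpha_max, k=1.0):
        # alpha(dr) per FIG. 17: 0 up to th_r, then proportional to dr,
        # capped at alpha_max. The slope k is an assumed constant.
        dr = abs(r - r0)  # Equation 3
        return 0.0 if dr <= th_r else min(k * dr, alpha_max)

    def blur_horizontal(img, alpha, alpha_max):
        # Equation 4: a 3-tap weighted average along the x axis. img is a
        # 2-D array of pixel values; border columns are copied unchanged
        # (an assumption; the specification does not treat the borders).
        h, w = len(img), len(img[0])
        out = [row[:] for row in img]
        for y in range(h):
            for x in range(1, w - 1):
                out[y][x] = (img[y][x - 1] * alpha / 2
                             + img[y][x] * (alpha_max - alpha)
                             + img[y][x + 1] * alpha / 2) / alpha_max
        return out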
 FIG. 18 is a diagram showing examples of the input images 901 and 911, the counterpart image 921, and the processed image 922 of Embodiment 3. FIG. 18(a) shows the case where the face region 903 of the speaker image 902 in the input image 901 is larger than the reference radius R0. FIG. 18(b) shows the case where the face region 913 of the speaker image 912 in the input image 911 is smaller than the reference radius R0. FIG. 18(c) shows an example of the counterpart image 921 before the blur processing of Embodiment 3 is performed. FIG. 18(d) shows an example of the processed image 922 after the blur processing is performed.
 For example, in FIG. 4, when the speaker 201 is too close to the camera 101, the radius R of the face region 903 detected by the face detection unit 103 is larger than the reference radius R0, as shown in FIG. 18(a). Conversely, when the speaker 201 is too far from the camera 101, the radius R of the face region 913 detected by the face detection unit 103 is smaller than the reference radius R0, as shown in FIG. 18(b).
 In either case, when the difference dr between the radius R and the reference radius R0 exceeds the threshold th_r, the parameter calculation unit 823 determines the blur parameter α according to the graph shown in FIG. 17, and the blur processing unit 824 performs the blur processing on the counterpart image 921 using the determined blur parameter α, thereby generating the processed image 922.
 As described above, in the present embodiment, the image processing unit 808 increases the degree of blur applied to the counterpart image by controlling the blur parameter α used in the blur processing so that its value grows as the difference absolute value between the radius of the face region and the reference radius grows.
 As a result, even without the speaker's own image displayed on the monitor, the speaker can look at the degree of blur of the counterpart image displayed on the monitor and determine that the distance between the speaker and the camera is not appropriate and that the size of the face differs greatly from the reference radius. Specifically, according to the image communication apparatus 800 of the present embodiment, when the speaker is too close to or too far from the camera, the face in the counterpart image becomes blurred. Therefore, the speaker can judge whether or not he or she is at an appropriate position with respect to the camera simply by looking at the counterpart image.
 Also, even when the face detection unit 103 cannot detect a face region, the counterpart image can be blurred using the radius of a face region detected in the past, so the speaker can confirm that he or she is not at an appropriate position.
 Note that, when no face region is detected, the counterpart image may instead be blurred using a fixed-value blur parameter α.
 (Embodiment 4)
 The image communication apparatus according to Embodiment 4 is an apparatus that changes the size of the counterpart image received from another image communication apparatus by enlarging or reducing it according to the difference between the detected radius of the face region and the reference radius.
 なお、図19は実施の形態4の画像通信装置1000の構成を示すブロック図である。同図に示す画像通信装置1000は、図15の画像通信装置800と比較して、画像加工部808の代わりに画像加工部1008を備える点が異なっている。図19において、図15と同一の参照符号を付した構成要素は、実施の形態3と同じ処理を行うため、ここでは説明を省略し、異なる点を中心に説明する。 FIG. 19 is a block diagram showing the configuration of the image communication apparatus 1000 according to the fourth embodiment. The image communication apparatus 1000 shown in the figure is different from the image communication apparatus 800 in FIG. 15 in that an image processing unit 1008 is provided instead of the image processing unit 808. 19, components having the same reference numerals as those in FIG. 15 perform the same processes as those in the third embodiment, and thus description thereof will be omitted here and different points will be mainly described.
The image processing unit 1008 is an example of an image processing unit that generates a processed image by processing the counterpart image. Specifically, the image processing unit 1008 enlarges the counterpart image when the size of the face area is larger than the reference size, and reduces the counterpart image when the size of the face area is smaller than the reference size, thereby generating the processed image. The larger the absolute difference between the size of the face area and the reference size, the larger the enlargement or reduction ratio.
As shown in FIG. 19, the image processing unit 1008 includes a determination unit 121, a buffer 122, a parameter calculation unit 1023, and a size changing unit 1024. The determination unit 121 and the buffer 122 perform the same processing as in Embodiment 3, so their description is omitted.
The parameter calculation unit 1023 calculates the difference between the radius of the face area input from the determination unit 121 and a predetermined reference radius, and uses the calculated difference to calculate an image processing parameter. Specifically, it determines, as the image processing parameter, an enlargement/reduction parameter Z for changing the size of the counterpart image. The enlargement/reduction parameter Z is, concretely, an enlargement ratio or a reduction ratio: when Z is smaller than 1 the counterpart image is reduced, and when Z is larger than 1 the counterpart image is enlarged.
The size changing unit 1024 generates a processed image by changing the size of the counterpart image generated by the image decoding unit 107, using the enlargement/reduction parameter. Note that the enlargement/reduction parameter is also the horizontal size ratio (or the vertical size ratio) between the generated processed image and the counterpart image.
Next, the operation of the image communication apparatus 1000 having the above configuration when receiving an image is described with reference to the flowchart shown in FIG. 20. FIG. 20 is a flowchart showing the reception processing of the image communication apparatus 1000 according to Embodiment 4. The operation at the time of image transmission is the same as in Embodiment 1 (FIG. 2), so its description is omitted here.
The operations in the flowchart of FIG. 20 are stored as a control program in a storage device (not shown) such as a ROM or flash memory and are controlled by a CPU (not shown). In FIG. 20, processing steps with the same reference numerals as in FIG. 16 are the same operations as in FIG. 16, and their description is omitted.
As in Embodiment 3, the parameter calculation unit 1023 calculates the difference between the radius R input from the determination unit 121 and the reference radius R0 (S404).
When the calculated difference is larger than a predetermined first threshold ("larger than the first threshold" in S505), the parameter calculation unit 1023 calculates an image processing parameter based on the calculated difference (S506). Using the calculated image processing parameter, the size changing unit 1024 then generates a processed image by enlarging the counterpart image (S507), and outputs the generated processed image to the image output unit 109.
When the calculated difference is smaller than a predetermined second threshold ("smaller than the second threshold" in S505), the parameter calculation unit 1023 calculates an image processing parameter based on the calculated difference (S508). Using the calculated image processing parameter, the size changing unit 1024 then generates a processed image by reducing the counterpart image (S509), and outputs the generated processed image to the image output unit 109.
When the calculated difference is at least the second threshold and at most the first threshold ("at least the second threshold and at most the first threshold" in S505), the size changing unit 1024 outputs the input counterpart image to the image output unit 109 as-is, without changing its size.
Finally, the image output unit 109 outputs the processed image (or the counterpart image) input from the size changing unit 1024 to the monitor 110 and causes the monitor 110 to display it (S209).
Next, the image processing parameter calculation and the size changing processing performed by the image processing unit 1008 are described in detail.
The image processing unit 1008 calculates the image processing parameter for size changing using the face detection flag and the face radius information input from the face detection unit 103.
First, the face detection flag is referenced; if a face area exists, the parameter calculation unit 1023 determines the enlargement/reduction parameter Z using the face radius information R, as follows.
First, the parameter calculation unit 1023 calculates the difference value dR between the radius and the predetermined reference radius R0 using Formula 5.
(Formula 5) dR = R − R0
Next, the parameter calculation unit 1023 determines the enlargement/reduction parameter Z using the calculated difference value dR. FIG. 21 shows an example of a graph for calculating the enlargement/reduction parameter Z. In FIG. 21, dR is the difference value between the radius R and the reference radius R0, Z(dR) is the enlargement/reduction parameter, th_R is a preset threshold, and Z_max and Z_min are the maximum and minimum values of the enlargement/reduction parameter, respectively. That is, th_R corresponds to the first threshold shown in FIG. 20, and −th_R corresponds to the second threshold.
As shown in FIG. 21, when the absolute value of the difference value dR is at most the threshold th_R, the enlargement/reduction parameter is 1. When the face radius R is larger than the reference radius R0 and the difference value dR exceeds the threshold th_R, Z takes a value of 1 or greater. Conversely, when the face radius R is smaller than the reference radius R0 and the difference value dR falls below −th_R, Z takes a value of 1 or smaller.
Note that the absolute values of the first threshold (th_R), which decides whether to enlarge, and the second threshold (−th_R), which decides whether to reduce, need not be equal. Both the first and second thresholds may also be 0.
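For illustration, the piecewise mapping of FIG. 21 can be sketched as follows; the linear slope `gain` is an assumption, since the text specifies only the dead zone around dR = 0 and the clamping to Z_min and Z_max.

```python
def zoom_parameter(R, R0, th_R, Z_min=0.5, Z_max=2.0, gain=0.01):
    """Map the radius difference dR = R - R0 to a scale factor Z (FIG. 21).

    |dR| <= th_R -> Z = 1 (no resizing); dR > th_R -> Z > 1 (enlarge),
    clamped to Z_max; dR < -th_R -> Z < 1 (reduce), clamped to Z_min.
    """
    dR = R - R0
    if abs(dR) <= th_R:
        return 1.0
    if dR > 0:
        return min(Z_max, 1.0 + gain * (dR - th_R))
    return max(Z_min, 1.0 - gain * (-dR - th_R))
```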
Using the enlargement/reduction parameter Z determined in this way, the size changing unit 1024 changes the size of the counterpart image according to Formulas 6 and 7.
(Formula 6) W_out = W_in × Z
(Formula 7) H_out = H_in × Z
In Formulas 6 and 7, W_in is the horizontal size of the counterpart image and H_in is its vertical size. W_out is the horizontal size of the processed image generated by enlargement or reduction, and H_out is its vertical size. Z is the enlargement/reduction parameter determined by the parameter calculation unit 1023.
FIG. 22 shows examples of the input images 1101 and 1111, the counterpart image 1121, and the processed images 1122 and 1123 according to Embodiment 4. FIG. 22(a) shows a case where the face area 1103 of the speaker image 1102 in the input image 1101 is larger than the reference radius R0. FIG. 22(b) shows a case where the face area 1113 of the speaker image 1112 in the input image 1111 is smaller than the reference radius R0. FIG. 22(c) shows an example of the counterpart image 1121 before the size changing processing of Embodiment 4 is executed. FIG. 22(d) shows an example of the processed image 1122 generated by enlarging the counterpart image 1121. FIG. 22(e) shows an example of the processed image 1123 generated by reducing the counterpart image 1121.
As shown in FIG. 22(a), when the difference value dR is positive and larger than the threshold th_R, the counterpart image 1121 is enlarged and placed at the center of the screen, generating the processed image 1122 shown in FIG. 22(d). If the enlargement makes the processed image 1122 larger than the screen of the monitor 110, the portions outside the screen are cut off.
As shown in FIG. 22(b), when the difference value dR is negative and smaller than the threshold −th_R, the counterpart image 1121 is reduced and placed at the center of the screen, generating the processed image 1123 shown in FIG. 22(e). If the reduction makes the processed image 1123 smaller than the screen of the monitor 110, a predetermined image (such as a single-color image) is superimposed on the remaining area.
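The resizing of Formulas 6 and 7 together with the centering, cropping, and padding behavior described for FIG. 22 can be sketched as follows; nearest-neighbor resampling and a single-color pad value are illustrative assumptions, and the canvas is assumed to have the same size as the counterpart image.

```python
import numpy as np

def resize_and_center(counterpart_image, Z, pad_value=0):
    """Scale by Z (W_out = W_in * Z, H_out = H_in * Z) and composite the
    result onto a screen-sized canvas, cropping when enlarged and padding
    with pad_value when reduced."""
    H_in, W_in = counterpart_image.shape[:2]
    W_out, H_out = max(1, int(W_in * Z)), max(1, int(H_in * Z))

    # nearest-neighbor resampling (illustrative; any resampler would do)
    ys = (np.arange(H_out) * H_in // H_out).clip(0, H_in - 1)
    xs = (np.arange(W_out) * W_in // W_out).clip(0, W_in - 1)
    scaled = counterpart_image[ys][:, xs]

    canvas = np.full_like(counterpart_image, pad_value)
    oy, ox = (H_in - H_out) // 2, (W_in - W_out) // 2  # centering offsets
    sy, sx = max(0, -oy), max(0, -ox)  # source offsets: crop when enlarged
    dy, dx = max(0, oy), max(0, ox)    # canvas offsets: pad when reduced
    h = min(H_out - sy, H_in - dy)
    w = min(W_out - sx, W_in - dx)
    canvas[dy:dy + h, dx:dx + w] = scaled[sy:sy + h, sx:sx + w]
    return canvas
```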
As described above, in the present embodiment, the image processing unit 1008 uses the radius of the face area: when it is larger than the reference radius, the counterpart image is displayed enlarged; conversely, when it is smaller than the reference radius, the counterpart image is displayed reduced.
As a result, even without a self-image on the monitor, the speaker can tell, from the size of the face in the counterpart image, which changes as the counterpart image is enlarged or reduced, that the distance between the speaker and the camera is inappropriate and that the size of the face is larger or smaller than the reference radius range. Specifically, according to the image communication apparatus 1000 of the present embodiment, the face in the counterpart image becomes larger when the speaker is too close to the camera, and smaller when the speaker is too far from it. The speaker can therefore judge, just by looking at the counterpart image, whether his or her position is too close to or too far from the camera.
Also, as in Embodiment 3, when the face detection unit 103 fails to detect a face area, the size of the counterpart image can be changed using the radius of a face area detected in the past, so the speaker can confirm that he or she is not at an appropriate position.
When no face area is detected, the counterpart image may instead be enlarged or reduced using a fixed enlargement or reduction ratio. Whether to enlarge or reduce can then be decided based on whether the radius of the past face area held in the buffer 122 is larger or smaller than the reference radius.
(Embodiment 5)
The image communication apparatus according to Embodiment 5 sets the reference coordinates and the reference radius used for calculating the image processing parameters, using the face area coordinates and the face area radius obtained from the counterpart image.
FIG. 23 is a block diagram showing the configuration of the image communication apparatus 1200 according to Embodiment 5. The image communication apparatus 1200 shown in the figure differs from the image communication apparatus 100 in FIG. 1 in that it includes an image processing unit 1208 instead of the image processing unit 108. In FIG. 23, components with the same reference numerals as in FIG. 1 perform the same processing as in Embodiment 1; their description is therefore omitted here, and the following description focuses on the differences.
The image processing unit 1208 is an example of an image processing unit that generates a processed image by processing the counterpart image. Specifically, the image processing unit 1208 detects a face area from the counterpart image and sets the face area coordinates indicating the position of the detected face area as the reference coordinates. As shown in FIG. 23, the image processing unit 1208 includes a determination unit 121, a buffer 122, a parameter calculation unit 123, an image superimposition unit 124, and a reference coordinate setting unit 1225. The determination unit 121, the buffer 122, the parameter calculation unit 123, and the image superimposition unit 124 perform the same processing as in Embodiment 1, so their description is omitted.
The reference coordinate setting unit 1225 detects a face area from the counterpart image input from the image decoding unit 107. Like the face detection unit 103, the reference coordinate setting unit 1225 detects the face area by a method such as template matching or ellipse detection. The reference coordinate setting unit 1225 then outputs the center coordinates and the radius of the detected face area to the parameter calculation unit 123 as the reference coordinates and the reference radius, respectively, and outputs the counterpart image to the image superimposition unit 124. When the counterpart image and the input image differ in size, the reference coordinates and the reference radius are corrected to match the size of the input image.
Next, the operation of the image communication apparatus 1200 having the above configuration when receiving an image is described with reference to the flowchart shown in FIG. 24. FIG. 24 is a flowchart showing the reception processing of the image communication apparatus 1200 according to Embodiment 5. The operation at the time of image transmission is the same as in Embodiment 1 (FIG. 2), so its description is omitted here.
The operations in the flowchart of FIG. 24 are stored as a control program in a storage device (not shown) such as a ROM or flash memory and are controlled by a CPU (not shown). In FIG. 24, processing steps with the same reference numerals as in FIG. 3 are the same operations as in FIG. 3, and their description is omitted.
The image decoding unit 107 generates the counterpart image by decoding image data received from another image communication apparatus via the network 111, and outputs the counterpart image to the reference coordinate setting unit 1225 (S201).
The reference coordinate setting unit 1225 sets the reference coordinates and the reference radius by detecting a face area from the counterpart image (S602). Depending on the type of processing, only one of the reference coordinates and the reference radius may be set at this point.
Thereafter, a processed image is generated in the same manner as in Embodiment 1 (S202 to S209).
Next, the processing for setting the reference coordinates and the reference radius is described in detail.
The reference coordinate setting unit 1225 detects the face area in the counterpart image, calculates its center coordinates (X0, Y0) and radius R0, and sets the calculated center coordinates (X0, Y0) and radius R0 as the reference coordinates (x0, y0) and the reference radius R0, respectively. When no face area can be detected, the reference coordinates are not updated. It is also possible to update only the horizontal coordinate, or only the vertical coordinate, of the reference coordinates.
FIGS. 25A and 25B show the relationship between the reference coordinates and the reference radius in the counterpart image 1301 and the input image 1311 of Embodiment 5. FIG. 25A shows an example of the counterpart image 1301; as shown in the figure, the counterpart image 1301 contains a counterpart person image 1302 and a face area 1303. FIG. 25B shows an example of the input image 1311; as shown in the figure, the input image 1311 contains a speaker image 1312 and a face area 1313.
As shown in FIGS. 25A and 25B, the reference coordinate setting unit 1225 sets the center coordinates (X0, Y0) and the radius R0 obtained by face detection on the counterpart image 1301 as the reference coordinates (x0, y0) and the reference radius R0 of the input image 1311.
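A sketch of this reference setting, including the size correction mentioned above, could look like the following; `detect_face` is a hypothetical placeholder for the template-matching or ellipse-detection routine, and the averaged scale factor applied to the radius is an assumption.

```python
def set_reference_from_counterpart(counterpart_image, input_size, detect_face):
    """Derive reference coordinates (x0, y0) and reference radius R0 from the
    face detected in the counterpart image (Embodiment 5).

    detect_face(image) -> ((X0, Y0), R0), or None when no face is found.
    input_size is (width, height) of the local input image.
    """
    result = detect_face(counterpart_image)
    if result is None:
        return None  # no update: keep the previous reference coordinates
    (X0, Y0), R0 = result
    h_c, w_c = counterpart_image.shape[:2]
    w_in, h_in = input_size
    sx, sy = w_in / w_c, h_in / h_c  # correction when the images differ in size
    return (X0 * sx, Y0 * sy), R0 * (sx + sy) / 2
```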
As described above, in the present embodiment, the reference coordinate setting unit 1225 sets the counterpart's face area in the counterpart image as the reference coordinates and the reference radius, and a superimposed image is displayed on the counterpart image according to the direction and magnitude of the deviation of the face area coordinates of the speaker's face area detected by the face detection unit 103.
As a result, even without a self-image on the monitor, the speaker can tell, from the superimposed display on the counterpart image, in which direction and by how much the speaker's face area is shifted relative to the counterpart's face.
(Embodiment 6)
The image communication apparatus according to Embodiment 6 detects a face area from an input image that captures a wider range than the image transmitted to another image communication apparatus. That is, the image communication apparatus according to Embodiment 6 cuts out part of the input image and transmits the cut-out image to the other image communication apparatus.
FIG. 26 is a block diagram showing the configuration of the image communication apparatus 1400 according to Embodiment 6. The image communication apparatus 1400 shown in the figure differs from the image communication apparatus 100 in FIG. 1 in that it newly includes an image cutout unit 1412. In FIG. 26, components with the same reference numerals as in FIG. 1 perform the same processing as in Embodiment 1; their description is therefore omitted here, and the following description focuses on the differences.
The image cutout unit 1412 cuts out part of the input image input from the image input unit 102 via the face detection unit 103, and outputs the cut-out image to the image encoding unit 104. For example, the image cutout unit 1412 cuts out a region of fixed area centered on the reference coordinates, or a region of fixed area centered on the center coordinates of the input image.
Next, the operation of the image communication apparatus 1400 having the above configuration when transmitting an image is described with reference to the flowchart shown in FIG. 27. FIG. 27 is a flowchart showing the transmission processing of the image communication apparatus 1400 according to Embodiment 6. The operation at the time of image reception is the same as in Embodiment 1 (FIG. 3), so its description is omitted here.
The operations in the flowchart of FIG. 27 are stored as a control program in a storage device (not shown) such as a ROM or flash memory and are controlled by a CPU (not shown). In FIG. 27, processing steps with the same reference numerals as in FIG. 2 are the same operations as in FIG. 2.
The image input unit 102 acquires an uncompressed captured image from the camera 101 in units of frames and outputs it to the face detection unit 103 (S101). The camera 101 preferably captures as wide a region as possible.
The face detection unit 103 performs face area detection on the uncompressed captured image acquired from the image input unit 102 by a method such as template matching or ellipse detection (S102). The face detection unit 103 then calculates information such as the face detection flag, the center coordinates of the face area, and the radius of the face area, outputs the calculated information to the image processing unit 108, and outputs the input image to the image cutout unit 1412.
The image cutout unit 1412 cuts out a preset transmission image region from the input image acquired from the image input unit 102 via the face detection unit 103, and outputs the cut-out transmission image to the image encoding unit 104 (S703).
The image encoding unit 104 encodes the transmission image input from the image cutout unit 1412 using the H.264 compression encoding method and outputs the image data to the image transmission unit 105. The image transmission unit 105 packetizes the image data input from the image encoding unit 104 into RTP packets according to a packet transmission method such as RTP, and outputs them to the network 111 (S103).
FIG. 28 shows an example of the input image 1501 and the transmission image 1502 according to Embodiment 6. As shown in the figure, the transmission image 1502 is part of the input image 1501; it is cut out from the input image by the image cutout unit 1412 and output to the image encoding unit 104.
Since the input image 1501 contains a speaker image 1503 and a face area 1504, the face detection unit 103 can detect the center coordinates (x1, y1) and the radius R of the face area from the input image 1501. Because the face area can thus be detected from a wider image, the counterpart image can be processed based on the speaker's actual position.
As described above, in the present embodiment, the face detection unit 103 detects the face area from a wide region that includes the area outside the transmission image region, and the image processing unit 108 superimposes another image on the counterpart image according to the horizontal and vertical distances and directions between the center coordinates of the face area and the reference coordinates.
As a result, even without a self-image on the monitor, the speaker can tell, from the position and size of the image superimposed on the counterpart image on the monitor, in which direction and by how much the speaker is shifted on the screen. Even when the speaker's face area lies outside the transmission image region, the superimposed image can be displayed accurately according to the position of the face area.
(Embodiment 7)
The image communication apparatus according to Embodiment 7 uses two cameras with mutually different imaging ranges, and detects a face area from the captured image acquired from the camera that captures the wider range.
FIG. 29 is a block diagram showing the configuration of the image communication apparatus 1600 according to Embodiment 7. The image communication apparatus 1600 shown in the figure acquires not only a first captured image captured by the camera 101 but also a second captured image captured by a camera 1601. As shown in FIG. 29, the image communication apparatus 1600 differs from the image communication apparatus 100 in FIG. 1 in that it newly includes an image input unit 1602 and a face detection unit 1603. In FIG. 29, components with the same reference numerals as in FIG. 1 perform the same processing as in Embodiment 1; their description is therefore omitted here, and the following description focuses on the differences.
The image input unit 1602 is, for example, an interface connected to the camera 1601, which captures images, and acquires the second captured image captured by the camera 1601. The image input unit 1602 outputs the second captured image to the face detection unit 1603 as a second input image.
Note that the image input unit 102 outputs the first captured image captured by the camera 101 to the face detection unit 1603 as a first input image.
FIG. 30 shows the positional relationship between the two cameras 101 and 1601 and the speaker 1701. As shown in FIG. 30, the camera 101 and the camera 1601 are installed at the top center of the monitor 110. The area that the camera 101 can capture is the imaging area 202, and the area that the camera 1601 can capture is the imaging area 1702.
As shown in FIG. 30, the camera 1601 is installed near the camera 101 and either has a smaller zoom ratio than the camera 101 or is fitted with a wide-angle lens. As a result, the imaging area 1702 of the camera 1601 substantially contains the imaging area 202 of the camera 101 and is wider than the imaging area 202.
Returning to FIG. 29, the face detection unit 1603 detects a face area from the second input image input from the image input unit 1602. That is, the face detection unit 1603 detects the face area from the second captured image captured by the camera 1601, which captures the wider range. For example, the face detection unit 1603 performs face area detection by a method such as template matching or ellipse detection, and outputs information such as a face detection flag indicating the presence or absence of a face area, the center coordinates, and the radius of the face area to the image processing unit 108.
The face detection unit 1603 also outputs the first input image input from the image input unit 102 to the image encoding unit 104. That is, the face detection unit 1603 does not perform face detection on the first captured image captured by the camera 101. The image encoding unit 104 encodes the first input image input from the image input unit 102 via the face detection unit 1603.
As a result, the image communication apparatus 1600 according to Embodiment 7 can detect the face area from a wider range, and can therefore generate a processed image that reflects the position of the speaker 1701 more accurately.
Note that the second captured image captured by the camera 1601 is not transmitted to other image communication apparatuses and is used only for detecting the face area, so it need not be of high image quality. In other words, an inexpensive, low-precision camera can be used.
Next, the operation of the image communication apparatus 1600 having the above configuration when transmitting an image is described with reference to the flowchart shown in FIG. 31. FIG. 31 is a flowchart showing the transmission processing of the image communication apparatus 1600 according to Embodiment 7. The operation at the time of image reception is the same as in Embodiment 1 (FIG. 3), so its description is omitted here.
The operations in the flowchart of FIG. 31 are stored as a control program in a storage device (not shown) such as a ROM or flash memory and are controlled by a CPU (not shown).
The image input unit 102 acquires an uncompressed image from the camera 101 in units of frames and outputs it to the face detection unit 1603. In addition, the image input unit 1602 acquires an uncompressed image from the camera 1601 in units of frames and outputs it to the face detection unit 1603 (S801).
The face detection unit 1603 performs face area detection on the uncompressed second captured image input from the image input unit 1602 by a method such as template matching or ellipse detection (S802). The face detection unit 1603 then calculates information such as the face detection flag, the center coordinates of the face area, and the radius of the face area. The face detection unit 1603 outputs the face detection flag, together with the face area coordinates converted into the coordinate system of the first input image 1801 and information such as the face radius, to the image processing unit 108, and outputs the first input image 1801 input from the image input unit 102 to the image encoding unit 104.
The image encoding unit 104 encodes the first input image 1801, which is the smaller of the two images, and outputs the image data to the image transmission unit 105. The image transmission unit 105 packetizes the image data input from the image encoding unit 104 into RTP packets according to a packet transmission method such as RTP, and outputs them to the network 111 (S803).
FIG. 32 shows an example of the first input image 1801 and the second input image 1802 according to Embodiment 7.
The first input image 1801 is the image for transmission: it is captured by the camera 101, acquired by the image input unit 102, and transmitted to another image communication apparatus. The second input image 1802 is the image for face detection: it is captured by the camera 1601, acquired by the image input unit 1602, and used by the face detection unit 1603 to detect the face area. As shown in FIG. 32, the second input image 1802 input from the image input unit 1602 has a wider angle than the first input image 1801, so the face detection unit 1603 converts the center coordinates of the face area and the radius information of the face area into the coordinate system of the first input image 1801 according to Formulas 8 to 10.
(Formula 8) x1′ = M × x1
(Formula 9) y1′ = M × y1
(Formula 10) R′ = M × R
In Formulas 8 to 10, x1′ and y1′ are the coordinates of the face area after conversion into the coordinate system of the first input image 1801, and R′ is the radius of the face area after conversion. By multiplying the face area coordinates (x1, y1) and the radius R in the input coordinate system of the second input image 1802 by the magnification M, values approximating the coordinate system of the first input image 1801 are obtained. The magnification M is calculated by Formula 11 or Formula 12 below.
(Formula 11) M = width_1 / width_2
(Formula 12) M = height_1 / height_2
As shown in FIG. 32, width_1 and height_1 are the horizontal and vertical sizes of the first input image 1801, respectively, and width_2 and height_2 are the horizontal and vertical sizes of the second input image 1802, respectively.
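Formulas 8 to 12 translate directly into code; the sketch below uses the width ratio for M (Formula 11), and Formula 12 (the height ratio) gives the same value whenever the two images share an aspect ratio.

```python
def to_first_image_coords(x1, y1, R, size_1, size_2):
    """Convert face coordinates detected in the wide second input image into
    the coordinate system of the first (transmitted) input image.

    size_1 = (width_1, height_1), size_2 = (width_2, height_2).
    """
    width_1, height_1 = size_1
    width_2, height_2 = size_2
    M = width_1 / width_2            # Formula 11 (Formula 12 uses heights)
    return M * x1, M * y1, M * R     # Formulas 8, 9, 10
```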
Note that other coordinate system conversions can also be used; the conversion is not limited to the method disclosed in this embodiment.
As shown in FIG. 30, even when the speaker 1701 is outside the imaging area 202 of the camera 101, if the speaker is within the imaging area 1702 of the camera 1601, the speaker image 1803 appears in the second input image 1802, as shown in FIG. 32. Since the face detection unit 1603 detects the face area 1804 from the second input image 1802, it can correctly detect the face area 1804 even when the speaker image 1803 does not appear in the first input image 1801 for transmission.
As described above, in the present embodiment, the face detection unit 1603 performs face detection on the second input image acquired from the camera 1601, which captures a wider-angle image than the camera 101. That is, the center coordinates of the face area are determined from a wide region that includes the area outside the transmission image (the first input image), and the image processing unit 108 superimposes another image on the counterpart image according to the horizontal and vertical distances and directions between the center coordinates of the face area and the reference coordinates.
As a result, without displaying a self-image on the monitor, the speaker can tell, from the position and size of the image superimposed on the counterpart image on the monitor, in which direction and by how much the speaker is shifted on the screen. Even when the speaker's face area lies outside the transmission image region, the superimposed image can be displayed accurately according to the position of the face area.
The image communication apparatus and the image communication method of the present invention have been described above based on the embodiments, but the present invention is not limited to these embodiments. Unless departing from the spirit of the present invention, forms obtained by applying various modifications conceived by those skilled in the art to the embodiments, and forms constructed by combining components of different embodiments, are also included within the scope of the present invention.
For example, the image communication apparatus of the present invention may enlarge the counterpart image based on the difference between the center coordinates of the face area and the reference coordinates. Specifically, this works as follows.
FIG. 33 shows an example of the counterpart image 1901 and the processed image 1911. FIG. 33(b) is obtained by cutting out, from the counterpart image 1901 in FIG. 33(a), the region 1903 that excludes the predetermined region 1902, and enlarging the cut-out region 1903.
For example, the parameter calculation unit 123 calculates the following four image processing parameters.
Image processing parameters for cutout and enlargement:
1. Horizontal cutout position: the left end or the right end of the counterpart image 1901
2. Horizontal cutout size: W(dx)
3. Vertical cutout position: the upper end or the lower end of the counterpart image 1901
4. Vertical cutout size: H(dy)
Each image processing parameter is calculated by the parameter calculation unit 123, for example, in the same manner as in Embodiment 1. That is, by substituting the horizontal cutout position for the horizontal superimposition start position, the horizontal cutout size for the horizontal superimposition size, the vertical cutout position for the vertical superimposition start position, and the vertical cutout size for the vertical superimposition size, the parameter calculation unit 123 can calculate each image processing parameter.
In this way, when the speaker is to the left of the reference position, that is, when the face area is to the right of the reference coordinates in the captured image, the region of the counterpart image 1901 excluding a strip of width W(dx) at its left end is enlarged. Conversely, when the speaker is to the right of the reference position, that is, when the face area is to the left of the reference coordinates in the captured image, the region excluding a strip of width W(dx) at the right end of the counterpart image 1901 is enlarged.
Similarly, when the speaker is above the reference position, that is, when the face area is above the reference coordinates in the captured image, the region excluding a strip of height H(dy) at the upper end of the counterpart image 1901 is enlarged. Likewise, when the speaker is below the reference position, that is, when the face area is below the reference coordinates in the captured image, the region excluding a strip of height H(dy) at the lower end of the counterpart image 1901 is enlarged.
Note that the method of determining the horizontal and vertical cutout positions may be the reverse of that in Embodiment 1. That is, the right end may be used when dx is positive and the left end when dx is negative; likewise, the lower end may be used when dy is positive and the upper end when dy is negative.
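As a sketch of this crop-and-zoom variant, the following selects the region 1903 from the signs of the offsets dx and dy; the margin values W(dx) and H(dy) are passed in precomputed, image coordinates are assumed to grow downward, and the edge convention follows the description above (the reversed convention just mentioned is equally valid).

```python
def crop_for_guidance(counterpart_image, dx, dy, W_dx, H_dy):
    """Cut out the region to be enlarged, given the face offsets (dx, dy)
    from the reference coordinates and margins W_dx = W(dx), H_dy = H(dy).

    Face right of the reference (dx > 0): remove a strip of width W_dx
    from the left edge; face above the reference (dy < 0): remove a strip
    of height H_dy from the top edge, and symmetrically otherwise.
    """
    h, w = counterpart_image.shape[:2]
    left = W_dx if dx > 0 else 0
    right = w - (W_dx if dx < 0 else 0)
    top = H_dy if dy < 0 else 0
    bottom = h - (H_dy if dy > 0 else 0)
    region = counterpart_image[top:bottom, left:right]
    # `region` would then be scaled back up to full screen size, e.g. with
    # resize_and_center() from the earlier sketch.
    return region
```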
In each embodiment, the center coordinates indicating the center of the face area were used as the face area coordinates, but any value indicating the position of the face area may be used. Likewise, the radius of the face area was used as the size of the face area, but anything indicating the size of the face area may be used, such as the area of the face region or its number of pixels.
In the embodiments of the present invention, superimposition processing, projective transformation processing, blur processing, and size changing processing were described as methods of processing the counterpart image, but the counterpart image may be processed by other methods.
In that case, the image processing unit of the image communication apparatus of the present invention calculates the difference between at least one of the position and size of the face area detected by the face detection unit and a predetermined reference, and processes the counterpart image so that the larger the calculated difference, the more the counterpart image before processing and the processed image after processing differ. Furthermore, when the position of the face area has moved away from the reference position in a given direction, it is preferable to process the counterpart image so as to indicate the direction in which the speaker should move to return to the reference position.
The coordinates of the face may also be detected using an IC tag that can identify the speaker's position.
As described in Embodiment 7, the image communication apparatus of the present invention may acquire captured images from two physically different cameras, that is, a first captured image captured by a first camera and a second captured image captured by a second camera.
The image communication apparatus of the present invention also need not include the image encoding unit 104 and the image decoding unit 107. FIG. 34 is a block diagram showing an example of a different form of the image communication apparatus of the present invention.
As shown in FIG. 34, the image communication apparatus 2000 differs from the image communication apparatus 100 shown in FIG. 1 of Embodiment 1 in that it does not include the image encoding unit 104 and the image decoding unit 107. That is, the image transmission unit 105 transmits the captured image acquired by the image input unit 102 without encoding it. Similarly, the image reception unit 106 receives an unencoded counterpart image from another image communication apparatus via the network 111.
Thus, for example, when the bandwidth of the network 111 has sufficient headroom, there is no need to encode and decode images for communication, so the image communication apparatus need not include an image encoding unit that performs compression encoding of images or an image decoding unit that performs decompression decoding of images.
As described above, the present invention can be realized not only as an image communication apparatus and an image communication method, but also as a program for causing a computer to execute the image communication method of the present embodiments. It may also be realized as a computer-readable recording medium, such as a CD-ROM, on which the program is recorded, or as information, data, or a signal representing the program. These programs, information, data, and signals may be distributed via a communication network such as the Internet.
In the present invention, some or all of the components constituting the image communication apparatus may be configured as a single system LSI. A system LSI is a super-multifunctional LSI manufactured by integrating a plurality of components on one chip; specifically, it is a computer system including a microprocessor, a ROM, a RAM, and the like.
The image transmission apparatus of the present invention has the effect of being able to guide the speaker to the reference position while the speaker views the processed counterpart image, without displaying a self-image, and can be used, for example, in highly immersive video conference apparatuses that use large screens.
100, 600, 800, 1000, 1200, 1400, 1600, 2000 Image communication apparatus
101, 1601 Camera
102, 1602 Image input unit
103, 1603 Face detection unit
104 Image encoding unit
105 Image transmission unit
106 Image reception unit
107 Image decoding unit
108, 608, 808, 1008, 1208 Image processing unit
109 Image output unit
110 Monitor
111 Network
121 Determination unit
122 Buffer
123, 623, 823, 1023 Parameter calculation unit
124 Image superimposition unit
201, 1701 Speaker
202, 1702 Imaging area
301, 501, 511, 901, 911, 1101, 1111, 1311, 1501 Input image
302, 502, 512, 902, 912, 1102, 1112, 1312, 1503, 1803 Speaker image
303, 503, 513, 903, 913, 1103, 1113, 1303, 1313, 1504, 1804 Face area
401, 521, 701, 921, 1121, 1301, 1901 Counterpart image
402, 522, 702, 922, 1122, 1123, 1911 Processed image
403, 523 Superimposed image
624 Projective transformation unit
824 Blur processing unit
1024 Size changing unit
1225 Reference coordinate setting unit
1302 Counterpart person image
1412 Image cutout unit
1502 Transmission image
1801 First input image
1802 Second input image
1902, 1903 Region

Claims (15)

1. An image communication apparatus that communicates image data with another image communication apparatus via a network, the image communication apparatus comprising:
an image input unit that acquires a captured image captured by a camera;
an image transmission unit that transmits first image data including the captured image acquired by the image input unit;
a face detection unit that detects a face area from the captured image acquired by the image input unit;
an image reception unit that receives second image data including a counterpart image, the second image data being transmitted from the other image communication apparatus;
an image processing unit that generates a processed image by processing the counterpart image included in the second image data received by the image reception unit; and
an image output unit that outputs the processed image generated by the image processing unit to a display device,
wherein the image processing unit calculates a difference between at least one of a position and a size of the face area detected by the face detection unit and a predetermined reference, and processes the counterpart image such that the larger the calculated difference, the more the counterpart image and the processed image differ.
2. The image communication apparatus according to claim 1, wherein the face detection unit detects face area coordinates indicating the position of the face area in the captured image, and the image processing unit calculates an absolute difference between the face area coordinates and predetermined reference coordinates and, when the calculated absolute difference is larger than a predetermined threshold, generates the processed image by processing the counterpart image such that the larger the absolute difference, the more the counterpart image and the processed image differ.
3. The image communication apparatus according to claim 2, wherein the image processing unit generates the processed image by superimposing a predetermined superimposed image on the counterpart image such that the larger the absolute difference between the face area coordinates and the reference coordinates, the larger the area of the superimposed image.
  4.  The image communication device according to claim 3,
     wherein the image processing unit superimposes the superimposed image on the counterpart image from the left edge of the counterpart image when the face region coordinates are to the right of the reference coordinates, or from the right edge of the counterpart image when the face region coordinates are to the left of the reference coordinates, such that the larger the horizontal absolute difference between the face region coordinates and the reference coordinates, the larger the area of the superimposed image.
  5.  The image communication device according to claim 3,
     wherein the image processing unit superimposes the superimposed image on the counterpart image from the top edge of the counterpart image when the face region coordinates are above the reference coordinates, or from the bottom edge of the counterpart image when the face region coordinates are below the reference coordinates, such that the larger the vertical absolute difference between the face region coordinates and the reference coordinates, the larger the area of the superimposed image.
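A minimal Python/NumPy sketch (not part of the claims) of the edge-anchored overlay of claims 3 to 5: a strip is drawn from the edge opposite the direction in which the face has drifted, and its area grows with the horizontal or vertical deviation. The gain constant, the solid-gray fill, and the image-coordinate convention (y grows downward) are illustrative assumptions.

    import numpy as np

    def superimpose(counterpart, face_xy, ref_xy, gain=0.5, fill=128):
        """Mask edge strips of the counterpart image (an H x W x 3
        uint8 array) in proportion to the face deviation."""
        h, w = counterpart.shape[:2]
        out = counterpart.copy()
        dx = face_xy[0] - ref_xy[0]
        dy = face_xy[1] - ref_xy[1]
        strip_w = min(int(abs(dx) * gain), w)
        strip_h = min(int(abs(dy) * gain), h)
        if dx > 0:                    # face right of reference: mask left edge
            out[:, :strip_w] = fill
        elif dx < 0:                  # face left of reference: mask right edge
            out[:, w - strip_w:] = fill
        if dy < 0:                    # face above reference: mask top edge
            out[:strip_h, :] = fill
        elif dy > 0:                  # face below reference: mask bottom edge
            out[h - strip_h:, :] = fill
        return out

    # Example: the local face has drifted 80 px to the right of the reference.
    frame = np.zeros((480, 640, 3), dtype=np.uint8)   # stand-in counterpart image
    masked = superimpose(frame, face_xy=(400, 240), ref_xy=(320, 240))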
  6.  The image communication device according to claim 3,
     wherein the face detection unit further determines whether a face region is present in the captured image and generates a flag indicating the presence or absence of the face region, and
     the image processing unit superimposes the superimposed image, with a predetermined area, on a predetermined region of the counterpart image when the flag indicates that no face region is present.
  7.  The image communication device according to claim 6, further comprising:
     a buffer that stores the face region coordinates detected by the face detection unit,
     wherein, when the flag indicates that no face region is present in the captured image, the image processing unit superimposes the superimposed image, with a predetermined area, on the counterpart image from the left edge of the counterpart image when the face region coordinates stored in the buffer are to the right of the reference coordinates, or from the right edge when they are to the left of the reference coordinates, and from the top edge of the counterpart image when the face region coordinates stored in the buffer are above the reference coordinates, or from the bottom edge when they are below the reference coordinates.
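A minimal Python sketch (not part of the claims) of the fallback in claims 6 and 7: when the face-presence flag is off, the last face coordinates stored in the buffer decide which edge to mask, and the strip has a fixed, predetermined area. It reuses the superimpose() helper sketched after claim 5; the fixed_ratio constant is an illustrative assumption.

    def superimpose_with_fallback(counterpart, face_found, face_xy,
                                  buffered_xy, ref_xy, fixed_ratio=0.2):
        """Claims 6-7: fall back to buffered coordinates and a
        fixed-area strip when no face is detected."""
        if face_found:
            return superimpose(counterpart, face_xy, ref_xy)
        h, w = counterpart.shape[:2]
        out = counterpart.copy()
        strip_w, strip_h = int(w * fixed_ratio), int(h * fixed_ratio)
        dx = buffered_xy[0] - ref_xy[0]
        dy = buffered_xy[1] - ref_xy[1]
        if dx > 0:
            out[:, :strip_w] = 128    # buffered face was to the right
        elif dx < 0:
            out[:, w - strip_w:] = 128
        if dy < 0:
            out[:strip_h, :] = 128    # buffered face was above
        elif dy > 0:
            out[h - strip_h:, :] = 128
        return out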
  8.  The image communication device according to claim 2,
     wherein the image processing unit generates the processed image by applying a projective transformation to the counterpart image such that the larger the absolute difference between the face region coordinates and the reference coordinates, the larger the tilt of the projective transformation.
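A minimal sketch (not part of the claims) of claim 8 using OpenCV's standard perspective-warp functions cv2.getPerspectiveTransform and cv2.warpPerspective: two corners are pulled inward by an inset that grows with the coordinate difference, so the counterpart image appears increasingly tilted. The normalizing constant 200.0 and the choice to tilt the right-hand side are illustrative assumptions.

    import cv2
    import numpy as np

    def tilt_perspective(counterpart, abs_diff, max_inset_ratio=0.25):
        """Warp the counterpart image with a tilt proportional to the
        face-coordinate deviation (claim 8)."""
        h, w = counterpart.shape[:2]
        inset = int(min(abs_diff / 200.0, 1.0) * max_inset_ratio * h)
        src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
        # Pull the right-hand corners inward to simulate a receding plane.
        dst = np.float32([[0, 0], [w, inset], [w, h - inset], [0, h]])
        M = cv2.getPerspectiveTransform(src, dst)
        return cv2.warpPerspective(counterpart, M, (w, h))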
  9.  The image communication device according to claim 2,
     wherein the image processing unit enlarges the counterpart image at an enlargement ratio that increases as the absolute difference between the face region coordinates and the reference coordinates increases.
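A minimal sketch (not part of the claims) of claim 9 with OpenCV's cv2.resize: here the enlargement ratio grows with the coordinate deviation rather than the size deviation. The gain constant and the ratio cap are illustrative assumptions.

    import cv2

    def enlarge_by_coord_diff(counterpart, abs_diff, gain=0.002):
        """Enlarge the counterpart image as the face coordinates drift
        further from the reference coordinates (claim 9)."""
        ratio = 1.0 + min(abs_diff * gain, 1.0)   # ratio kept in [1.0, 2.0]
        return cv2.resize(counterpart, None, fx=ratio, fy=ratio,
                          interpolation=cv2.INTER_LINEAR)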
  10.  The image communication device according to claim 2, further comprising:
     a reference coordinate setting unit that detects a face region from the counterpart image included in the second image data received by the image reception unit and sets face region coordinates indicating the position of the detected face region as the reference coordinates.
  11.  The image communication device according to claim 1,
     wherein the face detection unit detects the size of the face region, and
     the image processing unit calculates an absolute difference between the size of the face region and a predetermined reference size and, when the calculated absolute difference is larger than a predetermined threshold, generates the processed image by processing the counterpart image such that the larger the absolute difference, the more the processed image differs from the counterpart image.
  12.  The image communication device according to claim 11,
     wherein the image processing unit generates the processed image by blurring the counterpart image such that the larger the absolute difference between the size of the face region and the reference size, the larger the amount of blur.
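A minimal sketch (not part of the claims) of claim 12 using OpenCV's cv2.GaussianBlur: the kernel size, and with it the blur amount, grows with the absolute difference between the detected face size and the reference size. The gain constant mapping the size difference to a kernel size is an illustrative assumption.

    import cv2

    def blur_by_size_diff(counterpart, face_size, ref_size, gain=0.2):
        """Blur the counterpart image more strongly the further the
        local face size is from the reference size (claim 12)."""
        k = int(abs(face_size - ref_size) * gain) | 1   # odd kernel size >= 1
        return cv2.GaussianBlur(counterpart, (k, k), 0)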
  13.  The image communication device according to claim 11,
     wherein the image processing unit generates the processed image by enlarging the counterpart image when the size of the face region is larger than the reference size and by reducing the counterpart image when the size of the face region is smaller than the reference size, and
     the enlargement ratio or the reduction ratio increases as the absolute difference between the size of the face region and the reference size increases.
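A minimal sketch (not part of the claims) of claim 13 with OpenCV's cv2.resize: the counterpart image is enlarged when the local face is bigger than the reference (the speaker has leaned in) and reduced when it is smaller, with a ratio that grows with the size difference. The linear mapping and the clamping range are illustrative assumptions.

    import cv2

    def resize_by_face_size(counterpart, face_size, ref_size, gain=0.01):
        """Scale the counterpart image up or down in proportion to the
        face-size deviation from the reference (claim 13)."""
        ratio = 1.0 + gain * (face_size - ref_size)
        ratio = min(max(ratio, 0.25), 4.0)   # keep the scale within bounds
        return cv2.resize(counterpart, None, fx=ratio, fy=ratio,
                          interpolation=cv2.INTER_LINEAR)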
  14.  An image communication method for communicating image data with another image communication device via a network, the method comprising:
     an image input step of acquiring a captured image taken by a camera;
     an image transmission step of transmitting first image data including the captured image acquired in the image input step;
     a face detection step of detecting a face region from the captured image acquired in the image input step;
     an image reception step of receiving second image data, transmitted from the other image communication device, including a counterpart image;
     an image processing step of generating a processed image by processing the counterpart image included in the second image data received in the image reception step; and
     an image output step of outputting the processed image generated in the image processing step to a display device,
     wherein, in the image processing step, a difference between at least one of the position and the size of the face region detected in the face detection step and a predetermined reference is calculated, and the counterpart image is processed such that the larger the calculated difference, the more the processed image differs from the counterpart image.
  15.  An integrated circuit that communicates image data with another image communication device via a network, the integrated circuit comprising:
     an image input unit that acquires a captured image taken by a camera;
     an image transmission unit that transmits first image data including the captured image acquired by the image input unit;
     a face detection unit that detects a face region from the captured image acquired by the image input unit;
     an image reception unit that receives second image data, transmitted from the other image communication device, including a counterpart image;
     an image processing unit that generates a processed image by processing the counterpart image included in the second image data received by the image reception unit; and
     an image output unit that outputs the processed image generated by the image processing unit to a display device,
     wherein the image processing unit calculates a difference between at least one of the position and the size of the face region detected by the face detection unit and a predetermined reference, and processes the counterpart image such that the larger the calculated difference, the more the processed image differs from the counterpart image.
PCT/JP2009/006382 2008-12-17 2009-11-26 Image communication device and image communication method WO2010070820A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2008-321532 2008-12-17
JP2008321532 2008-12-17

Publications (1)

Publication Number Publication Date
WO2010070820A1 (en) 2010-06-24

Family

ID=42268503

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2009/006382 WO2010070820A1 (en) 2008-12-17 2009-11-26 Image communication device and image communication method

Country Status (1)

Country Link
WO (1) WO2010070820A1 (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH08251561A (en) * 1995-03-09 1996-09-27 Nec Corp User interface of image communication, terminal device
JP2005151430A (en) * 2003-11-19 2005-06-09 Nec Corp Video telephone apparatus and video telephone method
JP2006140747A (en) * 2004-11-11 2006-06-01 Nippon Telegr & Teleph Corp <Ntt> Video communication apparatus and method for controlling same

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2015119424A (en) * 2013-12-19 2015-06-25 カシオ計算機株式会社 Communication display device, communication display method and program
CN104683692A (en) * 2015-02-04 2015-06-03 广东欧珀移动通信有限公司 Continuous shooting method and continuous shooting device
CN104683692B (en) * 2015-02-04 2017-10-17 广东欧珀移动通信有限公司 A kind of continuous shooting method and device
WO2018025458A1 (en) * 2016-08-01 2018-02-08 ソニー株式会社 Information processing device, information processing method, and program
JPWO2018025458A1 (en) * 2016-08-01 2019-05-30 ソニー株式会社 INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD, AND PROGRAM
US11082660B2 (en) 2016-08-01 2021-08-03 Sony Corporation Information processing device and information processing method
CN111405236A (en) * 2020-04-24 2020-07-10 杭州大轶科技有限公司 Video conference big data analysis method and system
JP7250101B1 (en) 2021-12-03 2023-03-31 レノボ・シンガポール・プライベート・リミテッド Image processing device, information processing device, video conference server, and video conference system
JP2023082816A (en) * 2021-12-03 2023-06-15 レノボ・シンガポール・プライベート・リミテッド Image processing device, information processing device, video conference server, and video conference system

Similar Documents

Publication Publication Date Title
KR101899877B1 (en) Apparatus and method for improving quality of enlarged image
JP4863937B2 (en) Encoding processing apparatus and encoding processing method
WO2012035783A1 (en) Stereoscopic video creation device and stereoscopic video creation method
US20120127261A1 (en) Teleconferencing device and image display processing method
JP2008225600A (en) Image display system, image transmitter, image transmission method, image display device, image display method, and program
WO2010070820A1 (en) Image communication device and image communication method
JP2009005238A (en) Coder and encoding method
JP2012513719A (en) Generating image scaling curves
WO2018225518A1 (en) Image processing device, image processing method, program, and telecommunication system
JP2007212664A (en) Liquid crystal display device
JP2010263500A (en) Video processing system, photography device, and method thereof
JP2008028606A (en) Imaging device and imaging system for panoramically expanded image
WO2017141584A1 (en) Information processing apparatus, information processing system, information processing method, and program
CN103686056B (en) The method for processing video frequency of conference terminal and the conference terminal
CN111246224A (en) Video live broadcast method and video live broadcast system
US11636571B1 (en) Adaptive dewarping of wide angle video frames
JP2010171690A (en) Television conference system and video communication method
JP2010200150A (en) Terminal, server, conference system, conference method, and conference program
US8717410B2 (en) Video communication method, apparatus, and system
JP2010219872A (en) Camera apparatus, display, system and method for processing image
TW201414307A (en) Conference terminal and video processing method thereof
EP3884461B1 (en) Selective distortion or deformation correction in images from a camera with a wide angle lens
JP4649640B2 (en) Image processing method, image processing apparatus, and content creation system
JP6004978B2 (en) Subject image extraction device and subject image extraction / synthesis device
JP2005142765A (en) Apparatus and method for imaging

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 09833130

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 12011500731

Country of ref document: PH

NENP Non-entry into the national phase

Ref country code: DE

NENP Non-entry into the national phase

Ref country code: JP

122 Ep: pct application non-entry in european phase

Ref document number: 09833130

Country of ref document: EP

Kind code of ref document: A1